Skip to content

Server changes: load all graphs, don't compute num_relations in stats#563

Merged
karasikov merged 19 commits intomasterfrom
mk/server
Nov 20, 2025
Merged

Server changes: load all graphs, don't compute num_relations in stats#563
karasikov merged 19 commits intomasterfrom
mk/server

Conversation

@karasikov
Copy link
Member

@karasikov karasikov commented Nov 6, 2025

Improvements in server

  • Computing num_relations takes up to 1 min for very large graphs that are loaded with mmap and not cached.
    Now we don't report it, hence /stats always works instantly.

  • Always keep all graphs loaded, so that they don't need to be reloaded for every query.

  • Don't store copied annotation labels, which may create a large overhead for indexes with many labels

  • Fixed catching exceptions thrown in worker threads, so now it reliably works with incorrect requests in multi-graph regime.

  • Logging and other minor improvements

Copy link
Contributor

@adamant-pwn adamant-pwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think overall it looks nice and is improvement, but I don't really like using .first and .second a lot, and I think it's generally a good practice to use named structures, rather than std::pair / std::tuple, except maybe in function return values that are immediately parsed by structured bindings.

I would prefer to change .first and .second into structured bindings that would give them names, or in this particular case, I also think we can "painlessly" integrate the new graphs_cache map into existing indexes map, as we almost exclusively access it using all stored values in indexes[name] anyway.

size_t num_server_threads = std::max(1u, get_num_threads());
set_num_threads(0);

std::unordered_map<std::pair<std::string, std::string>, std::unique_ptr<AnnotatedDBG>> graphs_cache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like an extra map could be unnecessary here, given that all actual use cases are iterating over indexes[name] and accessing them in that order. Maybe make

struct index {
    std::string graph_fname;
    std::string anno_fname;
    std::unique_ptr<AnnotatedDBG> anno_dbg;
};

And put it in indexes instead of std::pair<std::string, std::string>? In any case, I think struct with named fields is better than std::pair / std::tuple, outside of some specific cases.

Copy link
Member Author

@karasikov karasikov Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of two maps is to allow having the same (graph, anno) for multiple different names without duplicating it in memory. It's a weird case, but it happens in tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, alright, that's annoying to deal with. And making a custom hasher for in-place struct is a pain too. Let's just give names to .first and .second wherever we use them then?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the server performance by preloading and caching all graphs to avoid repeated loading, removes the expensive num_relations computation from the /stats endpoint, and improves exception handling in multi-threaded worker contexts. It also includes various logging improvements with consistent [Server] prefixes and better error visibility.

Key Changes:

  • All graphs are now preloaded into a cache (graphs_cache) and kept in memory to eliminate reload overhead for queries
  • Worker thread exceptions are now properly captured and rethrown via std::exception_ptr
  • The /stats endpoint no longer computes num_relations, making it return instantly even for large mmap'd graphs

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
metagraph/src/cli/server_utils.cpp Changed exception types to std::invalid_argument, improved error logging with warn level and request IDs
metagraph/src/cli/server.cpp Implemented graph caching with preloading, fixed worker thread exception handling, updated stats endpoint to use cache, reduced request timeout to 15 minutes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@adamant-pwn adamant-pwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@karasikov karasikov merged commit 48a1af9 into master Nov 20, 2025
147 of 148 checks passed
@karasikov karasikov deleted the mk/server branch November 20, 2025 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants