Conversation
adamant-pwn
left a comment
There was a problem hiding this comment.
Thanks! I think overall it looks nice and is improvement, but I don't really like using .first and .second a lot, and I think it's generally a good practice to use named structures, rather than std::pair / std::tuple, except maybe in function return values that are immediately parsed by structured bindings.
I would prefer to change .first and .second into structured bindings that would give them names, or in this particular case, I also think we can "painlessly" integrate the new graphs_cache map into existing indexes map, as we almost exclusively access it using all stored values in indexes[name] anyway.
| size_t num_server_threads = std::max(1u, get_num_threads()); | ||
| set_num_threads(0); | ||
|
|
||
| std::unordered_map<std::pair<std::string, std::string>, std::unique_ptr<AnnotatedDBG>> graphs_cache; |
There was a problem hiding this comment.
I feel like an extra map could be unnecessary here, given that all actual use cases are iterating over indexes[name] and accessing them in that order. Maybe make
struct index {
std::string graph_fname;
std::string anno_fname;
std::unique_ptr<AnnotatedDBG> anno_dbg;
};And put it in indexes instead of std::pair<std::string, std::string>? In any case, I think struct with named fields is better than std::pair / std::tuple, outside of some specific cases.
There was a problem hiding this comment.
The purpose of two maps is to allow having the same (graph, anno) for multiple different names without duplicating it in memory. It's a weird case, but it happens in tests
There was a problem hiding this comment.
Ah, alright, that's annoying to deal with. And making a custom hasher for in-place struct is a pain too. Let's just give names to .first and .second wherever we use them then?
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes the server performance by preloading and caching all graphs to avoid repeated loading, removes the expensive num_relations computation from the /stats endpoint, and improves exception handling in multi-threaded worker contexts. It also includes various logging improvements with consistent [Server] prefixes and better error visibility.
Key Changes:
- All graphs are now preloaded into a cache (
graphs_cache) and kept in memory to eliminate reload overhead for queries - Worker thread exceptions are now properly captured and rethrown via
std::exception_ptr - The
/statsendpoint no longer computesnum_relations, making it return instantly even for large mmap'd graphs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| metagraph/src/cli/server_utils.cpp | Changed exception types to std::invalid_argument, improved error logging with warn level and request IDs |
| metagraph/src/cli/server.cpp | Implemented graph caching with preloading, fixed worker thread exception handling, updated stats endpoint to use cache, reduced request timeout to 15 minutes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Improvements in server
Computing num_relations takes up to 1 min for very large graphs that are loaded with mmap and not cached.
Now we don't report it, hence
/statsalways works instantly.Always keep all graphs loaded, so that they don't need to be reloaded for every query.
Don't store copied annotation labels, which may create a large overhead for indexes with many labels
Fixed catching exceptions thrown in worker threads, so now it reliably works with incorrect requests in multi-graph regime.
Logging and other minor improvements