API support for count/coord indexes, parallel graph search, and intermediate query representation#358
API support for count/coord indexes, parallel graph search, and intermediate query representation#358
Conversation
|
I believe these failed checks are related to AppleClang 13 and not the changes, this is the first commit that has been checked with 13 (previous ones are using 12_4). On MacOS Monterey with AppleClang 13 even master branch fails to build. I built these with g++ 11 instead. |
Feel free to adapt main.yml |
|
I've left issue #359 and outlined a way to squash this in the workflow. It depends on whether or not you want to support AppleClang 13 now or point any users to another compiler for now |
karasikov
left a comment
There was a problem hiding this comment.
Please apply similar stylistic changes in the rest of the PR.
I fixed that. Please merge master into this branch to pull the fix. |
…into tz/count_api
karasikov
left a comment
There was a problem hiding this comment.
It still shows changes that have been merged to master. So this means the commits from master haven't been merged into this branch. Can you do that?
git fetch
git checkout tz/count_api
git merge origin/master
metagraph/src/cli/query.cpp
Outdated
| for (const auto &coords : tuples) { | ||
| if (coords.empty()) continue; | ||
|
|
||
| // TODO: Look for example where row_tuples are multiple values? |
There was a problem hiding this comment.
oh, never mind. I see what you mean...
There was a problem hiding this comment.
@hmusta What format would you suggest to print them succinctly?
(Suppose we have a single sequence with repeats and each k-mer may have multiple coordinates.)
Maybe <position in query>-<start_coord>-<end_coord>:..., or you know some other tools that output a similar thing?
There was a problem hiding this comment.
You can think of the case where the indexed sequence is AAA...A and the query is AAA.
So essentially every k-mer gets all the coordinates.
The current implementation will only print one coordinate range, which is incorrect. So this should be fixed. But we should agree on the desired output format first.
- Made coord/count flags client options instead of default - Error thrown if coord/count flags tried with unsupported queries - Empty alignment attempts default to empty result for now (#369) - Integration tests for coord/count API queries
| executor = ThreadPoolExecutor(max_workers=num_processes) | ||
|
|
||
| # Populate async results dict with concurrent.futures.Future instances | ||
| for name, graph_client in self.graphs.items(): | ||
| # TODO: do this async | ||
| result[name] = graph_client.align(sequence, min_exact_match, | ||
| max_alternative_alignments, | ||
| max_num_nodes_per_seq_char) | ||
| futures[name] = executor.submit(graph_client.align, sequence, min_exact_match, | ||
| max_alternative_alignments, | ||
| max_num_nodes_per_seq_char) | ||
|
|
||
| return result | ||
| print(f'Made {len(self.graphs)} requests with {num_processes} threads...') | ||
|
|
||
| # Shutdown executor but do not stop futures | ||
| executor.shutdown(wait=False) | ||
| return futures |
There was a problem hiding this comment.
Do we really benefit from returning Futures instead of just always waiting for everything to finish and returning the actual dictionary?
New features:
--count-kmersand--query-coordsin API and server.--verbose-coordsquery flag.MultiGraphClientBackend has been re-written to use an intermediate query representation instead always returning string output (and re-parsing string into JSON for server). From here, can also allow for more flexible server JSON responses.