Skip to content

API support for count/coord indexes, parallel graph search, and intermediate query representation#358

Merged
karasikov merged 30 commits intomasterfrom
tz/count_api
Jan 9, 2022
Merged

API support for count/coord indexes, parallel graph search, and intermediate query representation#358
karasikov merged 30 commits intomasterfrom
tz/count_api

Conversation

@thomastzhou
Copy link
Collaborator

@thomastzhou thomastzhou commented Nov 29, 2021

New features:

  • Support for --count-kmers and --query-coords in API and server.
  • New default coordinate behaviour: continuous coordinate sequences are collapsed into a range. All coordinates can be shown using --verbose-coords query flag.
  • API level parallel requests with MultiGraphClient

Backend has been re-written to use an intermediate query representation instead always returning string output (and re-parsing string into JSON for server). From here, can also allow for more flexible server JSON responses.

@thomastzhou
Copy link
Collaborator Author

I believe these failed checks are related to AppleClang 13 and not the changes, this is the first commit that has been checked with 13 (previous ones are using 12_4).

On MacOS Monterey with AppleClang 13 even master branch fails to build. I built these with g++ 11 instead.

@karasikov
Copy link
Member

I believe these failed checks are related to AppleClang 13 and not the changes, this is the first commit that has been checked with 13 (previous ones are using 12_4).

On MacOS Monterey with AppleClang 13 even master branch fails to build. I built these with g++ 11 instead.

Feel free to adapt main.yml

@thomastzhou
Copy link
Collaborator Author

thomastzhou commented Nov 30, 2021

I've left issue #359 and outlined a way to squash this in the workflow. It depends on whether or not you want to support AppleClang 13 now or point any users to another compiler for now

@thomastzhou thomastzhou changed the title Tz/count api API support for count/coord indexes and intermediate query representation Nov 30, 2021
Copy link
Member

@karasikov karasikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply similar stylistic changes in the rest of the PR.

@karasikov
Copy link
Member

I've left issue #359 and outlined a way to squash this in the workflow. It depends on whether or not you want to support AppleClang 13 now or point any users to another compiler for now

I fixed that. Please merge master into this branch to pull the fix.

Copy link
Member

@karasikov karasikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still shows changes that have been merged to master. So this means the commits from master haven't been merged into this branch. Can you do that?

git fetch
git checkout tz/count_api
git merge origin/master

for (const auto &coords : tuples) {
if (coords.empty()) continue;

// TODO: Look for example where row_tuples are multiple values?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, never mind. I see what you mean...

Copy link
Member

@karasikov karasikov Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmusta What format would you suggest to print them succinctly?
(Suppose we have a single sequence with repeats and each k-mer may have multiple coordinates.)
Maybe <position in query>-<start_coord>-<end_coord>:..., or you know some other tools that output a similar thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can think of the case where the indexed sequence is AAA...A and the query is AAA.
So essentially every k-mer gets all the coordinates.

The current implementation will only print one coordinate range, which is incorrect. So this should be fixed. But we should agree on the desired output format first.

@thomastzhou thomastzhou changed the title API support for count/coord indexes and intermediate query representation API support for count/coord indexes, parallel graph search, and intermediate query representation Dec 21, 2021
Comment on lines +303 to +315
executor = ThreadPoolExecutor(max_workers=num_processes)

# Populate async results dict with concurrent.futures.Future instances
for name, graph_client in self.graphs.items():
# TODO: do this async
result[name] = graph_client.align(sequence, min_exact_match,
max_alternative_alignments,
max_num_nodes_per_seq_char)
futures[name] = executor.submit(graph_client.align, sequence, min_exact_match,
max_alternative_alignments,
max_num_nodes_per_seq_char)

return result
print(f'Made {len(self.graphs)} requests with {num_processes} threads...')

# Shutdown executor but do not stop futures
executor.shutdown(wait=False)
return futures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really benefit from returning Futures instead of just always waiting for everything to finish and returning the actual dictionary?

@karasikov karasikov merged commit a74911c into master Jan 9, 2022
@karasikov karasikov deleted the tz/count_api branch January 9, 2022 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants