Support a coordinate-to-accession map to a graph, mapping an annotation label+coordinate to an accession #549

Merged: karasikov merged 68 commits into master from hm/seq_id_map on Jan 9, 2026
hmusta (Collaborator) commented Oct 8, 2025

This PR adds an extension that automatically transforms query results from column-based labels to accession/header-based labels.

Usage:

1. Construction

After constructing a coordinate-aware annotation, run another annotation step with the flags --anno-filename --index-header-coords to index the sequence headers and construct the CoordToHeader mapping:

metagraph annotate --anno-filename --index-header-coords -i graph.dbg -o annotation file_1.fa file_2.fa ...

2. Query

Once the CoordToHeader (.seqs) index has been constructed and sits next to the annotation (e.g., annotation.row_diff_brwt_coord.annodbg + annotation.seqs), the usual query command applies the mapping automatically, e.g.,

metagraph query -i graph.dbg -a annotation.row_diff_brwt_coord.annodbg --query-mode matches query.fa

To disable the mapping and report the original file-based coordinates/counts/labels, pass the --no-coord-mapping flag.
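
To make the idea of the mapping concrete, here is a minimal Python sketch of how a (label, coordinate) pair could be translated to an accession. It is not the CoordToHeader implementation from this PR: the class name, the method name, and the storage layout are hypothetical. It assumes that coordinates within a column are 0-based and run contiguously over the sequences of the file in their original order, so that a binary search over cumulative k-mer counts finds the sequence containing a given coordinate.

```python
from bisect import bisect_right
from itertools import accumulate


class CoordToHeaderSketch:
    """Hypothetical illustration only; not MetaGraph's CoordToHeader class."""

    def __init__(self, columns):
        # columns: {label: [(header, num_kmers), ...]} in file order (assumed layout)
        self.headers = {}
        self.ends = {}
        for label, seqs in columns.items():
            self.headers[label] = [header for header, _ in seqs]
            # ends[i] = number of k-mers in sequences 0..i of this column
            self.ends[label] = list(accumulate(n for _, n in seqs))

    def lookup(self, label, coord):
        """Map a 0-based column coordinate to (header, coordinate within that sequence)."""
        ends = self.ends[label]
        if not ends or not 0 <= coord < ends[-1]:
            raise IndexError(f"coordinate {coord} is out of range for label {label!r}")
        i = bisect_right(ends, coord)      # first sequence whose end exceeds coord
        start = ends[i - 1] if i else 0    # first coordinate of that sequence
        return self.headers[label][i], coord - start


# Toy example: a file with two sequences of 5 and 3 k-mers
mapping = CoordToHeaderSketch({"file_1.fa": [("seqA", 5), ("seqB", 3)]})
first_of_b = mapping.lookup("file_1.fa", 5)
```

In this toy layout, coordinates 0 to 4 fall into seqA and coordinates 5 to 7 into seqB, so the lookup reports the header of the sequence together with the offset inside it.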

karasikov (Member) commented:

It would be good to have some usage commands showing how to run this.
(If not in the docs, then at least in the PR description above.)

karasikov (Member) commented:

Stats are printed in this format:

[2026-01-02 18:40:56.487] [info] Statistics for coord-to-accession mapping 'refseq.seqs'
=============== COORD-TO-ACCESSION STATS ===============
columns: 85375
total sequences: 32881371
total k-mers: 1703657822378
=================== PER-COLUMN STATS ===================
column 0:
  sequences: 215264 (NZ_LPZT01000038.1  NZ_LNTR01000002.1       NZ_LOHN01000001.1       ...)
  k-mers: 5816602042
  k-mers per sequence: 27020.8
column 1:
  sequences: 493377 (NZ_LXGF01000074.1  NZ_LXGF01000102.1       NZ_LXGF01000110.1       ...)
  k-mers: 20332752987
  k-mers per sequence: 41211.4
column 2:
  sequences: 53868 (NZ_UCIF01000001.1   NZ_UCIF01000002.1       NZ_UCIF01000003.1       ...)
  k-mers: 4857890014
  k-mers per sequence: 90181.4
column 3:
  sequences: 6480 (NG_042103.1  NC_027945.1     NW_004948952.1  ...)
  k-mers: 3942536088
  k-mers per sequence: 608416.1
column 4:
  sequences: 15416 (NW_020939779.1      NW_020939780.1  NW_020939781.1  ...)
  k-mers: 3486134738
  k-mers per sequence: 226137.4
column 5:
  sequences: 10230 (NT_107461.3 NT_107462.3     NT_107463.4     ...)
  k-mers: 10818493389
  k-mers per sequence: 1057526.2
...
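
The "k-mers per sequence" value in each block appears to be the column's k-mer count divided by its number of sequences, printed with one decimal. This is an inference from the numbers, not something stated in the PR. A short Python check over the six columns shown above, with the values copied from the log and a hypothetical helper name:

```python
# (sequences, k-mers, reported k-mers per sequence), copied from the log above
columns = [
    (215264, 5816602042, 27020.8),
    (493377, 20332752987, 41211.4),
    (53868, 4857890014, 90181.4),
    (6480, 3942536088, 608416.1),
    (15416, 3486134738, 226137.4),
    (10230, 10818493389, 1057526.2),
]


def kmers_per_sequence(num_kmers, num_sequences):
    """Average number of k-mers per sequence in a column (assumed definition)."""
    return num_kmers / num_sequences


# Collect columns whose reported value differs from the ratio by more than
# the rounding error of a one-decimal printout.
mismatches = [
    idx
    for idx, (num_seqs, num_kmers, reported) in enumerate(columns)
    if abs(kmers_per_sequence(num_kmers, num_seqs) - reported) > 0.051
]
```

If the assumed definition holds, `mismatches` stays empty for these columns.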

adamant-pwn (Contributor) left a comment:

Overall looks good. Some comments so far.

karasikov requested a review from adamant-pwn on January 7, 2026 21:56
adamant-pwn (Contributor) left a comment:

Looks good to me, except one misaligned (?) indentation (see comment).

karasikov requested a review from adamant-pwn on January 9, 2026 03:27
karasikov merged commit ed6311c into master on Jan 9, 2026
48 checks passed
karasikov deleted the hm/seq_id_map branch on January 9, 2026 22:53
karasikov added a commit that referenced this pull request Jan 10, 2026
…inate to an accession (#549)

* use k-mer coordinates to decode accession IDs when labels represent chunks/files

* print warning if no CoordToHeader mapping (.seqs file) was found for coordinate annotations

* print num accessions in server stats when the mapping is active

* added coord-to-header tests

* hide query mode `labels` (labels without label counts) by default; shown with flag --advanced

---------

Co-authored-by: Harun Mustafa <hmusta@users.noreply.github.com>
Co-authored-by: Mikhail Karasikov <karasikov@users.noreply.github.com>
3 participants