Skip to content

Unmapped reads are not added to the BAI linear index #1142

@jmarshall

Description

@jmarshall

Unmapped reads are not added to the BAI linear index, whether they are placed or unplaced:

htslib/hts.c

Lines 1939 to 1950 in 1814ae3

if ( tid>=0 )
{
if (idx->bidx[tid] == 0) idx->bidx[tid] = kh_init(bin);
if (is_mapped) {
// shoehorn [-1,0) (VCF POS=0) into the leftmost bottom-level bin
if (beg < 0) beg = 0;
if (end <= 0) end = 1;
// idx->z.last_off points to the start of the current record
if (insert_to_l(&idx->lidx[tid], beg, end,
idx->z.last_off, idx->min_shift) < 0) return -1;
}
}

Since PR #1031, which landed in HTSlib 1.11 and made iterators directly take advantage of the linear index, iterator behaviour has changed for some data files. For example, if the first read in a query region is a placed unmapped read (i.e., it's filled in with its mate's coordinate) and that read is earlier in the file than the first mapped read in the same linear interval — then using the BAI linear index will cause that first read to be omitted from the iteration output as it skips over it to the linear interval's first mapped read.

Unmapped reads stopped being added to the linear index in samtools/samtools@cb5f449 for not entirely clear reasons. It may be that this was a bug fix to stop invalid data being added to the linear index in the case of unplaced reads (with tid set, slightly oddly, and a pos of -1), but that placed unmapped reads should have continued to be added.

Originally posted by @jmarshall in #1031 (comment) — see further discussion there, including note of the resulting pysam test case failures.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions