Unmapped reads are not added to the BAI linear index, whether they are placed or unplaced:
|
if ( tid>=0 ) |
|
{ |
|
if (idx->bidx[tid] == 0) idx->bidx[tid] = kh_init(bin); |
|
if (is_mapped) { |
|
// shoehorn [-1,0) (VCF POS=0) into the leftmost bottom-level bin |
|
if (beg < 0) beg = 0; |
|
if (end <= 0) end = 1; |
|
// idx->z.last_off points to the start of the current record |
|
if (insert_to_l(&idx->lidx[tid], beg, end, |
|
idx->z.last_off, idx->min_shift) < 0) return -1; |
|
} |
|
} |
Since PR #1031, which landed in HTSlib 1.11 and made iterators directly take advantage of the linear index, iterator behaviour has changed for some data files. For example, if the first read in a query region is a placed unmapped read (i.e., it's filled in with its mate's coordinate) and that read is earlier in the file than the first mapped read in the same linear interval — then using the BAI linear index will cause that first read to be omitted from the iteration output as it skips over it to the linear interval's first mapped read.
Unmapped reads stopped being added to the linear index in samtools/samtools@cb5f449 for not entirely clear reasons. It may be that this was a bug fix to stop invalid data being added to the linear index in the case of unplaced reads (with tid set, slightly oddly, and a pos of -1), but that placed unmapped reads should have continued to be added.
Originally posted by @jmarshall in #1031 (comment) — see further discussion there, including note of the resulting pysam test case failures.
Unmapped reads are not added to the BAI linear index, whether they are placed or unplaced:
htslib/hts.c
Lines 1939 to 1950 in 1814ae3
Since PR #1031, which landed in HTSlib 1.11 and made iterators directly take advantage of the linear index, iterator behaviour has changed for some data files. For example, if the first read in a query region is a placed unmapped read (i.e., it's filled in with its mate's coordinate) and that read is earlier in the file than the first mapped read in the same linear interval — then using the BAI linear index will cause that first read to be omitted from the iteration output as it skips over it to the linear interval's first mapped read.
Unmapped reads stopped being added to the linear index in samtools/samtools@cb5f449 for not entirely clear reasons. It may be that this was a bug fix to stop invalid data being added to the linear index in the case of unplaced reads (with
tidset, slightly oddly, and aposof -1), but that placed unmapped reads should have continued to be added.Originally posted by @jmarshall in #1031 (comment) — see further discussion there, including note of the resulting pysam test case failures.