Adjust the min offset of a bin when unmapped placed reads detected#1158
Adjust the min offset of a bin when unmapped placed reads detected#1158jkbonfield merged 1 commit intosamtools:developfrom
Conversation
|
I haven't looked at the code yet, but I can confirm this fixes things in my tests and it retains the speed and I/O reduction from the 1.11. On my shallow 4.5x coverage set it basically gave identical results (exactly the same level of I/O) as 1.11. On the synthetic test set with unmapped + mapped pairs (in that order), it does different I/O because it's fixing the indexing problem, but was very close to 1.11 still (slightly behind without -M, slightly ahead with -M) and much more performant than 1.10. Thumbs up from the evaluation side. |
hts.c
Outdated
| && rel_off < idx->lidx[tid].n | ||
| && min_off < idx->lidx[tid].offset[rel_off]) | ||
| min_off = idx->lidx[tid].offset[rel_off]; | ||
| if (unmapped) { |
There was a problem hiding this comment.
I think this should be contained within the if statement above, and the line above should be an else clause - mainly because it uses idx->lidx[tid].offset and I think that doesn't exist in CSI?
There was a problem hiding this comment.
Agree with the nesting of if(unmapped) inside if (idx->lidx[tid].offset && rel_off < idx->lidx[tid].n).
I am not sure about the else, though. I need min_off = idx->lidx[tid].offset[rel_off] to be the staring point for searching a new value.
There was a problem hiding this comment.
Ah, yes. I see it does use it as a starting point - in which case the loop should probably start at rel_off - 1.
7ad7d70 to
49059d3
Compare
|
The pysam test case will soon no longer exercise this behaviour (pysam-developers/pysam#962), as that is behaviour outwith pysam so it doesn't want to be affected by it. Does HTSlib need test cases added to ensure this doesn't regress again in future? |
Unmapped placed reads can be skipped by an iterator read, when they fall at the beginning of a BGZF block, because they are not added to the linear index. However, the meta-bin from the bin index records the presence of unmapped reads and can indicate a potential change of the
min_offvalue.The logic for adjusting the
min_offis to look backwards in the linear index until a smaller offset value is found. This new value is then compared to the smallest starting offset of a chunk inside the respective bin. The largest of the two values is then chosen as the newmin_off.Replaces #1155
Fixes #1142