Removal the bgzf_idx_flush assertion.#1168
Conversation
|
PS. Don't ask how long I spent scratching my head over why indices were different when I did They carried on differing until I spotted the BAMs also differed, and then remembered to add |
dbde0e7 to
b7f8d08
Compare
bgzf.c
Outdated
| assert(mt->idx_cache.nentries == 0 || mt->block_written >= e[0].block_number); | ||
|
|
||
| for (i = 0; i < mt->idx_cache.nentries && e[i].block_number == mt->block_written; i++) { | ||
| for (i = 0; i < mt->idx_cache.nentries && e[i].block_number <= mt->block_written; i++) { |
There was a problem hiding this comment.
The value (mt->block_address << 16) + e[i].offset is only valid when e[i].block_number == mt->block_written otherwise an incorrect index entry would be written. Which suggests that using == in the for loop was correct, and the assertion should have been to check that blocks haven't been written without adding their index entries, i.e.:
assert(mt->idx_cache.nentries == 0 || mt->block_written <= e[0].block_number)
I don't think this can happen, but then isn't that the sort of reasoning that the assertion was trying to check?
The assumption is that we call bgzf_idx_flush once per outgoing multi-threaded block, and the block numbers will match. The assertion is to validate we're not indexing out of order. However, very long records (eg with huge aux tags or long seqs) can mean a single record spans multiple blocks and we inherently then skip blocks in bgzf_idx_push calls. Hence the assertion is simply wrong. Fixes samtools/samtools#1328
b7f8d08 to
affdeea
Compare
The assumption is that we call bgzf_idx_flush once per outgoing multi-threaded block, and the block numbers will match. The assertion is to validate we're not indexing out of order.
However, very long records (eg with huge aux tags or long seqs) can mean a single record spans multiple blocks and we inherently then skip blocks in bgzf_idx_push calls. Hence the assertion is simply wrong.
Fixes samtools/samtools#1328