In the development of Antonie, I generated BAM files that were subtly malformed. This is my own bug, and samtools should certainly not compensate for it. However, samtools silently accepted my BAM files, and appeared to process them quite well!
Further investigation showed that 'samread()' returns negative values for both errors and EOF, and that many loops within samtools treat all negative values the same way. In other words, they turn an error into a normal EOF, with no error or warning message.
Any loop like this is problematic:
while (samread(sam, bam_line) >= 0) { ... }
For example, http://ds9a.nl/tmp/blah.bam contains an invalid sequence id. 'samtools stats blah.bam' processes it without any apparent error, yet its statistics silently stop at the problematic read: nothing after it is counted.
While I of course appreciate the samtools software, I would suggest screaming bloody murder on any kind of unexpected error, lest our users end up with invalid results because part of their data was silently skipped!
Thanks for your attention.