Prevent corruption of the ID value of a PG line #1256
daviesrob merged 1 commit into samtools:develop
This comes under the category of “heuristic to improve matters in this If this change fixes the I was rather expecting the fix to this to involve changing the
Indeed. It is a "most common case" heuristic and I've added a warning message to inform the user. The spec does say:
The issue was entirely in the header API, due to the different ways
The |
IMHO that spec sentence is intended to mean that the IDs of different BWA might like to think that the second ID tag is subordinate to the CL field, but from HTSlib's point of view they're all just tab-separated fields on a level playing field. Certainly that's how HTSJDK is treating this. Fortunately HTSlib's header API can be improved to work consistently off the first ID field on lines that have more than one, without expressing an opinion on whether such lines are wise or whether there's anything special about the second ID tag when it comes after a CL field.
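The "work consistently off the first ID field" rule described above can be sketched in a few lines. This is only an illustration in Python, not HTSlib's C implementation: the function name `first_id_value` and the example `@PG` line are hypothetical, and the sketch assumes header fields are separated by literal tab characters.

```python
from typing import Optional


def first_id_value(header_line: str) -> Optional[str]:
    """Return the value of the first ID field on a SAM header line, or None."""
    fields = header_line.rstrip("\n").split("\t")
    # fields[0] is the record type (e.g. "@PG"); the rest are TAG:VALUE fields.
    for field in fields[1:]:
        if field.startswith("ID:"):
            # Stop at the first match; later ID fields are deliberately ignored.
            return field[3:]
    return None


# Hypothetical @PG line where a second ID field follows the CL field.
line = "@PG\tID:bwa\tPN:bwa\tCL:bwa mem -R @RG\tID:sample1\tSM:s1"
print(first_id_value(line))  # prints "bwa"
```

Every field is treated as an equal tab-separated item, so the sketch takes no position on whether the second ID "belongs" to the CL field. It simply never looks past the first one.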
Okay, thanks, that makes this make sense. So This BTW is the sort of information that it is very useful to capture in commit messages! 😄 (There is still a lack of error handling in samtools bam_sort.c: in the “can't happen” case of
…header and issue a warning when multiple ID tags are encountered on the same line. This change also enforces consistency across the header API methods, which use the ID tag as an argument, by making them agree to always return the first encountered ID value.
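As a rough illustration of the behaviour this commit message describes, the sketch below indexes `@PG` lines by their first ID value and warns when a line carries more than one ID tag. It is a Python approximation under stated assumptions, not the HTSlib code: `index_pg_lines`, the warning text, and the sample header are all hypothetical.

```python
import warnings
from typing import Dict


def index_pg_lines(header_text: str) -> Dict[str, str]:
    """Map the first ID value of each @PG line to the full line."""
    index: Dict[str, str] = {}
    for line in header_text.splitlines():
        if not line.startswith("@PG\t"):
            continue
        # Collect every ID field in order of appearance.
        ids = [f[3:] for f in line.split("\t")[1:] if f.startswith("ID:")]
        if not ids:
            continue
        if len(ids) > 1:
            # Hypothetical message; the real warning wording may differ.
            warnings.warn(
                f"multiple ID tags on @PG line; using the first one: {ids[0]!r}"
            )
        index[ids[0]] = line
    return index


# Hypothetical header: the first @PG line has a stray second ID field.
header = (
    "@HD\tVN:1.6\n"
    "@PG\tID:bwa\tPN:bwa\tCL:bwa mem -R @RG\tID:sample1\n"
    "@PG\tID:samtools\tPN:samtools\tPP:bwa\n"
)
pg_index = index_pg_lines(header)
```

Because every lookup goes through the same first-ID key, any helper built on such an index (find, update, remove) would agree on which line a given ID refers to.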
It looks like this does the trick. It's fairly chatty when running on the test case I made for samtools/samtools#1393, so I expect people will notice multiple
Some applications add supplementary ID tags to a header @PG line. This change prevents the corruption of the proper (first) ID value of a @PG line when other unrelated ID tags are present.

Fixes samtools/samtools#1393