Skip to content

during tweak quality of overlapped paired of reads, the quality of some bases in the end of overlap region are not tweaked #1195

@wulj2

Description

@wulj2

Buggy results:

reference
>ref
GCATGGCATACAAATTATTTCATTCCCATTGAGAAATAAAATCCAATTCTCCATCACCAAGAGAGCCTTCCGAAAGAGGCCCCCCTGGGCAAACGGCCACCGATGGAGAGGTCTGCCAGTCCTCTTCTACCCCACCCACGCCCCCACCCTAATCAGAGGCCAAACCCTTCCTGGAGCCTGTGATAAAAGCAACTGTTAGCTTGCACTAGACTAGCTTCAAAGTTGTATTGACCCTGGTGTGTTATGTCTAAGAGTAGATGCCATATCTCTTTTCTGG

read pairs
read1
@rq_182_1:0:0_2:0:0_0
CACAGGCTCCAGGAAGGGTTTGGCCTCTGAATAGGGTGGGGGCGTGGGTGGGGTAGAAGAGGACTGGCAGACCTCTCCATCGGTGGCCGTTTGCCCAGGGGGGCCTCTTTCGGAAGGCTCTCTTGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222

read2
@rq_182_1:0:0_2:0:0_0
GAAATAAAATCCAATTCTCCAACACCAAGCCGAAAGAGGCCCCCCTGGGCAAACGAAACACATCGGCCACCGATGGAGAGGTCTGCCAGTCCTCTTCTACCCCACCCACGCCCCCACC
+
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222

alignment result
rq_182_1:0:0_2:0:0_0	83	ref	56	60	127M	=	33	-150	AACAAGAGAGCCTTCCGAAAGAGGCCCCCCTGGGCAAACGGCCACCGATGGAGAGGTCTGCCAGTCCTCTTCTACCCCACCCACGCCCCCACCCTATTCAGAGGCCAAACCCTTCCTGGAGCCTGTG	2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	NM:i:2	MD:Z:1C94A30	MC:Z:29M8D24M10I55M	AS:i:120	XS:i:0
rq_182_1:0:0_2:0:0_0	163	ref	33	60	29M8D24M10I55M	=	56	150	GAAATAAAATCCAATTCTCCAACACCAAGCCGAAAGAGGCCCCCCTGGGCAAACGAAACACATCGGCCACCGATGGAGAGGTCTGCCAGTCCTCTTCTACCCCACCCACGCCCCCACC	2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222	NM:i:19	MD:Z:21T7^AGAGCCTT79	MC:Z:127M	AS:i:73	XS:i:0

however samtools depth -q 10 -s  generate depth not right at two position
ref	147	2
ref	148	2

these two positions are overlapped of course.

I checked the code in sam.c which tweak the quality of overlapped regions and found the bug as below:
each time a new cigar M or X or = is met, if there is another cigar which is none of "MX=", the cigar will be set to 0 first, and then enter the situation to process "MX=", in this case, it increase the positions (*iseq)++; (*icig)++; (*iref)++(https://github.com/samtools/htslib/blob/develop/sam.c#L4459), thus the first matched base in this kind of "MX=" cigar will never be processed in a right way which leads the variable "iref" in https://github.com/samtools/htslib/blob/develop/sam.c#4510 will not increase to the right amount, so finally the tweak loop in https://github.com/samtools/htslib/blob/develop/sam.c#L4500 will not iterate all the overlapped positions and left some end overlapped bases unprocessed, the number of unprocessed bases equals to number of those kinds of "MX=" cigar which appears after other kinds of cigar in overlapped region.

I have fixed this bug by #1196 , please consider merge it if I have done it properly.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions