I got the following values in the `samtools stats` output from samtools 1.11, which is the latest release at this time. (The full output is attached as samtools-stats.txt.)
The numbers for inward oriented pairs and outward oriented pairs seem to imply more reads than raw total sequences.
2 * (inward oriented pairs + outward oriented pairs) = 2 * (124586692 + 3055596) = 255284576, which is greater than raw total sequences, 244454886. (In fact, the reads implied by inward oriented pairs alone, 2 * 124586692 = 249173384, are enough to exceed the total number of sequences.)
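To double-check the arithmetic, here is a minimal Python sketch. The three numbers are copied from the SN section of the output; the variable names are my own.

```python
# Values copied from the SN section of the samtools stats output.
raw_total_sequences = 244_454_886
inward_pairs = 124_586_692
outward_pairs = 3_055_596

# Each pair consists of two reads.
reads_implied_by_pairs = 2 * (inward_pairs + outward_pairs)
reads_implied_by_inward = 2 * inward_pairs

print(reads_implied_by_pairs)                        # 255284576
print(reads_implied_by_pairs - raw_total_sequences)  # 10829690 reads too many
print(reads_implied_by_inward > raw_total_sequences) # True
```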
Could this indicate a bug, or is there something I am overlooking?
(I tried to look at the code. At a quick glance, it seems that here, for each read, one of the counters for other/inward/outward orientation is incremented, and here the counts are halved. If that is right, I do not see how the number of (inward + outward + other) pairs multiplied by two could exceed the number of reads in the file.)
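To make that expectation concrete, here is a toy model of the logic as I understood it. This is not the samtools code: the function `count_pairs` and the orientation labels are made up for illustration, and the real implementation may well differ in ways that explain the numbers.

```python
from collections import Counter

def count_pairs(read_orientations):
    """Toy model: one counter increment per read, counts halved at the end."""
    per_read = Counter(read_orientations)
    return {kind: per_read[kind] // 2 for kind in ("inward", "outward", "other")}

# Ten reads, i.e. five pairs, each read carrying its pair's orientation.
reads = ["inward"] * 6 + ["outward"] * 2 + ["other"] * 2
pairs = count_pairs(reads)

print(pairs)
# Under this model, twice the number of pairs can never exceed the number of reads.
print(2 * sum(pairs.values()) <= len(reads))
```

Under this model the reported counts would be impossible, which is why the output above surprised me.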
# This file was produced by samtools stats (1.11+htslib-1.11) and can be plotted using plot-bamstats
# This file contains statistics for all reads.
# The command line was: stats /XXX/XXX/XXX.aligned.duplicates_marked.recalibrated.bam
# CHK, Checksum [2]Read Names [3]Sequences [4]Qualities
# CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)
CHK d9485ac9 0ae529f3 a1fc277b
# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN raw total sequences: 244454886
SN filtered sequences: 0
SN sequences: 244454886
SN is sorted: 1
SN 1st fragments: 122227443
SN last fragments: 122227443
SN reads mapped: 237282395
SN reads mapped and paired: 237117316 # paired-end technology bit set + both mates mapped
SN reads unmapped: 7172491
SN reads properly paired: 237117316 # proper-pair bit set
SN reads paired: 244454886 # paired-end technology bit set
SN reads duplicated: 40960232 # PCR or optical duplicate bit set
SN reads MQ0: 1095026 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 0
SN total length: 23762404584 # ignores clipping
SN total first fragment length: 11906044837 # ignores clipping
SN total last fragment length: 11856359747 # ignores clipping
SN bases mapped: 23092079754 # ignores clipping
SN bases mapped (cigar): 22945895850 # more accurate
SN bases trimmed: 0
SN bases duplicated: 3977651283
SN mismatches: 0 # from NM fields
SN error rate: 0.000000e+00 # mismatches / bases mapped (cigar)
SN average length: 97
SN average first fragment length: 97
SN average last fragment length: 97
SN maximum length: 98
SN maximum first fragment length: 98
SN maximum last fragment length: 98
SN average quality: 34.4
SN insert size average: 716.6
SN insert size standard deviation: 1639.9
SN inward oriented pairs: 124586692
SN outward oriented pairs: 3055596
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
SN percentage of properly paired reads (%): 97.0
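The same check can be run against the SN lines directly. This is a sketch that assumes each summary line has the form `SN <name>: <value>`, optionally followed by a `#` comment, as in the output above; `parse_sn` and `implied_reads` are hypothetical helper names, not part of samtools.

```python
import re

# "SN", then the field name up to the first colon, then the value.
SN_LINE = re.compile(r"^SN\s+(.+?):\s+(\S+)")

def parse_sn(lines):
    """Collect the SN summary fields into a {name: value-string} dict."""
    values = {}
    for line in lines:
        match = SN_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values

def implied_reads(sn):
    """Number of reads implied by the three pair-orientation counters."""
    keys = (
        "inward oriented pairs",
        "outward oriented pairs",
        "pairs with other orientation",
    )
    return 2 * sum(int(sn[key]) for key in keys)

sample = [
    "SN raw total sequences: 244454886",
    "SN reads mapped and paired: 237117316 # paired-end technology bit set + both mates mapped",
    "SN inward oriented pairs: 124586692",
    "SN outward oriented pairs: 3055596",
    "SN pairs with other orientation: 0",
]
sn = parse_sn(sample)
total = int(sn["raw total sequences"])
print(implied_reads(sn), total, implied_reads(sn) > total)
```

To run it on the attachment, pass an open file handle for samtools-stats.txt to `parse_sn` instead of `sample`.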