Samtools stats 'inward oriented pairs' and 'outward oriented pairs' seem to imply more reads than 'raw total sequences'? #1360

@jkmatila

I got the following values in the samtools stats output from samtools 1.11, which is the latest release at the time of writing. (The full output is attached as samtools-stats.txt.)

The numbers for inward oriented pairs and outward oriented pairs seem to imply more reads than raw total sequences.

2 * (inward oriented pairs + outward oriented pairs) = 2 * (124586692 + 3055596) = 255284576, which is greater than raw total sequences, 244454886. (In fact, the reads implied by inward oriented pairs alone, 2 * 124586692 = 249173384, already exceed the total number of sequences.)
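For reference, the arithmetic above can be re-checked with a few lines of Python. This is only a sanity check on the numbers copied from the SN section below; the variable names are my own.

```python
# Values copied by hand from the SN section of the samtools stats output.
raw_total_sequences = 244454886
inward_pairs = 124586692
outward_pairs = 3055596

# Each pair consists of two reads.
reads_implied_by_pairs = 2 * (inward_pairs + outward_pairs)
reads_implied_by_inward = 2 * inward_pairs

print(reads_implied_by_pairs)                          # 255284576
print(reads_implied_by_pairs - raw_total_sequences)    # 10829690
print(reads_implied_by_inward > raw_total_sequences)   # True
```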

Could this indicate a bug, or is there something I am overlooking?

(I tried to look at the code. At a quick glance, it seems that for each read, one of the counters for other/inward/outward orientation is incremented, and that the counts are later halved. If so, it is not clear to me how the number of (inward + outward + other) pairs multiplied by two could exceed the number of reads in the file.)
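To make that expectation concrete, here is a toy model of the logic as I read it. It is not the samtools source: `count_pairs` and the orientation labels are hypothetical names, and the model assumes exactly one counter increment per read followed by a halving step.

```python
from collections import Counter

def count_pairs(read_orientations):
    """Toy model: one counter increment per read, then halve each counter."""
    per_read = Counter(read_orientations)
    return {name: per_read[name] // 2 for name in ("inward", "outward", "other")}

# Ten reads, each labelled with the orientation of the pair it belongs to.
reads = ["inward"] * 6 + ["outward"] * 2 + ["other"] * 2
pairs = count_pairs(reads)

# Under this model, twice the number of pairs can never exceed the read count.
assert 2 * sum(pairs.values()) <= len(reads)
```

If this reading were complete, the inequality in the last line would always hold, which is why the reported numbers look surprising to me.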

# This file was produced by samtools stats (1.11+htslib-1.11) and can be plotted using plot-bamstats
# This file contains statistics for all reads.
# The command line was:  stats /XXX/XXX/XXX.aligned.duplicates_marked.recalibrated.bam
# CHK, Checksum [2]Read Names   [3]Sequences    [4]Qualities
# CHK, CRC32 of reads which passed filtering followed by addition (32bit overflow)
CHK     d9485ac9        0ae529f3        a1fc277b
# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN      raw total sequences:    244454886
SN      filtered sequences:     0
SN      sequences:      244454886
SN      is sorted:      1
SN      1st fragments:  122227443
SN      last fragments: 122227443
SN      reads mapped:   237282395
SN      reads mapped and paired:        237117316       # paired-end technology bit set + both mates mapped
SN      reads unmapped: 7172491
SN      reads properly paired:  237117316       # proper-pair bit set
SN      reads paired:   244454886       # paired-end technology bit set
SN      reads duplicated:       40960232        # PCR or optical duplicate bit set
SN      reads MQ0:      1095026 # mapped and MQ=0
SN      reads QC failed:        0
SN      non-primary alignments: 0
SN      total length:   23762404584     # ignores clipping
SN      total first fragment length:    11906044837     # ignores clipping
SN      total last fragment length:     11856359747     # ignores clipping
SN      bases mapped:   23092079754     # ignores clipping
SN      bases mapped (cigar):   22945895850     # more accurate
SN      bases trimmed:  0
SN      bases duplicated:       3977651283
SN      mismatches:     0       # from NM fields
SN      error rate:     0.000000e+00    # mismatches / bases mapped (cigar)
SN      average length: 97
SN      average first fragment length:  97
SN      average last fragment length:   97
SN      maximum length: 98
SN      maximum first fragment length:  98
SN      maximum last fragment length:   98
SN      average quality:        34.4
SN      insert size average:    716.6
SN      insert size standard deviation: 1639.9
SN      inward oriented pairs:  124586692
SN      outward oriented pairs: 3055596
SN      pairs with other orientation:   0
SN      pairs on different chromosomes: 0
SN      percentage of properly paired reads (%):        97.0
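The same check can be run directly against the SN section instead of copying numbers by hand. The sketch below assumes the fields are tab-separated (as the `cut -f 2-` hint in the output suggests); `parse_sn` and `pair_read_excess` are hypothetical helper names, and the sample lines are taken from the output above.

```python
def parse_sn(lines):
    """Collect 'SN' rows of samtools stats output into a {name: value} dict."""
    summary = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue  # skip comments, CHK rows and any other sections
        fields = line.rstrip("\n").split("\t")
        # fields[1] is the label with a trailing colon, fields[2] the value;
        # any further field is a trailing '# ...' comment and is ignored.
        summary[fields[1].rstrip(":")] = fields[2]
    return summary

def pair_read_excess(summary):
    """Return 2 * (all oriented pairs) minus raw total sequences."""
    pairs = (
        int(summary["inward oriented pairs"])
        + int(summary["outward oriented pairs"])
        + int(summary["pairs with other orientation"])
    )
    return 2 * pairs - int(summary["raw total sequences"])

sample = [
    "# Summary Numbers.\n",
    "SN\traw total sequences:\t244454886\n",
    "SN\treads paired:\t244454886\t# paired-end technology bit set\n",
    "SN\tinward oriented pairs:\t124586692\n",
    "SN\toutward oriented pairs:\t3055596\n",
    "SN\tpairs with other orientation:\t0\n",
]
summary = parse_sn(sample)
print(pair_read_excess(summary))  # 10829690
```

A positive result means the pair counters imply more reads than the file contains. To run it on the attached file, pass an open file handle for samtools-stats.txt to `parse_sn` instead of `sample`.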
