Skip to content

Pileup filter-threshold flag not being correctly applied in v0.6.0 and up #564

@mbosm

Description

@mbosm

Hello. Apologies if this has been mentioned, but I couldn't find it in a search. It seems that the pileup --filter-threshold global tag isn't working correctly.

Behavior in older versions:
--filter-threshold 0.9
--filter-threshold A:0.8
--mod-threshold m:0.95
Will set the threshold for both unmodified and modified bases to the value set (90% in this example). Specific instances will override this. So in this example,
A: 0.8
T: 0.9
C: 0.9
G: 0.9
m6A: 0.9
m5C: 0.95

In v0.6.0 (and the most recent patch I could find, v0.6.1_4d52624), this "default" filter-threshold (correctly) prevents the program for determining its own filter-threshold, but also doesn't apply the set value. It seems it leaves it at 0.0, because it doesn't filter anything. So:
A: 0.8
T: 0.0
C: 0.0
G: 0.0
m6A: 0.0
m5C: 0.95

Here's my data. I have a modbam file with a C/m5C at position 4.
Using 'modkit extract calls --region [region]:4-5 [modbam] output.tsv', I can see this data:

  • There are 159 total reads that contain position 4.
  • For unmodified C, there are 157, 154 of which are >= 0.7 call_prob
  • For m5C (m), there is 1 read, 1 of which are >= 0.7 call_prob
  • For Cm (19228), there is 1 read, 0 of which are >= 0.7 call_prob

I can post the full commands if it helps, but essentially:

  • If I use v0.5.1rc1, the previous version before v0.6.0, and run modkit pileup with --filter-threshold 0.7 and no other threshold settings, it removes 4 reads (the three unmodified and one Cm that are less than 0.7 call_prob). count_fail is 4, valid_coverage is 155.
  • If I use the newest version of the software and run modkit pileup with --filter-threshold 0.7 and no other threshold settings, it removes 0 reads. count_fail is 0, valid_coverage is 159.

Given that there are likely to be way more low threshold modification calls than low non-modification calls, this will artificially inflate modification calling percentages while being invisible to anyone who doesn't carefully analyze the results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingbuild-availablecustom build produced for fix.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions