-
Notifications
You must be signed in to change notification settings - Fork 26
Description
Hello. Apologies if this has been mentioned, but I couldn't find it in a search. It seems that the pileup --filter-threshold global tag isn't working correctly.
Behavior in older versions:
--filter-threshold 0.9
--filter-threshold A:0.8
--mod-threshold m:0.95
Will set the threshold for both unmodified and modified bases to the value set (90% in this example). Specific instances will override this. So in this example,
A: 0.8
T: 0.9
C: 0.9
G: 0.9
m6A: 0.9
m5C: 0.95
In v0.6.0 (and the most recent patch I could find, v0.6.1_4d52624), this "default" filter-threshold (correctly) prevents the program for determining its own filter-threshold, but also doesn't apply the set value. It seems it leaves it at 0.0, because it doesn't filter anything. So:
A: 0.8
T: 0.0
C: 0.0
G: 0.0
m6A: 0.0
m5C: 0.95
Here's my data. I have a modbam file with a C/m5C at position 4.
Using 'modkit extract calls --region [region]:4-5 [modbam] output.tsv', I can see this data:
- There are 159 total reads that contain position 4.
- For unmodified C, there are 157, 154 of which are >= 0.7 call_prob
- For m5C (m), there is 1 read, 1 of which are >= 0.7 call_prob
- For Cm (19228), there is 1 read, 0 of which are >= 0.7 call_prob
I can post the full commands if it helps, but essentially:
- If I use v0.5.1rc1, the previous version before v0.6.0, and run modkit pileup with --filter-threshold 0.7 and no other threshold settings, it removes 4 reads (the three unmodified and one Cm that are less than 0.7 call_prob). count_fail is 4, valid_coverage is 155.
- If I use the newest version of the software and run modkit pileup with --filter-threshold 0.7 and no other threshold settings, it removes 0 reads. count_fail is 0, valid_coverage is 159.
Given that there are likely to be way more low threshold modification calls than low non-modification calls, this will artificially inflate modification calling percentages while being invisible to anyone who doesn't carefully analyze the results.