Closed
Description
Using Percolator 3.05, I've noticed inconsistent behavior among the --testFDR, --trainFDR, and --train-fdr-initial options. Specifically, when the user specifies --trainFDR 0.0, the value of --testFDR is used to select the training set, but --train-fdr-initial is still set to 0.0.
This is especially problematic because users who OMIT the experimental --train-fdr-initial option can still be affected by its implementation!
The help for these options says:
-t <value>
--testFDR <value> False discovery rate threshold for
evaluating best cross validation
result and reported end result.
Default = 0.01.
-F <value>
--trainFDR <value> False discovery rate threshold to
define positive examples in
training. Set to testFDR if 0.
Default = 0.01.
[EXPERIMENTAL FEATURE]
--train-fdr-initial <value> Set the FDR threshold for the
first iteration. This is useful in
cases where the original features
do not display a good separation
between targets and decoys. In
subsequent iterations, the normal
--trainFDR will be used.
Based on this help, if --train-fdr-initial is not specified and --trainFDR is 0.0, I expect the initial training round to use the --testFDR value.
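To make that expectation concrete, here is a minimal Python sketch of how the three options would resolve if they followed the help text literally. It is not Percolator's actual (C++) code: the function name `resolve_training_fdrs` is hypothetical, and an omitted --train-fdr-initial is modeled as `None`.

```python
from typing import Optional, Tuple


def resolve_training_fdrs(test_fdr: float,
                          train_fdr: float,
                          train_fdr_initial: Optional[float] = None) -> Tuple[float, float]:
    """Return (initial_fdr, fdr) as expected from the --help text.

    Hypothetical helper for illustration only; None stands for
    "--train-fdr-initial was not given on the command line".
    """
    # "--trainFDR ... Set to testFDR if 0."
    if train_fdr == 0.0:
        train_fdr = test_fdr
    # Assumed: without --train-fdr-initial, the first iteration uses the
    # already-resolved --trainFDR value.
    if train_fdr_initial is None:
        train_fdr_initial = train_fdr
    return train_fdr_initial, train_fdr


# Expected for the two commands below: both resolve to (0.01, 0.01),
# matching "initial_fdr=0.01, fdr=0.01" printed by the second run.
print(resolve_training_fdrs(0.01, 0.0))
print(resolve_training_fdrs(0.01, 0.0, 0.01))
```

The order of the two steps matters: if the initial threshold were copied from --trainFDR before the 0 is replaced by --testFDR, it would stay at 0.0, which would be consistent with the failure in the first run.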
Thus, the following two commands should give (nearly) identical output, but do not!
$ percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Percolator version 3.05.0, Build Date Feb 18 2021 07:25:40
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Started Fri Feb 17 11:43:45 2023
Hyperparameters: selectionFdr=0, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Features:
primary xCorrLib xCorrModel LogDotProduct logWeightedDotProduct sumOfSquaredErrors weightedSumOfSquaredErrors numberOfMatchingPeaks numberOfMatchingPeaksAboveThreshold averageAbsFragmentDeltaMass averageFragmentDeltaMasses isotopeDotProduct averageAbsParentDeltaMass averageParentDeltaMass eValue deltaRT numMissedCleavage pepLength charge1 charge2 charge3 charge4 precursorMz precursorMass RTinMin
Found 1770 PSMs
Separate target and decoy search inputs detected, using mix-max method.
Train/test set contains 901 positives and 869 negatives, size ratio=1.03682 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1: Exception caught: Error in the input data: cannot find an initial direction with positive training examples. Consider setting/raising the initial training FDR threshold (--train-initial-fdr).
Terminating.

$ percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 --train-fdr-initial 0.01 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Percolator version 3.05.0, Build Date Feb 18 2021 07:25:40
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 --train-fdr-initial 0.01 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Started Fri Feb 17 11:45:46 2023
Hyperparameters: selectionFdr=0, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Features:
primary xCorrLib xCorrModel LogDotProduct logWeightedDotProduct sumOfSquaredErrors weightedSumOfSquaredErrors numberOfMatchingPeaks numberOfMatchingPeaksAboveThreshold averageAbsFragmentDeltaMass averageFragmentDeltaMasses isotopeDotProduct averageAbsParentDeltaMass averageParentDeltaMass eValue deltaRT numMissedCleavage pepLength charge1 charge2 charge3 charge4 precursorMz precursorMass RTinMin
Found 1770 PSMs
Separate target and decoy search inputs detected, using mix-max method.
Train/test set contains 901 positives and 869 negatives, size ratio=1.03682 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1: Selected feature 6 as initial direction. Could separate 253 training set positives with q<0.01 in that direction.
Split 2: Selected feature 7 as initial direction. Could separate 238 training set positives with q<0.01 in that direction.
Split 3: Selected feature 7 as initial direction. Could separate 230 training set positives with q<0.01 in that direction.
Found 234 test set positives with q<0.01 in initial direction
Reading in data and feature calculation took 0.0392 cpu seconds or 0 seconds wall clock time.
---Training with Cpos selected by cross validation, Cneg selected by cross validation, initial_fdr=0.01, fdr=0.01
Iteration 1: Estimated 526 PSMs with q<0.01
Iteration 2: Estimated 548 PSMs with q<0.01
Iteration 3: Estimated 567 PSMs with q<0.01
Iteration 4: Estimated 577 PSMs with q<0.01
Iteration 5: Estimated 579 PSMs with q<0.01
Iteration 6: Estimated 585 PSMs with q<0.01
Iteration 7: Estimated 589 PSMs with q<0.01
Iteration 8: Estimated 593 PSMs with q<0.01
Iteration 9: Estimated 592 PSMs with q<0.01
Iteration 10: Estimated 592 PSMs with q<0.01
Learned normalized SVM weights for the 3 cross-validation splits:
Split1 Split2 Split3 FeatureName
-0.0929 1.2665 1.6690 primary
1.5243 1.9911 2.9649 xCorrLib
-0.6302 -2.3194 -2.3082 xCorrModel
-0.6166 -1.4702 -2.4132 LogDotProduct
0.0685 0.4692 1.0871 logWeightedDotProduct
-0.3117 -1.0887 -0.2973 sumOfSquaredErrors
-3.0505 -6.1180 -6.0020 weightedSumOfSquaredErrors
-0.1808 -1.5890 -0.8709 numberOfMatchingPeaks
0.0174 0.5070 -1.0025 numberOfMatchingPeaksAboveThreshold
0.0205 -0.2598 -0.1906 averageAbsFragmentDeltaMass
0.1949 -0.2754 -0.0342 averageFragmentDeltaMasses
0.7160 0.3896 0.6751 isotopeDotProduct
-0.0590 0.0137 0.3546 averageAbsParentDeltaMass
0.4088 0.4219 -0.6592 averageParentDeltaMass
1.5175 3.6472 2.0877 eValue
-2.1760 -10.4656 -5.1799 deltaRT
-0.1296 -0.4784 -0.0281 numMissedCleavage
-0.1143 -0.1193 -0.1369 pepLength
0.0000 0.0000 0.0000 charge1
0.0595 -0.1910 0.2812 charge2
-0.0595 0.1910 -0.2812 charge3
0.0000 0.0000 0.0000 charge4
0.2912 0.3030 0.4887 precursorMz
0.0403 0.3280 0.2303 precursorMass
-0.2539 0.7174 0.4379 RTinMin
-2.1827 -7.9085 -5.8299 m0
Found 393 test set PSMs with q<0.01.
Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
Selecting pi_0=0.184773
Calculating q values.
New pi_0 estimate on final list yields 507 target peptides with q<0.01.
Calculating posterior error probabilities (PEPs).
Processing took 1.4018 cpu seconds or 1 seconds wall clock time.