Incorrect --train-fdr-initial when --trainFDR is zero #349

@sjust-seerbio

Using Percolator 3.05, I've noticed inconsistent behavior among the --testFDR, --trainFDR, and --train-fdr-initial options. Specifically, when the user specifies --trainFDR 0.0, the value of --testFDR is used to select the training set, but --train-fdr-initial is still set to 0.0.

This issue is especially problematic because users who omit the experimental --train-fdr-initial option may still run into it, due to how the option is implemented.

The help for these options says:

 -t <value>
 --testFDR <value>                            False discovery rate threshold for 
                                              evaluating best cross validation 
                                              result and reported end result. 
                                              Default = 0.01.
 -F <value>
 --trainFDR <value>                           False discovery rate threshold to 
                                              define positive examples in 
                                              training. Set to testFDR if 0. 
                                              Default = 0.01.
[EXPERIMENTAL FEATURE]
 --train-fdr-initial <value>                  Set the FDR threshold for the 
                                              first iteration. This is useful in 
                                              cases where the original features 
                                              do not display a good separation 
                                              between targets and decoys. In 
                                              subsequent iterations, the normal 
                                              --trainFDR will be used.

Based on this help text, if --train-fdr-initial is not specified and --trainFDR is 0.0, I expect the initial training round to use the --testFDR value.
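
To make that expectation concrete, here is a minimal sketch of the threshold resolution I would expect. It is not Percolator's actual implementation: `resolve_fdr_thresholds` is a hypothetical helper, and using `None` to mean "--train-fdr-initial was not given" is an assumption made for illustration.

```python
def resolve_fdr_thresholds(test_fdr, train_fdr, train_fdr_initial=None):
    """Hypothetical helper: the threshold resolution this report expects.

    Not Percolator's actual code. `None` stands in for
    "--train-fdr-initial was not given on the command line" (an assumption).
    """
    # Per the help text: --trainFDR is "Set to testFDR if 0".
    if train_fdr == 0.0:
        train_fdr = test_fdr
    # Expected: an unspecified initial threshold follows the resolved
    # trainFDR instead of staying at the raw 0.0.
    if train_fdr_initial is None:
        train_fdr_initial = train_fdr
    return test_fdr, train_fdr, train_fdr_initial
```

Under this sketch, `resolve_fdr_thresholds(0.01, 0.0)` and `resolve_fdr_thresholds(0.01, 0.0, 0.01)` resolve to the same three values.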

Thus, the following two commands should give (nearly) identical output, but they do not: the first terminates with an error, while the second trains normally.

$ percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Percolator version 3.05.0, Build Date Feb 18 2021 07:25:40
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Started Fri Feb 17 11:43:45 2023
Hyperparameters: selectionFdr=0, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Features:
primary xCorrLib xCorrModel LogDotProduct logWeightedDotProduct sumOfSquaredErrors weightedSumOfSquaredErrors numberOfMatchingPeaks numberOfMatchingPeaksAboveThreshold averageAbsFragmentDeltaMass averageFragmentDeltaMasses isotopeDotProduct averageAbsParentDeltaMass averageParentDeltaMass eValue deltaRT numMissedCleavage pepLength charge1 charge2 charge3 charge4 precursorMz precursorMass RTinMin 
Found 1770 PSMs
Separate target and decoy search inputs detected, using mix-max method.
Train/test set contains 901 positives and 869 negatives, size ratio=1.03682 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1:	Exception caught: Error in the input data: cannot find an initial direction with positive training examples. Consider setting/raising the initial training FDR threshold (--train-initial-fdr).
Terminating.
$ percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 --train-fdr-initial 0.01 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Percolator version 3.05.0, Build Date Feb 18 2021 07:25:40
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator-v3-05.lin --testFDR 0.01 --trainFDR 0.0 --train-fdr-initial 0.01 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Started Fri Feb 17 11:45:46 2023
Hyperparameters: selectionFdr=0, Cpos=0, Cneg=0, maxNiter=10
Reading tab-delimited input from datafile 2017dec27_overlap_dia_6b_rep1_604to616.dia.features.txt
Features:
primary xCorrLib xCorrModel LogDotProduct logWeightedDotProduct sumOfSquaredErrors weightedSumOfSquaredErrors numberOfMatchingPeaks numberOfMatchingPeaksAboveThreshold averageAbsFragmentDeltaMass averageFragmentDeltaMasses isotopeDotProduct averageAbsParentDeltaMass averageParentDeltaMass eValue deltaRT numMissedCleavage pepLength charge1 charge2 charge3 charge4 precursorMz precursorMass RTinMin 
Found 1770 PSMs
Separate target and decoy search inputs detected, using mix-max method.
Train/test set contains 901 positives and 869 negatives, size ratio=1.03682 and pi0=1
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Split 1:	Selected feature 6 as initial direction. Could separate 253 training set positives with q<0.01 in that direction.
Split 2:	Selected feature 7 as initial direction. Could separate 238 training set positives with q<0.01 in that direction.
Split 3:	Selected feature 7 as initial direction. Could separate 230 training set positives with q<0.01 in that direction.
Found 234 test set positives with q<0.01 in initial direction
Reading in data and feature calculation took 0.0392 cpu seconds or 0 seconds wall clock time.
---Training with Cpos selected by cross validation, Cneg selected by cross validation, initial_fdr=0.01, fdr=0.01
Iteration 1:	Estimated 526 PSMs with q<0.01
Iteration 2:	Estimated 548 PSMs with q<0.01
Iteration 3:	Estimated 567 PSMs with q<0.01
Iteration 4:	Estimated 577 PSMs with q<0.01
Iteration 5:	Estimated 579 PSMs with q<0.01
Iteration 6:	Estimated 585 PSMs with q<0.01
Iteration 7:	Estimated 589 PSMs with q<0.01
Iteration 8:	Estimated 593 PSMs with q<0.01
Iteration 9:	Estimated 592 PSMs with q<0.01
Iteration 10:	Estimated 592 PSMs with q<0.01
Learned normalized SVM weights for the 3 cross-validation splits:
 Split1	 Split2	 Split3	FeatureName
-0.0929	 1.2665	 1.6690	primary
 1.5243	 1.9911	 2.9649	xCorrLib
-0.6302	-2.3194	-2.3082	xCorrModel
-0.6166	-1.4702	-2.4132	LogDotProduct
 0.0685	 0.4692	 1.0871	logWeightedDotProduct
-0.3117	-1.0887	-0.2973	sumOfSquaredErrors
-3.0505	-6.1180	-6.0020	weightedSumOfSquaredErrors
-0.1808	-1.5890	-0.8709	numberOfMatchingPeaks
 0.0174	 0.5070	-1.0025	numberOfMatchingPeaksAboveThreshold
 0.0205	-0.2598	-0.1906	averageAbsFragmentDeltaMass
 0.1949	-0.2754	-0.0342	averageFragmentDeltaMasses
 0.7160	 0.3896	 0.6751	isotopeDotProduct
-0.0590	 0.0137	 0.3546	averageAbsParentDeltaMass
 0.4088	 0.4219	-0.6592	averageParentDeltaMass
 1.5175	 3.6472	 2.0877	eValue
-2.1760	-10.4656	-5.1799	deltaRT
-0.1296	-0.4784	-0.0281	numMissedCleavage
-0.1143	-0.1193	-0.1369	pepLength
 0.0000	 0.0000	 0.0000	charge1
 0.0595	-0.1910	 0.2812	charge2
-0.0595	 0.1910	-0.2812	charge3
 0.0000	 0.0000	 0.0000	charge4
 0.2912	 0.3030	 0.4887	precursorMz
 0.0403	 0.3280	 0.2303	precursorMass
-0.2539	 0.7174	 0.4379	RTinMin
-2.1827	-7.9085	-5.8299	m0
Found 393 test set PSMs with q<0.01.
Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
Selecting pi_0=0.184773
Calculating q values.
New pi_0 estimate on final list yields 507 target peptides with q<0.01.
Calculating posterior error probabilities (PEPs).
Processing took 1.4018 cpu seconds or 1 seconds wall clock time.
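
For anyone reproducing this, the two runs can be compared without reading the full logs by pulling out the thresholds reported on the "---Training with" line and checking for the early termination. The following is only a sketch: `summarize_log` is a hypothetical helper, and the pattern is based solely on the log lines shown above, so it may not match other Percolator versions.

```python
import re

# Matches the thresholds on the training summary line, e.g.
# "---Training with ..., initial_fdr=0.01, fdr=0.01"
TRAINING_LINE = re.compile(r"initial_fdr=([0-9.eE+-]+), fdr=([0-9.eE+-]+)")


def summarize_log(log_text):
    """Return (initial_fdr, fdr, failed) for one captured Percolator log.

    Hypothetical helper; `failed` is True when the log contains the
    "Exception caught" message seen in the first run above.
    """
    failed = "Exception caught" in log_text
    match = TRAINING_LINE.search(log_text)
    if match is None:
        # Training never started, so no thresholds were reported.
        return None, None, failed
    return float(match.group(1)), float(match.group(2)), failed
```

Applied to the two logs above, the first run reports no training thresholds and is flagged as failed, while the second reports initial_fdr=0.01 and fdr=0.01.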
