Hello.
I have a problem running percolator v3.6 on a file that worked with v3.5.
Part 1
To reproduce the issue, here is a toy data set:
SpecId Label ScanNr ExpMass Mass feature_1 Peptide Protein1
34292 -1 88 1103.59885 1103.59885 0.2029 KGPYLHPHR _placeholder_
97071 -1 89 1103.59885 1103.59885 0.3172 KGPYLHPHR _placeholder_
127175 1 110 1485.78140 1485.78140 0.1040 RAHERPPPHPHR _placeholder_
181611 1 112 2469.91930 2469.91930 0.0123 STGHGGHCTNCQDNTDGAHCER _placeholder_
113801 1 239 1646.73185 1646.73185 0.0065 EHTGKPTTSSSEACR _placeholder_
237191 1 245 2041.96648 2041.96648 0.2799 KTEEERPQETTNQHSTK _placeholder_
105065 -1 249 919.46241 919.46241 0.1106 RNPASYGR _placeholder_
96772 1 256 1088.50982 1088.50982 0.0867 ESESTAAAPAR _placeholder_
96773 -1 256 1444.78989 1444.78989 0.0169 GRPPKQEPAAAAPR _placeholder_
127254 1 258 1488.77972 1488.77972 0.0047 SVQPQSHKPQPTR _placeholder_
127255 1 258 1490.72260 1490.72260 0.0685 GTHDRDPSEKPPR _placeholder_
141975 1 262 1488.77972 1488.77972 0.0040 SVQPQSHKPQPTR _placeholder_
74393 1 263 1386.68516 1386.68516 0.4173 SPEQSRSSPEKR _placeholder_
127960 1 264 1122.61456 1122.61456 0.4747 LSHPTTSRPK _placeholder_
74104 1 266 1386.68516 1386.68516 0.0389 SPEQSRSSPEKR _placeholder_
74105 -1 266 1844.87520 1844.87520 0.0768 KLKDSEETHETGAASDK _placeholder_
87069 1 267 1764.80270 1764.80270 0.0012 NRPEPHSDENGSTTPK _placeholder_
175837 1 268 2173.87443 2173.87443 0.0025 NHSGNDERDEEDEERESK _placeholder_
49377 1 269 1498.70120 1498.70120 0.0016 ESRPENEEERPK _placeholder_
The percolator versions are percolator-v3-06-linux-amd64.deb and percolator-v3-05-linux-amd64.deb.
The command I used was percolator -Y toy_data.tsv, and it fails with this error:
Couldn't find Protein header in tab-file
But there is a Protein header, and the changelog does not mention any change to the input format. The Wiki also shows an example with "proteinId1" as the header. That header works with v3.5 but not with v3.6.
I looked into the code, and it seems that the header name now needs to be Proteins. I also noticed that even when you use Proteins as the header, misspelling Label (e.g. as label) produces the same error: Couldn't find Protein header in tab-file.
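If you hit the same error, a small preflight script can check and fix the header before running percolator. This is a minimal Python sketch, assuming (from the behaviour observed above, not from the percolator docs) that v3.6 needs the exact, case-sensitive names SpecId, Label, ScanNr, Peptide and Proteins. The set of old protein header names it renames (OLD_PROTEIN_NAMES) is hypothetical; adjust it to your files. Only the header line is rewritten; data rows are copied unchanged.

```python
import sys

# Assumption based on the behaviour observed with v3.6, not on percolator's
# documentation: these header names must be present, spelled exactly like this.
REQUIRED = ["SpecId", "Label", "ScanNr", "Peptide", "Proteins"]

# Hypothetical list of older protein header names to rename to "Proteins".
OLD_PROTEIN_NAMES = {"Protein", "Protein1", "proteinId1"}


def fix_header(header):
    """Return a copy of the header with any old protein column name replaced."""
    return ["Proteins" if name in OLD_PROTEIN_NAMES else name for name in header]


def missing_columns(header):
    """List required columns that are absent (exact, case-sensitive match)."""
    return [name for name in REQUIRED if name not in header]


def rewrite(src, dst):
    """Copy src to dst, changing only the tab-separated header line."""
    with open(src) as fin:
        header = fin.readline().rstrip("\r\n").split("\t")
        body = fin.read()  # data rows are copied verbatim
    header = fix_header(header)
    missing = missing_columns(header)
    if missing:
        raise ValueError(f"missing or misspelled columns: {missing}")
    with open(dst, "w") as fout:
        fout.write("\t".join(header) + "\n")
        fout.write(body)


if __name__ == "__main__" and len(sys.argv) == 3:
    rewrite(sys.argv[1], sys.argv[2])
```

Run it as python fix_pin_header.py toy_data.tsv toy_data_fixed.tsv (the script name is made up). Because the check is case-sensitive, a header with label instead of Label is reported as a missing Label column rather than as a missing Protein column.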
Part 2
So I renamed the header, and percolator works again, but now I get a lot of warnings:
...
Features:
Mass feature_1
Warning: Set decoy prefix don't match
Warning: Set decoy prefix don't match
Warning: Set decoy prefix don't match
Warning: Set decoy prefix don't match
Warning: Set decoy prefix don't match
Found 19 PSMs
...
What does this warning mean, and what do I need to do to get rid of it?
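One observation that may help: the log shows exactly five warnings, and the toy data has exactly five rows with Label -1 (SpecIds 34292, 97071, 105065, 96773 and 74105). That suggests, though it is not confirmed from the source, that v3.6 warns once per decoy PSM whose protein ID does not start with the decoy prefix; every protein here is _placeholder_. Below is a minimal Python sketch to list such rows, assuming a tab-separated file in which every column from Proteins to the end of a row is a protein ID. The prefix you pass is up to you; "decoy_" in the test is a hypothetical value, not necessarily what percolator uses.

```python
import sys


def decoys_without_prefix(lines, prefix):
    """Return SpecIds of Label -1 rows where no protein ID starts with prefix.

    lines: the lines of a tab-separated PIN file, header first.
    Assumes every column from "Proteins" to the end of a row is a protein ID.
    """
    header = lines[0].rstrip("\r\n").split("\t")
    id_col = header.index("SpecId")
    label_col = header.index("Label")
    prot_col = header.index("Proteins")
    flagged = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[label_col] != "-1":
            continue  # only decoy rows are checked
        proteins = fields[prot_col:]
        if not any(p.startswith(prefix) for p in proteins):
            flagged.append(fields[id_col])
    return flagged


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_decoys.py <pin file> <decoy prefix>
    with open(sys.argv[1]) as f:
        rows = f.read().splitlines()
    for spec_id in decoys_without_prefix(rows, sys.argv[2]):
        print(spec_id)
```

Since all five decoy rows in the toy data point to _placeholder_, this lists all five SpecIds for any prefix that _placeholder_ does not start with.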
Also, the results changed.
With v3.5 I get:
PSMId score q-value posterior_error_prob peptide proteinIds
175837 0 0.125 0.205014 NHSGNDERDEEDEERESK _placeholder_
...
With v3.6 I get:
PSMId score q-value posterior_error_prob peptide proteinIds
175837 0 0.125 0.205014 NHSGNDERDEEDEERESK
...
The protein entries are missing from the proteinIds column here.
I hope you can help me.