GitHub - markmipt/protein_inference_using_DirectMS1: Improving the protein inference from bottom-up proteomic data using protein identifications from MS1 spectra

About the project

This repository contains a tutorial and supporting information for the project "Improving the protein inference from bottom-up proteomic data using protein identifications from MS1 spectra". Details of the project are described in a manuscript that is currently under review.

Fake protein database generation

This repository contains a simple IPython notebook that creates fake protein homologs, which can be used to estimate the efficiency of protein inference algorithms. The Swiss-Prot human database extended with fake protein homologs is also provided. The fake proteins are copies of target proteins in which 1%, 5%, or 25% of the amino acids have been replaced. Versions of this database extended with decoy proteins, in either reversed or shuffled form, are provided as well.
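As a rough illustration of how such homologs and decoys could be generated, the sketch below replaces a given fraction of residues with different amino acids and builds reversed and shuffled decoys. It is not the code from the notebook in this repository: the function names, the uniform choice of replacement residues, and the toy sequence are assumptions made for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def make_fake_homolog(sequence, fraction, rng):
    """Return a copy of `sequence` with round(len * fraction) residues replaced."""
    residues = list(sequence)
    n_replace = round(len(residues) * fraction)
    # Sample distinct positions so that each one is mutated exactly once.
    for pos in rng.sample(range(len(residues)), n_replace):
        # Exclude the current residue so that every replacement is a real change.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def reversed_decoy(sequence):
    """Decoy built by reversing the target sequence."""
    return sequence[::-1]


def shuffled_decoy(sequence, rng):
    """Decoy built by randomly permuting the residues of the target sequence."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


# Toy 100-residue sequence (hypothetical, not a real protein).
target = "ACDEFGHIKLMNPQRSTVWY" * 5
rng = random.Random(42)  # fixed seed for reproducibility

for fraction in (0.01, 0.05, 0.25):
    fake = make_fake_homolog(target, fraction, rng)
    n_diff = sum(a != b for a, b in zip(target, fake))
    print(f"{fraction:.0%} replaced: {n_diff} residues differ")
```

Passing an explicit `random.Random` instance keeps the generated database reproducible between runs.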

Tutorial for usage of MS1 spectra for improving the protein inference algorithms:

First, install the DirectMS1 search engine (ms1searchpy). Detailed installation and usage instructions are available at https://github.com/markmipt/ms1searchpy .

A simple usage example:

ms1searchpy /home/mark/test.mzML -d /home/mark/sprot_human_with_fakes.fasta -ad 1

Note: the "-ad 1" option creates a shuffled decoy database for FDR estimation.

The output of ms1searchpy contains multiple tables, including one named *_proteins_full_noexclusion.tsv. This table contains the protein scores ("score" column) calculated by the DirectMS1 workflow; these scores are used by the algorithm described in the manuscript.
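To inspect these scores outside of Scavager, the table can be read with the Python standard library. The sketch below is only an illustration: the "score" column is named above, but the protein identifier column name ("dbname" here) is an assumption that should be checked against the header of your own file, and the two rows are made-up values.

```python
import csv
import io


def load_ms1_scores(handle, id_column="dbname", score_column="score"):
    """Map protein identifier -> DirectMS1 score from an open TSV file.

    `id_column` is an assumed column name; adjust it to match the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: float(row[score_column]) for row in reader}


# In-memory stand-in for a *_proteins_full_noexclusion.tsv file (made-up values).
example = io.StringIO(
    "dbname\tscore\n"
    "sp|P00001|PROT1_HUMAN\t35.2\n"
    "sp|P00002|PROT2_HUMAN\t4.7\n"
)
ms1_scores = load_ms1_scores(example)

# Protein with the highest DirectMS1 score in the toy table.
best = max(ms1_scores, key=ms1_scores.get)
print(best, ms1_scores[best])
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`.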

The method is implemented in the Scavager post-search utility, where it is combined with the parsimony principle for protein inference. Detailed installation and usage instructions are available at https://github.com/markmipt/scavager .

A simple usage example:

scavager path_to_pepXML/MZID -ms1 path_to_DirectMS1_proteins_full_noexclusion.tsv

The output of Scavager contains multiple tables, including one named *protein_groups.tsv, in which the chosen protein group leaders are marked in the "groupleader" column. If the "-ms1" option is used, the analysis is performed with the extended parsimony algorithm described in the manuscript. Scavager can be applied to the output of multiple search engines. Currently supported search engines are IdentiPy, X!Tandem, Comet, MSFragger, MSGF+, and Morpheus.

Python implementation of the parsimony+DirectMS1 algorithm:

This repository contains an IPython notebook with a Python function implementing the proposed method. The code is intended mainly for advanced users and developers who are interested in writing their own implementation of the proposed algorithm.
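For orientation, the sketch below shows one plausible way to combine parsimony with MS1 scores: a greedy set cover over peptides in which ties on peptide count are broken by the higher DirectMS1 score. This is an assumption about the general idea, not the function from the notebook or the algorithm as specified in the manuscript; the function name, the tie-breaking rule, and the toy data are hypothetical.

```python
def infer_protein_leaders(protein_to_peptides, ms1_scores):
    """Greedy parsimony; ties on peptide count go to the higher MS1 score.

    Assumed rule for illustration only; see the notebook for the actual method.
    """
    # Copy the peptide sets so that the caller's data is not modified.
    remaining = {prot: set(peps) for prot, peps in protein_to_peptides.items()}
    leaders = []
    while True:
        # Drop proteins whose peptides are all explained by chosen leaders.
        remaining = {prot: peps for prot, peps in remaining.items() if peps}
        if not remaining:
            break
        # Rank by number of unexplained peptides, then by MS1 score.
        best = max(
            remaining,
            key=lambda prot: (len(remaining[prot]), ms1_scores.get(prot, 0.0)),
        )
        explained = remaining.pop(best)
        leaders.append(best)
        # Remove the newly explained peptides from all other proteins.
        for peps in remaining.values():
            peps -= explained
    return leaders


# Toy data: a fake homolog shares both peptides with its target protein.
protein_to_peptides = {
    "FAKE_A": {"pep1", "pep2"},
    "PROT_A": {"pep1", "pep2"},
    "PROT_B": {"pep3"},
}
ms1_scores = {"PROT_A": 20.0, "FAKE_A": 1.5, "PROT_B": 8.0}

print(infer_protein_leaders(protein_to_peptides, ms1_scores))
```

In this toy case the MS2 peptides alone cannot distinguish PROT_A from FAKE_A, so the MS1 score decides which of the two becomes the group leader.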

Links

Mailing list: markmipt@gmail.com
DirectMS1 repo: https://github.com/markmipt/ms1searchpy
Scavager repo: https://github.com/markmipt/scavager

Folders and files

Create_fasta_file_with_FAKE_proteins.ipynb
LICENSE
Parsimony+DirectMS1_Python_function.ipynb
README.md
swiss_prot_human_and_3_group_of_fakes.fasta
swiss_prot_human_and_3_group_of_fakes_with_reversed_decoys.fasta
swiss_prot_human_and_3_group_of_fakes_with_shuffled_decoys.fasta
