A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies

Author:	Jill Goslinga, Timothy J. Griffin, Sean L. Seymour, Thomas McGowan, Pratik D. Jagtap, Joel A. Kooren, Matthew S. Wroblewski
Language:	English
Year of publication:	2013
Subject:	Proteomics Matching (statistics) Biology computer.software_genre Biochemistry Sensitivity and Specificity Article Search engine Tandem Mass Spectrometry Humans Database search engine Sensitivity (control systems) Amino Acid Sequence Saliva Databases Protein Molecular Biology Peptide sequence Expressed Sequence Tags business.industry Mouth Mucosa Pattern recognition Genomics Proteogenomics Search Engine Metagenomics Metaproteomics Metagenome Data mining Artificial intelligence business Peptides computer Algorithms Software
Description:	Large databases (> 10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to tandem MS data using database-search programs. Most notably, strict filtering to avoid false positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches (PSMs) were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, which is especially valuable for proteogenomics and metaproteomics studies.
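The database construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that databases are held as plain dictionaries mapping accession to sequence, that the primary search has already produced a list of matched accessions (the search engine itself is not modelled), and that decoys are made by reversing each target sequence, a common convention that the abstract does not specify. It also assumes one reading of the abstract, in which the subset is merged with the host database before decoys are generated. All function names and the `DECOY_` prefix are hypothetical.

```python
from typing import Dict, Iterable


def build_subset_database(large_db: Dict[str, str],
                          matched_ids: Iterable[str]) -> Dict[str, str]:
    """Keep only the large-database entries that were hit in the primary search."""
    wanted = set(matched_ids)
    return {acc: seq for acc, seq in large_db.items() if acc in wanted}


def add_reversed_decoys(db: Dict[str, str],
                        prefix: str = "DECOY_") -> Dict[str, str]:
    """Return the target entries plus one reversed-sequence decoy per target.

    Reversal is an assumed decoy strategy; the abstract does not name one.
    """
    combined = dict(db)
    for acc, seq in db.items():
        combined[prefix + acc] = seq[::-1]
    return combined


def second_step_database(large_db: Dict[str, str],
                         matched_ids: Iterable[str],
                         host_db: Dict[str, str]) -> Dict[str, str]:
    """Build the database for the second search: subset + host, then decoys."""
    subset = build_subset_database(large_db, matched_ids)
    merged = {**host_db, **subset}  # subset entries win on accession clashes
    return add_reversed_decoys(merged)


def to_fasta(db: Dict[str, str]) -> str:
    """Serialize a database as FASTA text for a search program to read."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in db.items())


if __name__ == "__main__":
    # Toy data only; real metaproteomic databases hold millions of entries.
    large_db = {"M1": "PEPTIDEK", "M2": "AAAAK", "M3": "MKLVR"}
    host_db = {"H1": "GASPVTCK"}
    primary_hits = ["M1", "M3"]  # accessions matched in the first search

    db2 = second_step_database(large_db, primary_hits, host_db)
    print(to_fasta(db2))
```

The second search then runs against the much smaller file produced by `to_fasta`, and the decoy entries allow false discovery rate filtering on that reduced search space.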
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::223071b57522eace678254496e1e6966 https://europepmc.org/articles/PMC3633484/