Submodular Generalized Matching for Peptide Identification in Tandem Mass Spectrometry
Autor: | Wenruo Bai, William Stafford Noble, Jeff A. Bilmes |
---|---|
Rok vydání: | 2019 |
Předmět: |
Proteomics
Matching (graph theory) Plasmodium falciparum 0206 medical engineering Score Value (computer science) Saccharomyces cerevisiae 02 engineering and technology Matroid Article Submodular set function Combinatorics Tandem Mass Spectrometry Genetics Animals Humans Caenorhabditis elegans Databases Protein Greedy algorithm Mathematics Models Statistical Applied Mathematics Computational Biology Maximization Calibration Bipartite graph Peptides Algorithms Software 020602 bioinformatics Biotechnology |
Zdroj: | IEEE/ACM Trans Comput Biol Bioinform |
ISSN: | 2374-0043 1545-5963 |
DOI: | 10.1109/tcbb.2018.2822280 |
Popis: | Motivation: Identification of spectra produced by a shotgun proteomics mass spectrometry experiment is commonly performed by searching the observed spectra against a peptide database. The heart of this search procedure is a score function that evaluates the quality of a hypothesized match between an observed spectrum and a theoretical spectrum corresponding to a particular peptide sequence. Accordingly, the success of a spectrum analysis pipeline depends critically upon this peptide-spectrum score function. We develop peptide-spectrum score functions that compute the maximum value of a submodular function under $m$ m matroid constraints. We call this procedure a submodular generalized matching (SGM) since it generalizes bipartite matching. We use a greedy algorithm to compute maximization, which can achieve a solution whose objective is guaranteed to be at least $\frac{1}{1+m}$ 1 1 + m of the true optimum. The advantage of the SGM framework is that known long-range properties of experimental spectra can be modeled by designing suitable submodular functions and matroid constraints. Experiments on four data sets from various organisms and mass spectrometry platforms show that the SGM approach leads to significantly improved performance compared to several state-of-the-art methods. Supplementary information, C++ source code, and data sets can be found at https://melodi-lab.github.io/SGM. |
Databáze: | OpenAIRE |
Externí odkaz: |