Sequential interval motif search: unrestricted database surveys of global MS/MS data sets for detection of putative post-translational modifications.

Authors: Liu J (Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada), Erassov A, Halina P, Canete M, Nguyen DV, Chung C, Cagney G, Ignatchenko A, Fong V, Emili A
Language: English
Source: Analytical chemistry [Anal Chem] 2008 Oct 15; Vol. 80 (20), pp. 7846-54. Date of Electronic Publication: 2008 Sep 13.
DOI: 10.1021/ac8009017
Abstract: Tandem mass spectrometry is the prevailing approach for large-scale peptide sequencing in high-throughput proteomic profiling studies. Effective database search engines have been developed to identify peptide sequences from MS/MS fragmentation spectra. Since proteins are polymorphic and subject to post-translational modifications (PTM), however, computational methods for detecting unanticipated variants are also needed to achieve true proteome-wide coverage. Unlike existing "unrestrictive" search tools, we present a novel algorithm, termed SIMS (for Sequential Interval Motif Search), that interprets pairs of product ion peaks, representing potential amino acid residues or "intervals", as a means of mapping PTMs or substitutions in a blind database search mode. An effective heuristic software program was likewise developed to evaluate, rank, and filter optimal combinations of relevant intervals to identify candidate sequences, and any associated PTM or polymorphism, from large collections of MS/MS spectra. The prediction performance of SIMS was benchmarked extensively against annotated reference spectral data sets and compared favorably with, and was complementary to, current state-of-the-art methods. An exhaustive discovery screen using SIMS also revealed thousands of previously overlooked putative PTMs in a compendium of yeast protein complexes and in a proteome-wide map of adult mouse cardiomyocytes. We demonstrate that SIMS, freely accessible for academic research use, addresses gaps in current proteomic data interpretation pipelines, improving overall detection coverage and facilitating comprehensive investigations of the fundamental multiplicity of the expressed proteome.
Database: MEDLINE
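The abstract describes reading the mass gap between pairs of product ion peaks as an "interval" that may correspond to an amino acid residue, with unexpected gaps pointing to a PTM or substitution. The sketch below illustrates only that general interval idea; it is not the SIMS algorithm and does not reproduce its scoring or ranking heuristics. It assumes singly charged fragments from one ion series and considers only single-residue gaps. The names `explain_interval`, `annotate_intervals`, and `MOD_SHIFTS` are hypothetical. It also matches against a small fixed list of modification shifts chosen for illustration, which makes it a restricted interpretation rather than the blind search mode described above.

```python
from typing import Dict, List, Optional, Tuple

# Standard monoisotopic amino acid residue masses in daltons.
# Leucine and isoleucine are isobaric, so they share one entry.
RESIDUE_MASSES: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Hypothetical, illustrative modification mass shifts (Da).
# A truly blind search would not rely on a fixed list like this one.
MOD_SHIFTS: Dict[str, float] = {
    "Phospho": 79.96633, "Oxidation": 15.99491, "Acetyl": 42.01057,
}


def explain_interval(delta: float, tol: float = 0.02) -> Optional[str]:
    """Label a peak-to-peak mass gap as a residue or a modified residue."""
    # Prefer an unmodified residue whenever one fits within the tolerance.
    for aa, mass in RESIDUE_MASSES.items():
        if abs(delta - mass) <= tol:
            return aa
    # Otherwise try every residue combined with every listed shift.
    for aa, mass in RESIDUE_MASSES.items():
        for mod, shift in MOD_SHIFTS.items():
            if abs(delta - (mass + shift)) <= tol:
                return f"{aa}+{mod}"
    return None  # gap not explained by a single (modified) residue


def annotate_intervals(
    peaks: List[float], tol: float = 0.02
) -> List[Tuple[int, int, str]]:
    """Return (i, j, label) for each peak pair whose gap matches a residue.

    Indices i and j refer to positions in the peak list after sorting.
    """
    ordered = sorted(peaks)
    hits: List[Tuple[int, int, str]] = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            label = explain_interval(ordered[j] - ordered[i], tol)
            if label is not None:
                hits.append((i, j, label))
    return hits


if __name__ == "__main__":
    # Toy fragment ladder: arbitrary start, then +G, +phosphoserine, +K.
    ladder = [100.0, 157.02146, 324.01982, 452.11478]
    for i, j, label in annotate_intervals(ladder):
        print(f"peaks {i}->{j}: {label}")
```

Run on the toy ladder, this prints `peaks 0->1: G`, `peaks 1->2: S+Phospho`, and `peaks 2->3: K`. The 167.0 Da gap matches no standard residue and is explained only as serine plus a phosphate shift. Checking unmodified residues first is a simple way to avoid over-calling modifications. It still cannot separate genuinely ambiguous cases such as Gly-Gly versus Asn, which have nearly identical masses.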