A BIOINFORMATIC TOOL FOR ANALYSING THE STRUCTURES OF PROTEIN COMPLEXES BY MEANS OF MASS SPECTROMETRY OF CROSS-LINKED PROTEINS

Author: Mayne, Shannon LN
Year of publication: 2014
Subject:
Document type: Text
Description: Multi-subunit protein complexes are involved in many essential biochemical processes including signal transduction, protein synthesis, RNA synthesis, DNA replication and protein degradation. An accurate description of the relative structural arrangement of the constituent sub-units in such complexes is crucial for an understanding of the molecular mechanism of the complex as a whole. Many complexes, however, lie in the megadalton range, and are not amenable to X-ray crystallographic or nuclear magnetic resonance analysis. Techniques that are suited to structural studies of such large complexes, such as cryo-electron microscopy, do not provide the resolution required for a mechanistic insight. Mass spectrometry (MS) has increasingly been applied to identify the residues that are involved in chemical cross-links in compound protein assemblies, and has provided valuable insight into the molecular arrangement, orientation and contact surfaces of sub-units within such large complexes. This approach is known as MS3D, and involves the MS analysis of cross-linked di-peptides following the enzymatic cleavage of a chemically cross-linked complex. A major challenge of this approach is the identification of the cross-linked di-peptides in a composite mixture of peptides, as well as the identification of the residues involved in the cross-link. These analyses require bioinformatics tools with capabilities beyond those of general, MS-based proteomic analysis software. Many MS3D software tools have appeared, often designed for very specific experimental methods. We review all major MS3D bioinformatics programs currently available, considering their applicability to different workflows, specific experimental requirements, and the computational approach taken by each. We also developed AnchorMS, a new bioinformatics tool for the identification of both the sequences and cross-linked residues of di-peptides within a post-digest peptide mixture based on MS1 and MS2 data.
AnchorMS is intended as a component in the workflow of an MS3D experiment where the protein sequences, cross-linking reagent and protease are known. AnchorMS is freely available as a public web service at cbio.ufs.ac.za/AnchorMS via a simple, user-friendly web interface coded in PHP/XHTML. Experimental sample preparation information and MS data may be uploaded through the web form and analysed by AnchorMS. After analysis, the web interface displays the di-peptides detected, as well as the calculated maximum inter-residue distance between cross-linked residues. This distance information can be used in the optimization of sub-unit positioning within structural models using third-party software. The computational core of AnchorMS was developed as an open-source Python project. We describe in detail the overall structure and workflow of the code as well as the functionality implemented in each section of the code. AnchorMS creates a digital library of possible di-peptides and generates expected precursor and fragment mass spectra for each. In order to identify di-peptides, the observed mass spectra are matched against the library of expected mass spectra. Features that are unique to AnchorMS are highlighted, including those for the analysis of di-peptides where the sequences are identical, but the cross-linked residues differ. AnchorMS considers their possible co-fragmentation and employs a specialised second score for distinguishing between such precursors. A unique mathematical model for estimating the level of false positive matching was derived based on an in silico simulation of false positive spectrum matching using randomly generated di-peptide sequences. Subsets of the simulation data were modelled using disparate functions, which were subsequently combined to yield a composite model that described expected false matching under various conditions. The refined calibration of this model against simulation data was performed using the R programming language.
AnchorMS also implemented this model as a dynamic false positive threshold, where score values greater than the threshold were considered likely to be true spectrum matches.
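
The library-and-match idea described above (build every possible cross-linked di-peptide, compute its expected precursor mass, and compare against observed masses) can be illustrated with a minimal sketch. This is not the AnchorMS code: the function names, the tryptic cleavage rule, the lysine-only link sites, the 138.06808 Da linker mass (a DSS/BS3-type reagent) and the ppm tolerance are all assumptions chosen for illustration, and only the MS1 precursor level is covered.

```python
import re
from itertools import combinations_with_replacement
from typing import NamedTuple

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
# Assumption: a DSS/BS3-type lysine-lysine linker, which adds 138.06808 Da.
LINKER_MASS = 138.06808


class DiPeptide(NamedTuple):
    pep_a: str
    site_a: int   # 0-based index of the linked lysine in pep_a
    pep_b: str
    site_b: int   # 0-based index of the linked lysine in pep_b
    mass: float   # neutral monoisotopic precursor mass in Da


def digest(protein, missed=1):
    """Tryptic digest: cleave after K or R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = set()
    for i in range(len(pieces)):
        # Join up to `missed` following pieces to model missed cleavages.
        for j in range(i, min(i + missed, len(pieces) - 1) + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def link_sites(peptide):
    # Simplification: a linked lysine blocks cleavage, so only lysines that
    # are not the peptide's C-terminal residue are treated as linkable.
    return [i for i, r in enumerate(peptide[:-1]) if r == "K"]


def build_library(proteins, missed=1):
    """Enumerate all di-peptides and their expected precursor masses."""
    peptides = sorted(
        {p for s in proteins for p in digest(s, missed) if link_sites(p)}
    )
    library = []
    for a, b in combinations_with_replacement(peptides, 2):
        mass = peptide_mass(a) + peptide_mass(b) + LINKER_MASS
        for sa in link_sites(a):
            for sb in link_sites(b):
                if a == b and sa > sb:
                    continue  # same di-peptide, already listed as (sb, sa)
                library.append(DiPeptide(a, sa, b, sb, mass))
    return library


def match_precursor(observed_mass, library, tol_ppm=10.0):
    """Return library entries within tol_ppm of an observed neutral mass."""
    return [
        d for d in library
        if abs(d.mass - observed_mass) / d.mass * 1e6 <= tol_ppm
    ]
```

In this sketch, di-peptides that share both sequences but differ in the linked residues receive identical precursor masses, so precursor matching alone cannot separate them; that is the situation for which the abstract describes fragment-level (MS2) scoring.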
Database: Networked Digital Library of Theses & Dissertations