MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs.

Authors: Bui VM; Department of Computer Science and Engineering and., Lu CT; Department of Computer Science and Engineering and., Ho TT; Department of Computer Science and Engineering and., Lee TY; Department of Computer Science and Engineering and Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan.
Language: English
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2016 Jan 15; Vol. 32 (2), pp. 165-72. Date of Electronic Publication: 2015 Sep 26.
DOI: 10.1093/bioinformatics/btv558
Abstract: S-sulfenylation (S-sulphenylation, or sulfenic acid), the covalent attachment of S-hydroxyl (-SOH) to cysteine thiol, plays a significant role in redox regulation of protein functions. Although sulfenic acid is transient and labile, most of its physiological activities occur under control of S-hydroxylation. Therefore, discriminating the substrate sites of S-sulfenylated proteins is an essential task in computational biology for furthering the understanding of protein structures and functions. Research into S-sulfenylated proteins is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Given a total of 1096 experimentally verified S-sulfenylated proteins from humans, this study carries out a bioinformatics investigation of SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo indicates that the positively and negatively charged amino acids flanking the SOH sites may impact the formation of S-sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using maximal dependence decomposition (MDD). Treating the task as binary classification between SOH and non-SOH sites, a support vector machine (SVM) is applied to learn the predictive model from MDD-identified substrate motifs. According to the evaluation results of 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also effectively improves the predictive performance on an independent testing set. Finally, the integrated SVM model is applied to implement an effective web resource, named MDD-SOH, to identify SOH sites with their corresponding substrate motifs.
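The abstract names maximal dependence decomposition (MDD) but does not describe its mechanics. As a rough illustration only, and not the authors' implementation, the following Python sketch shows the core idea commonly associated with MDD: score each position of aligned sequence windows by the summed chi-square dependence on the other positions, then split the windows on the position with the strongest dependence. The function names, the toy windows, and the choice to split on the most frequent residue are assumptions made for this sketch. Any amino-acid grouping, stopping criteria, and the SVM training step used in the paper are omitted.

```python
from collections import Counter
from itertools import product


def chi_square(windows, i, j):
    """Pearson chi-square statistic for independence of residues at positions i and j."""
    n = len(windows)
    count_i = Counter(w[i] for w in windows)
    count_j = Counter(w[j] for w in windows)
    count_ij = Counter((w[i], w[j]) for w in windows)
    stat = 0.0
    for a, b in product(count_i, count_j):
        expected = count_i[a] * count_j[b] / n  # always > 0: a and b both occur
        observed = count_ij[(a, b)]
        stat += (observed - expected) ** 2 / expected
    return stat


def max_dependence_position(windows, skip=()):
    """Return the position whose summed chi-square against all other positions is largest."""
    positions = [p for p in range(len(windows[0])) if p not in skip]
    scores = {
        i: sum(chi_square(windows, i, j) for j in positions if j != i)
        for i in positions
    }
    best = max(scores, key=scores.get)
    return best, scores


def split_on_position(windows, position):
    """Split windows by whether they carry the most frequent residue at `position`."""
    residue, _ = Counter(w[position] for w in windows).most_common(1)[0]
    with_residue = [w for w in windows if w[position] == residue]
    without_residue = [w for w in windows if w[position] != residue]
    return residue, with_residue, without_residue


if __name__ == "__main__":
    # Hypothetical 5-residue windows centred on cysteine (index 2); not real data.
    toy = ["KACDE", "KACDE", "RGCEE", "RGCEE", "KACDG", "RGCEG"]
    best, scores = max_dependence_position(toy, skip=(2,))
    print(best, scores)
    print(split_on_position(toy, best))
```

Applying the split recursively to each subgroup would yield clusters of windows that share a motif; in a pipeline like the one the abstract describes, a separate classifier could then be trained per cluster.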
Availability and Implementation: MDD-SOH is freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/. All data sets used in this work are also available for download from the website.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Contact: francis@saturn.yzu.edu.tw.
(© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
Database: MEDLINE