Detecting sequence signals in targeting peptides using deep learning.

Authors: Almagro Armenteros JJ; Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongen Lyngby, Denmark., Salvatore M; Science for Life Laboratory, Solna, Sweden.; Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden., Emanuelsson O; Science for Life Laboratory, Solna, Sweden.; Department of Gene Technology, School of Engineering Sciences in Biotechnology, Chemistry and Health, KTH-Royal Institute of Technology, Stockholm, Sweden., Winther O; DTU Compute, Technical University of Denmark, Kongen Lyngby, Denmark.; Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark.; Centre for Genomic Medicine, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark., von Heijne G; Science for Life Laboratory, Solna, Sweden.; Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden., Elofsson A; Science for Life Laboratory, Solna, Sweden arne@bioinfo.se.; Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden., Nielsen H; Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kongen Lyngby, Denmark henni@dtu.dk.
Language: English
Source: Life science alliance [Life Sci Alliance] 2019 Sep 30; Vol. 2 (5). Date of Electronic Publication: 2019 Sep 30 (Print Publication: 2019).
DOI: 10.26508/lsa.201900429
Abstract: In bioinformatics, machine learning methods have been used to predict features embedded in the sequences. In contrast to what is generally assumed, machine learning approaches can also provide new insights into the underlying biology. Here, we demonstrate this by presenting TargetP 2.0, a novel state-of-the-art method to identify N-terminal sorting signals, which direct proteins to the secretory pathway, mitochondria, and chloroplasts or other plastids. By examining the strongest signals from the attention layer in the network, we find that the second residue in the protein, that is, the one following the initial methionine, has a strong influence on the classification. We observe that two-thirds of chloroplast and thylakoid transit peptides have an alanine in position 2, compared with 20% in other plant proteins. We also note that in fungi and single-celled eukaryotes, less than 30% of the targeting peptides have an amino acid that allows the removal of the N-terminal methionine, compared with 60% for the proteins without a targeting peptide. The importance of this feature for predictions has not been highlighted before.
(© 2019 Armenteros et al.)
Database: MEDLINE
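
Note: the position-2 statistics quoted in the abstract amount to counting which residue follows the initial methionine in each sequence. The following is a minimal Python sketch of that kind of tally, not the authors' code. The function names, the toy sequences, and the set of residues treated as permitting methionine removal (small side chains: A, C, G, P, S, T, V) are all assumptions introduced here for illustration; the abstract does not list them.

```python
from collections import Counter

# Assumption: residues commonly considered to permit removal of the
# initial methionine. This set is NOT given in the abstract.
NME_PERMISSIVE = frozenset("ACGPSTV")


def position2_residues(sequences):
    """Return the residue following the initial methionine of each sequence.

    Sequences that are too short or do not start with 'M' are skipped.
    """
    return [s[1] for s in sequences if len(s) >= 2 and s[0] == "M"]


def fraction_with(residues, allowed):
    """Fraction of residues that belong to the `allowed` set."""
    if not residues:
        return 0.0
    return sum(r in allowed for r in residues) / len(residues)


# Hypothetical toy sequences, made up for this example (not real data).
toy = ["MASSL", "MAKTV", "MLRRA", "MKFLV"]

res = position2_residues(toy)
print(Counter(res))                         # tally of position-2 residues
print(fraction_with(res, {"A"}))            # share with alanine at position 2
print(fraction_with(res, NME_PERMISSIVE))   # share permitting Met removal
```

Applied separately to sequences with and without targeting peptides, a tally like this would give the kind of percentages the abstract compares, provided the two groups are defined as in the paper.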