Detection of m6A from direct RNA sequencing using a multiple instance learning framework.

Authors: Hendra C; Institute of Data Science, National University of Singapore, Singapore, Singapore.; Genome Institute of Singapore, A*STAR, Singapore, Singapore.; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore., Pratanwanich PN; Genome Institute of Singapore, A*STAR, Singapore, Singapore.; Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Chulalongkorn, Thailand.; Chula Intelligent and Complex Systems Research Unit, Chulalongkorn University, Chulalongkorn, Thailand., Wan YK; Genome Institute of Singapore, A*STAR, Singapore, Singapore.; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore., Goh WSS; Institute of Molecular Physiology, Shenzhen Bay Laboratory, Shenzhen, China., Thiery A; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore. a.h.thiery@nus.edu.sg., Göke J; Genome Institute of Singapore, A*STAR, Singapore, Singapore. gokej@gis.a-star.edu.sg.; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore. gokej@gis.a-star.edu.sg.; National Cancer Center of Singapore, Singapore, Singapore. gokej@gis.a-star.edu.sg.
Language: English
Source: Nature methods [Nat Methods] 2022 Dec; Vol. 19 (12), pp. 1590-1598. Date of Electronic Publication: 2022 Nov 10.
DOI: 10.1038/s41592-022-01666-1
Abstract: RNA modifications such as m6A methylation form an additional layer of complexity in the transcriptome. Nanopore direct RNA sequencing can capture this information in the raw current signal for each RNA molecule, enabling the detection of RNA modifications using supervised machine learning. However, experimental approaches provide only site-level training data, whereas the modification status of each individual RNA molecule is missing. Here we present m6Anet, a neural-network-based method that leverages the multiple instance learning framework to specifically handle missing read-level modification labels in site-level training data. m6Anet outperforms existing computational methods, shows accuracy similar to that of experimental approaches, and generalizes with high accuracy to different cell lines and species without retraining model parameters. In addition, we demonstrate that m6Anet captures the underlying read-level stoichiometry, which can be used to approximate differences in modification rates. Overall, m6Anet offers a tool for transcriptome-wide identification and quantification of m6A from a single run of direct RNA sequencing.
(© 2022. The Author(s).)
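The abstract says that multiple instance learning (MIL) lets a model train on site-level labels while predicting read-level modification status, but it does not state how read-level predictions are combined. The sketch below illustrates the general idea using noisy-OR pooling, one standard MIL aggregation rule, under the assumption that reads at a site are independent. It is not presented as m6Anet's published implementation: the function names, the 0.5 read-level threshold, and the input probabilities are hypothetical, and the neural network that would produce read-level probabilities from the current signal is not shown.

```python
import math
from typing import Sequence


def site_probability_noisy_or(read_probs: Sequence[float]) -> float:
    """Hypothetical noisy-OR pooling: a site (the "bag") is called modified
    if at least one read (an "instance") is modified, assuming independent
    reads. Returns 1 - prod(1 - p_i), accumulated in log space."""
    if not read_probs:
        raise ValueError("a site needs at least one read")
    log_all_unmodified = 0.0
    for p in read_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"read probability out of range: {p}")
        if p == 1.0:
            return 1.0  # one certainly-modified read makes the site modified
        log_all_unmodified += math.log1p(-p)  # log(1 - p)
    return 1.0 - math.exp(log_all_unmodified)


def site_level_loss(read_probs: Sequence[float], site_label: int) -> float:
    """Binary cross-entropy between the pooled site probability and a
    site-level label (0 or 1). Only the site label is needed, which is the
    point of MIL: no read-level labels enter the loss."""
    eps = 1e-7  # clamp to avoid log(0)
    q = min(max(site_probability_noisy_or(read_probs), eps), 1.0 - eps)
    return -(site_label * math.log(q) + (1 - site_label) * math.log(1.0 - q))


def estimated_modification_rate(read_probs: Sequence[float],
                                threshold: float = 0.5) -> float:
    """Approximate stoichiometry as the fraction of reads whose predicted
    probability exceeds a (hypothetical) read-level threshold."""
    if not read_probs:
        raise ValueError("a site needs at least one read")
    return sum(p > threshold for p in read_probs) / len(read_probs)
```

For example, with hypothetical read probabilities `[0.1, 0.2, 0.9, 0.05]`, noisy-OR pooling gives a site probability of 1 − (0.9 × 0.8 × 0.1 × 0.95) ≈ 0.9316, while the estimated modification rate at the 0.5 threshold is 0.25: the site is confidently called modified even though only one of four reads appears modified. This separation between a site-level call and a read-level fraction is the kind of distinction the abstract draws between identification and quantification.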
Database: MEDLINE