Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles

Authors:	Jelena Fiosina, Stefan Bonn, Maksims Fiosins
Contributors:	Cai, Zhipeng, Skums, Pavel, Li, Min
Language:	English
Publication year:	2019
Subject:	FOS: Computer and information sciences; 0301 basic medicine; Computer Science - Machine Learning; Small RNA; Computer science; Quantitative Biology - Quantitative Methods; Machine Learning (cs.LG); 03 medical and health sciences; Annotation; 0302 clinical medicine; Text mining; Quantitative Biology - Genomics; Quantitative Methods (q-bio.QM); Genomics (q-bio.GN); business.industry; Deep learning; Pattern recognition; Unstructured data; Expression (mathematics); Random forest; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Expression data; FOS: Biological sciences; Artificial intelligence; business; 030217 neurology & neurosurgery
Source:	Cham : Springer International Publishing, Lecture Notes in Computer Science 11490, 159-170 (2019). doi:10.1007/978-3-030-20242-2_14 ; Bioinformatics Research and Applications / Cai, Zhipeng (Editor) ; Cham : Springer International Publishing, 2019, Chapter 14 ; ISSN: 0302-9743=1611-3349 ; ISBN: 978-3-030-20241-5=978-3-030-20242-2 ; doi:10.1007/978-3-030-20242-2 ; International Symposium on Bioinformatics Research and Applications
DOI:	10.1007/978-3-030-20242-2_14
Description:	The lack of well-structured annotations in a growing amount of RNA expression data complicates data interoperability and reusability. Commonly used text mining methods extract annotations from existing unstructured data descriptions and often produce inaccurate output that requires manual curation. Automatic data-based augmentation (generating annotations on the basis of the expression data itself) can considerably improve annotation quality but has not been well studied. We formulate automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy with DL is 98% for tissue groups, 96.5% for tissues, and 77% for sex. The "one dataset out" average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL gives better results than RF and considerably improves classification performance on 'unseen' datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a268ea6937eb7be29282f2b0f172d218
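
The "one dataset out" evaluation mentioned in the description can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the protocol means holding out all samples of one source dataset at a time and training on the rest, and it substitutes a plain nearest-centroid classifier for the DL and RF models. All function names and the synthetic expression values are hypothetical.

```python
from collections import defaultdict
import random


def nearest_centroid_fit(X, y):
    """Return one mean expression vector (centroid) per class label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def one_dataset_out_accuracy(X, y, datasets):
    """Hold out each dataset in turn, train on the others, average the accuracies."""
    accuracies = []
    for held_out in sorted(set(datasets)):
        train = [i for i, d in enumerate(datasets) if d != held_out]
        test = [i for i, d in enumerate(datasets) if d == held_out]
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)


# Synthetic stand-in data (an assumption, not SEA data): three datasets, each
# with its own batch shift, two tissue groups, four expression features.
rng = random.Random(0)
X, y, datasets = [], [], []
for dataset_id, batch_shift in enumerate([-1.0, 0.0, 1.0]):
    for label, level in [("brain", 0.0), ("blood", 10.0)]:
        for _ in range(5):
            X.append([level + batch_shift + rng.uniform(-1.0, 1.0) for _ in range(4)])
            y.append(label)
            datasets.append(dataset_id)

accuracy = one_dataset_out_accuracy(X, y, datasets)
print(f"one-dataset-out accuracy: {accuracy:.2f}")
```

Because every test sample comes from a dataset the classifier never saw, this split measures robustness to dataset-specific effects rather than within-dataset fit. A different model can be evaluated the same way by replacing the two nearest-centroid functions.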