Deep Learning Benchmarks on L1000 Gene Expression Data

Authors:	Jennifer P. Wang, Steven D. Sheridan, Isaac S. Kohane, Matthew B. A. McDermott, Peter Szolovits, Stephen J. Haggarty, Roy H. Perlis, Wen-Ning Zhao
Publication year:	2020
Subject:	Computer science; 0206 medical engineering; Decision tree; 02 engineering and technology; Machine learning; computer.software_genre; Article; Cell Line; Data modeling; Deep Learning; Databases, Genetic; Genetics; Humans; Profiling (information science); Protein Interaction Maps; Models, Genetic; Artificial neural network; business.industry; Gene Expression Profiling; Applied Mathematics; Deep learning; Computational Biology; Random forest; ComputingMethodologies_PATTERNRECOGNITION; Benchmark (computing); Artificial intelligence; Transcriptome; business; computer; Algorithms; 020602 bioinformatics; Biotechnology; Coding (social sciences)
Source:	IEEE/ACM Trans Comput Biol Bioinform
ISSN:	2374-0043 1545-5963
DOI:	10.1109/tcbb.2019.2910061
Description:	Gene expression data can offer deep, physiological insights beyond the static coding of the genome alone. We believe that realizing this potential requires specialized, high-capacity machine learning methods capable of using underlying biological structure, but the development of such models is hampered by the lack of published benchmark tasks and well-characterized baselines. In this work, we establish such benchmarks and baselines by profiling many classifiers against biologically motivated tasks on two curated views of a large, public gene expression dataset (the LINCS corpus) and one privately produced dataset. We provide these two curated views of the public LINCS dataset and our benchmark tasks to enable direct comparisons to future methodological work and help spur deep learning method development on this modality. In addition to profiling a battery of traditional classifiers, including linear models, random forests, decision trees, K-nearest neighbor (KNN) classifiers, and feed-forward artificial neural networks (FF-ANNs), we also test a method novel to this data modality: graph convolutional neural networks (GCNNs), which allow us to incorporate prior biological domain knowledge. We find that GCNNs can be highly performant with large datasets, whereas FF-ANNs consistently perform well. Non-neural classifiers are dominated by linear models and KNN classifiers.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::93005d5d878b1714d12e4c14b887d19a https://doi.org/10.1109/tcbb.2019.2910061