Deep Learning Benchmarks on L1000 Gene Expression Data
Author: | Jennifer P. Wang, Steven D. Sheridan, Isaac S. Kohane, Matthew B. A. McDermott, Peter Szolovits, Stephen J. Haggarty, Roy H. Perlis, Wen-Ning Zhao |
---|---|
Year of publication: | 2020 |
Subject: |
Computer science; 0206 medical engineering; Decision tree; 02 engineering and technology; Machine learning; computer.software_genre; Article; Cell Line; Data modeling; Deep Learning; Databases, Genetic; Genetics; Humans; Profiling (information science); Protein Interaction Maps; Models, Genetic; Artificial neural network; business.industry; Gene Expression Profiling; Applied Mathematics; Deep learning; Computational Biology; Random forest; ComputingMethodologies_PATTERNRECOGNITION; Benchmark (computing); Artificial intelligence; Transcriptome; business; computer; Algorithms; 020602 bioinformatics; Biotechnology; Coding (social sciences) |
Source: | IEEE/ACM Trans Comput Biol Bioinform |
ISSN: | 2374-0043 1545-5963 |
DOI: | 10.1109/tcbb.2019.2910061 |
Description: | Gene expression data can offer deep, physiological insights beyond the static coding of the genome alone. We believe that realizing this potential requires specialized, high-capacity machine learning methods capable of using underlying biological structure, but the development of such models is hampered by the lack of published benchmark tasks and well-characterized baselines. In this work, we establish such benchmarks and baselines by profiling many classifiers against biologically motivated tasks on two curated views of a large, public gene expression dataset (the LINCS corpus) and one privately produced dataset. We provide these two curated views of the public LINCS dataset and our benchmark tasks to enable direct comparisons to future methodological work and help spur deep learning method development on this modality. In addition to profiling a battery of traditional classifiers, including linear models, random forests, decision trees, k-nearest neighbor (KNN) classifiers, and feed-forward artificial neural networks (FF-ANNs), we also test a method novel to this data modality: graph convolutional neural networks (GCNNs), which allow us to incorporate prior biological domain knowledge. We find that GCNNs can be highly performant with large datasets, whereas FF-ANNs consistently perform well. Non-neural classifiers are dominated by linear models and KNN classifiers. |
Database: | OpenAIRE |