DigChem: Identification of disease-gene-chemical relationships from Medline abstracts

Authors: Hyunju Lee, Jung-jae Kim, Jeongkyun Kim
Year of publication: 2018
Subject:
0301 basic medicine
Male
Databases, Factual
Disease
Alzheimer's Disease
Manual curation
Machine Learning
Database and Informatics Methods
0302 clinical medicine
Databases, Genetic
Medicine and Health Sciences
Data Mining
Biology (General)
Database Searching
Disease gene
Ecology
Prostate Cancer
Prostate Diseases
Neurodegenerative Diseases
Computational Theory and Mathematics
Neurology
Oncology
Modeling and Simulation
Information Retrieval
Identification (biology)
Female
Information Technology
Sequence Analysis
Chemically-Induced Disorders
Research Article
PubMed
Computer and Information Sciences
Neural Networks
QH301-705.5
Abstracting and Indexing
Bioinformatics
MEDLINE
Urology
Computational biology
Biology
Research and Analysis Methods
03 medical and health sciences
Cellular and Molecular Neuroscience
Deep Learning
Artificial Intelligence
Word Embedding
Mental Health and Psychiatry
Genetics
Humans
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Natural Language Processing
Computational Biology
Biology and Life Sciences
Cancers and Neoplasms
Search Engine
Genitourinary Tract Tumors
030104 developmental biology
Dementia
Neural Networks, Computer
Sequence Alignment
030217 neurology & neurosurgery
Neuroscience
Source: PLoS Computational Biology
PLoS Computational Biology, Vol 15, Iss 5, p e1007022 (2019)
ISSN: 1553-7358
Description: Chemicals interact with genes in the process of disease development and treatment. Although much biomedical research has been performed to understand relationships among genes, chemicals, and diseases, which have been reported in biomedical articles in Medline, there are few studies that extract disease–gene–chemical relationships from biomedical literature at a PubMed scale. In this study, we propose a deep learning model based on bidirectional long short-term memory to identify the evidence sentences of relationships among genes, chemicals, and diseases from Medline abstracts. Then, we develop the search engine DigChem to enable disease–gene–chemical relationship searches for 35,124 genes, 56,382 chemicals, and 5,675 diseases. We show that the identified relationships are reliable by comparing them with manual curation and existing databases. DigChem is available at http://gcancer.org/digchem.
Author summary: For understanding the role of chemicals in the molecular process of disease development and treatment, it is important to extract disease–gene–chemical relationships from the literature, that is, which gene and which chemical interact with each other for the development and treatment of which disease. Previous works extract binary relationships such as gene–chemical and disease–gene relationships from the literature and employ statistical measurements for deducing the triple relationship of disease–gene–chemical. Since the statistical measurements for inference often fail, we develop a deep learning model to identify the evidence sentences of disease–gene–chemical relationships from Medline abstracts. We show that the identified relationships are reliable, by comparing them with manually curated databases. We also provide a search engine called DigChem over the identified evidence sentences.
Database: OpenAIRE