Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach
Author: | Jian Wang, Jie Hao, Pei-Yau Lung, Dongrui Zhong, Tingting Zhao, Jinfeng Zhang, Zhe He, Albert Steppi, Jinchan Qu |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | Word embedding; Text mining; lcsh:QH426-470; Protein-protein interactions; lcsh:Biotechnology; Biology; computer.software_genre; Machine learning; Biological effect; Task (project management); Machine Learning; 03 medical and health sciences; 0302 clinical medicine; Protein interactions affected by mutations; lcsh:TP248.13-248.65; Protein Interaction Mapping; Genetics; Data Mining; 030304 developmental biology; Natural Language Processing; 0303 health sciences; Biomedical literature retrieval; business.industry; Deep learning; Methodology Article; Precision medicine; Triage; lcsh:Genetics; 030220 oncology & carcinogenesis; Mutation; Artificial intelligence; business; computer; Natural language processing; Mutations; Biotechnology |
Source: | BMC Genomics, Vol 21, Iss 1, Pp 1-10 (2020) |
ISSN: | 1471-2164 |
DOI: | 10.1186/s12864-020-07185-7 |
Description: | Background: Information on protein-protein interactions (PPIs) affected by mutations is valuable for understanding the biological effect of mutations and for developing treatments that target those interactions. In this study, we developed a natural language processing (NLP) based machine learning approach for extracting such information from the literature. Our aim is to identify journal abstracts or paragraphs in full-text articles that contain at least one occurrence of a PPI affected by a mutation. Results: Our system makes use of the latest NLP methods together with a large number of engineered features, including some based on pre-trained word embeddings. Our final model achieved satisfactory performance in the Document Triage Task of the BioCreative VI Precision Medicine Track, with the highest recall and a comparable F1-score. Conclusions: The performance of our method indicates that it is ideally suited to being combined with manual annotation. Our machine learning framework and engineered features will also help other researchers further improve performance on this and other related biological text-mining tasks, using either traditional machine learning or deep learning based methods. |
Database: | OpenAIRE |