An annotated corpus from biomedical articles to construct a drug-food interaction database

Authors:	Siun Kim, Yoona Choi, Jung-Hyun Won, Jung Mi Oh, Howard Lee
Year of publication:	2022
Subject:	Food-Drug Interactions; PubMed; Databases, Factual; Databases, Pharmaceutical; Data Mining; Health Informatics; Natural Language Processing; Computer Science Applications
Source:	Journal of Biomedical Informatics. 126:103985
ISSN:	1532-0464
DOI:	10.1016/j.jbi.2022.103985
Description:	While drug-food interactions (DFIs) may undermine the efficacy and safety of drugs, DFI detection has been difficult because no well-organized DFI database existed. To construct a DFI database and build a natural language processing system that extracts DFIs from biomedical articles, we formulated the DFI extraction tasks and manually annotated texts that could contain DFI information. In this article, we introduce a new annotated corpus for extracting DFIs, the DFI corpus. The DFI corpus contains 2270 abstracts of biomedical articles accessible through PubMed and 2498 sentences that contain DFI and/or drug-drug interaction (DDI) information, as well as a substantial amount of information about drug and food entities, evidence levels of abstracts, and relations between named entities. BERT models pre-trained on the biomedical domain achieved an F1 score of 55.0% in extracting DFI key sentences. To the best of our knowledge, the DFI corpus is the largest public corpus for drug-food interaction. Our corpus is available at https://github.com/ccadd-snu/corpus-for-DFI-extraction.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0009e23bfaa2107c0a195b50e921751a https://doi.org/10.1016/j.jbi.2022.103985