An annotated corpus from biomedical articles to construct a drug-food interaction database

Authors: Siun Kim, Yoona Choi, Jung-Hyun Won, Jung Mi Oh, Howard Lee
Publication year: 2022
Subject:
Source: Journal of Biomedical Informatics. 126:103985
ISSN: 1532-0464
DOI: 10.1016/j.jbi.2022.103985
Description: While drug-food interactions (DFI) may undermine the efficacy and safety of drugs, DFI detection has been difficult because no well-organized DFI database existed. To construct a DFI database and build a natural language processing system that extracts DFI from biomedical articles, we formulated the DFI extraction tasks and manually annotated texts that could contain DFI information. In this article, we introduce a new annotated corpus for extracting DFI, the DFI corpus. The DFI corpus contains 2270 abstracts of biomedical articles accessible through PubMed and 2498 sentences that contain DFI and/or drug-drug interaction (DDI) information, together with a substantial amount of information about drug/food entities, evidence levels of abstracts, and relations between named entities. BERT models pre-trained on the biomedical domain achieved an F1 score of 55.0% in extracting DFI key-sentences. To the best of our knowledge, the DFI corpus is the largest public corpus for drug-food interaction. Our corpus is available at https://github.com/ccadd-snu/corpus-for-DFI-extraction.
Database: OpenAIRE