ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis
Author: | Kafando, Rodrique, Decoupes, Rémy, Valentin, Sarah, Sautot, Lucile, Teisseire, Maguelonne, Roche, Mathieu |
---|---|
Contributors: | Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Animal, Santé, Territoires, Risques et Ecosystèmes (UMR ASTRE), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Département Environnements et Sociétés (Cirad-ES), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad), This study was partially funded by EU grant 874850 MOOD and is catalogued as MOOD 003. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission. This study was partially funded by the French Agricultural Research Centre for International Development (CIRAD), the French General Directorate for Food (DGAL), and the SONGES Project (FEDER and Occitanie Region). The research was also supported by the French National Research Agency (ANR) under the Investments for the Future Program, referred to as ANR-16-CONV-0004, #DigitAg., ANR-16-CONV-0004, DIGITAG, Institut Convergences en Agriculture Numérique (2016) |
Language: | English |
Subject: | Biomedical terminology; U10 - Informatique mathématiques et statistiques; [SDV] Life Sciences [q-bio]; Research; COVID-19; Terminologie; Fouille de textes; Intelligent analysis; C30 - Documentation et information; Sciences médicales; S50 - Santé humaine; [SDE] Environmental Sciences; Terminology extraction; [INFO] Computer Science [cs] |
Source: | Health Information Science and Systems, BioMed Central, 2021, 9 (1), pp.29. ⟨10.1007/s13755-021-00156-6⟩ |
ISSN: | 2047-2501 |
DOI: | 10.1007/s13755-021-00156-6 |
Description: | International audience; Here, we introduce ITEXT-BIO, an intelligent process for biomedical domain terminology extraction from textual documents and subsequent analysis. The proposed methodology consists of two complementary approaches, including free and driven term extraction. The first is based on term extraction with statistical measures, while the second considers morphosyntactic variation rules to extract term variants from the corpus. The combination of two term extraction and analysis strategies is the keystone of ITEXT-BIO. These include combined intra-corpus strategies that enable term extraction and analysis either from a single corpus (intra), or from corpora (inter). We assessed the two approaches, the corpus or corpora to be analysed and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used: (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses on these terms regarding the study domain. |
Database: | OpenAIRE |
External link: |
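
The description mentions "free" term extraction based on statistical measures, applied either within one corpus or across corpora. This record does not say which measures ITEXT-BIO uses, so the sketch below assumes plain TF-IDF over word n-grams purely for illustration. The function names `candidate_terms` and `tfidf_ranking` and the toy corpora are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def candidate_terms(text, max_len=2):
    """Return lowercase word n-grams (1..max_len) as naive term candidates."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf_ranking(corpora, max_len=2):
    """Rank candidates per corpus by TF-IDF, treating each corpus as one document.

    Assumption: TF-IDF stands in for the unspecified statistical measures.
    Candidates that occur in every corpus score 0, so the top of each ranking
    holds the candidates that discriminate that corpus from the others.
    """
    counts = {
        name: Counter(candidate_terms(text, max_len))
        for name, text in corpora.items()
    }
    n_corpora = len(counts)
    # Number of corpora in which each candidate appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    ranking = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores = {
            term: (freq / total) * math.log(n_corpora / doc_freq[term])
            for term, freq in c.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranking[name] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranking


if __name__ == "__main__":
    toy = {
        "covid": "covid vaccine trial covid vaccine",
        "flu": "flu vaccine trial",
    }
    for corpus, ranked in tfidf_ranking(toy).items():
        print(corpus, ranked[:3])
```

A real pipeline would likely filter candidates by part-of-speech patterns before scoring. The "driven" approach based on morphosyntactic variation rules is not covered by this sketch.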