Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites
Authors: | Bastien Rance, David Baudoin, Maxime Wack, Antoine Neuraz, William Digan, Aurélie Névéol, Anita Burgun |
---|---|
Contributors: | Département d'Informatique et Santé Publique [CHU HEGP] (HEGP - Informatique), Hôpital Européen Georges Pompidou [APHP] (HEGP), Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpitaux Universitaires Paris Ouest - Hôpitaux Universitaires Île de France Ouest (HUPO)-Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpitaux Universitaires Paris Ouest - Hôpitaux Universitaires Île de France Ouest (HUPO), Laboratoire Interdisciplinaire des Sciences du Numérique (LISN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Neuraz, Antoine, Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Hôpitaux Universitaires Paris Ouest - Hôpitaux Universitaires Île de France Ouest (HUPO), Centre de Recherche des Cordeliers (CRC (UMR_S_1138 / U1138)), École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Sorbonne Université (SU)-Université Paris Cité (UPCité), CHU Necker - Enfants Malades [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP), Service d'informatique médicale et biostatistiques [CHU Necker], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-CHU Necker - Enfants Malades [AP-HP], CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Year of publication: | 2020 |
Subject: |
0301 basic medicine
Reproducibility of results Containerization AcademicSubjects/SCI01060 workflow Computer science [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing Health Informatics Research and Applications computer.software_genre Health informatics Field (computer science) Set (abstract data type) 03 medical and health sciences 0302 clinical medicine 030212 general & internal medicine AcademicSubjects/MED00580 ComputingMilieux_MISCELLANEOUS Meaningful use Data stream mining business.industry Natural language processing Computational Biology Grid Pipeline (software) Workflow [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing 030104 developmental biology Workflow Database Management Systems Artificial intelligence AcademicSubjects/SCI01530 business computer Medical Informatics Workflow management system |
Source: | Journal of the American Medical Informatics Association; Journal of the American Medical Informatics Association, 2020, ⟨10.1093/jamia/ocaa261⟩; Journal of the American Medical Informatics Association, 2021, 28 (3), pp.504-515. ⟨10.1093/jamia/ocaa261⟩; Journal of the American Medical Informatics Association, BMJ Publishing Group, 2020, ⟨10.1093/jamia/ocaa261⟩; Journal of the American Medical Informatics Association : JAMIA |
ISSN: | 1067-5027 1527-974X |
DOI: | 10.1093/jamia/ocaa261 |
Description: | Background: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. Objective: To evaluate whether WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. Materials and Methods: Based on the literature across multiple research fields (NLP, bioinformatics, and clinical informatics), we selected articles that (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregated insights from the literature to define reproducibility recommendations. Finally, we assessed the compliance of 7 NLP frameworks with the recommendations. Results: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP frameworks (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. Discussion: 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP framework. Nevertheless, frameworks based on WMS had better compliance with the features. Conclusion: NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance reproducibility in a clinical setting. |
Database: | OpenAIRE |
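
The "more than 50% of features" statement in the abstract can be checked with simple arithmetic. The following is a minimal sketch, not part of the original record: the feature counts and the total of 40 come from the abstract, while the variable and function names are purely illustrative.

```python
# Feature counts per framework, as reported in the abstract of this record.
# Names such as `features_matched` and `compliance_share` are illustrative only.
TOTAL_FEATURES = 40

features_matched = {
    "LAPPS Grid": 26,
    "OpenMinted": 22,
    "cTakes": 18,
    "CLAMP": 18,
    "GATE": 17,
    "ScispaCy": 17,
    "Textflows": 17,
}


def compliance_share(matched: int, total: int = TOTAL_FEATURES) -> float:
    """Return the percentage of reproducibility features a framework matches."""
    return 100.0 * matched / total


# Percentage of the 40 features matched by each framework.
shares = {name: compliance_share(n) for name, n in features_matched.items()}

# Frameworks that match strictly more than half of the features.
above_half = [name for name, pct in shares.items() if pct > 50.0]

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under these counts, only the two WMS-based frameworks (LAPPS Grid and OpenMinted) exceed half of the 40 features, which is consistent with the abstract's wording.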