Evaluation of a semi-automated data extraction tool for public health literature-based reviews: Dextr

Authors: Vickie R. Walker, Charles P. Schmitt, Mary S. Wolfe, Artur J. Nowak, Kuba Kulesza, Ashley R. Williams, Rob Shin, Jonathan Cohen, Dave Burch, Matthew D. Stout, Kelly A. Shipkowski, Andrew A. Rooney
Language: English
Year of publication: 2022
Subject:
Source: Environment International, Vol 159, Pp 107025 (2022)
Document type: article
ISSN: 0160-4120
DOI: 10.1016/j.envint.2021.107025
Description: Introduction: There has been limited development and uptake of machine-learning methods to automate data extraction for literature-based assessments. Although advanced extraction approaches have been applied to some clinical research reviews, existing methods are not well suited for addressing toxicology or environmental health questions due to unique data needs to support reviews in these fields. Objectives: To develop and evaluate a flexible, web-based tool for semi-automated data extraction that: 1) makes data extraction predictions with user verification, 2) integrates token-level annotations, and 3) connects extracted entities to support hierarchical data extraction. Methods: Dextr was developed with Agile software methodology using a two-team approach. The development team outlined proposed features and coded the software. The advisory team guided developers and evaluated Dextr’s performance on precision, recall, and extraction time by comparing a manual extraction workflow to a semi-automated extraction workflow using a dataset of 51 environmental health animal studies. Results: The semi-automated workflow did not appear to affect precision rate (96.0% vs. 95.4% manual, p = 0.38), resulted in a small reduction in recall rate (91.8% vs. 97.0% manual, p
Database: Directory of Open Access Journals