Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing

Authors: Todd Lingren, Imre Solti, Laura Stoutenborough, Megan Kaiser, Qi Li, Haijun Zhai, Louise Deléger
Language: English
Year of publication: 2013
Subjects:
Quality Control
Markup language
020205 medical informatics
Web 2.0
Computer science
Health Informatics
Pilot Projects
02 engineering and technology
Crowdsourcing
JavaScript
lcsh:Computer applications to medicine. Medical informatics
03 medical and health sciences
Annotation
User-Computer Interface
Reference Standards
Clinical informatics
0202 electrical engineering, electronic engineering, information engineering
Humans
Natural language processing
Named entity
030304 developmental biology
0303 health sciences
Original Paper
Clinical Trials as Topic
Internet
lcsh:Public aspects of medicine
Usability
lcsh:RA1-1270
Telemedicine
lcsh:R858-859.7
Artificial intelligence
User interface
Social Media
Source: Journal of Medical Internet Research
Journal of Medical Internet Research, Vol 15, Iss 4, p e73 (2013)
ISSN: 1438-8871, 1439-4456
Description: Background: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually with very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora has never been exceptional compared with traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed an F-measure of 0.68 for agreement between crowdsourced and traditionally developed corpora.
Objective: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora.
Methods: To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P …
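The evaluation summarized above rests on standard entity-matching metrics. As a rough illustration only, and not the authors' actual evaluation code, the Python sketch below shows how sensitivity (recall), precision, and F-measure could be computed by comparing crowd-produced entity annotations against a gold standard, assuming exact matching of (start, end, label) tuples; the function name score and the example spans are hypothetical.

# Minimal sketch: score crowd annotations against a gold standard.
# Assumes exact span matching of (start, end, label) entity tuples.
def score(gold, crowd):
    """Return (sensitivity, precision, f_measure) for two annotation sets."""
    gold, crowd = set(gold), set(crowd)
    tp = len(gold & crowd)   # spans found by both crowd and gold standard
    fp = len(crowd - gold)   # spans only the crowd marked
    fn = len(gold - crowd)   # gold spans the crowd missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if (precision + sensitivity) else 0.0)
    return sensitivity, precision, f_measure

# Hypothetical example: one matched span, one missed, one spurious.
gold_spans = {(10, 19, "MedicationName"), (42, 50, "MedicationType")}
crowd_spans = {(10, 19, "MedicationName"), (60, 65, "MedicationName")}
print(score(gold_spans, crowd_spans))  # (0.5, 0.5, 0.5)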
Database: OpenAIRE
The full text is not displayed to users who are not logged in.