Hybrid expert ensembles for identifying unreliable data in citizen science
Authors: | Nick J. Moran, Alison Johnston, Wenjia Wang, Pieter Wessels |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | 0209 industrial biotechnology; 02 engineering and technology; 020901 industrial engineering & automation; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Process (engineering); Computer science; Reliability (computer networking); Decision tree; Volume (computing); Data validation; Machine learning; Expert system; Task (project management); Artificial intelligence; Control and Systems Engineering; Scientific method; Citizen science; Electrical and Electronic Engineering |
Description: | Citizen science utilises public resources for scientific research. BirdTrack is one such project, established in 2004 by the British Trust for Ornithology (BTO) so that the public can log their bird observations through its web or mobile applications. It has accumulated over 40 million observations. However, the veracity of these observations needs to be checked, and the current process involves time-consuming interventions by human experts. This research therefore aims to develop a more efficient system that automatically identifies unreliable observations in a large volume of records. This paper presents a novel approach, a Hybrid Expert Ensemble System (HEES), that combines an Expert System (ES) and machine-induced models to perform the intended task. The ES is built on human expertise and used as a base member of the ensemble. The other members are decision trees induced from county-based data. The HEES uses accuracy and diversity as criteria to select its members, with the aim of improving its accuracy and reliability. The experiments were carried out using the county-based data, and the results indicate that (1) the performance of the Expert System is reasonable for some counties but varies considerably for others, and (2) an HEES is more accurate and reliable than the Expert System and the other individual models, with a sensitivity of 85% for correctly identifying unreliable observations and a specificity of 99% for reliable observations. These results demonstrate that the proposed approach can serve as an alternative or additional means of validating the observations in a timely and cost-effective manner, and that it has the potential to be applied in other citizen science projects where large amounts of data need to be checked effectively and efficiently. |
Database: | OpenAIRE |
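
The description reports sensitivity and specificity for an ensemble that combines an expert system with decision trees. The sketch below illustrates how those two measures could be computed for such an ensemble. It is not the authors' implementation: the abstract does not state how members' outputs are combined, so simple majority voting is assumed here, and the three member functions (`expert_rule`, `tree_a`, `tree_b`) and the record fields (`count`, `out_of_season`) are hypothetical stand-ins. The selection of members by accuracy and diversity is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A member maps one observation record to 1 (unreliable) or 0 (reliable).
Member = Callable[[Dict], int]


def expert_rule(record: Dict) -> int:
    # Hypothetical hand-written rule standing in for the expert system.
    return 1 if record["count"] > 1000 else 0


def tree_a(record: Dict) -> int:
    # Hypothetical stand-in for a decision tree induced from county data.
    return 1 if record["out_of_season"] else 0


def tree_b(record: Dict) -> int:
    # Second hypothetical tree, deliberately different from tree_a.
    return 1 if record["count"] > 1000 and record["out_of_season"] else 0


def majority_vote(members: Sequence[Member], record: Dict) -> int:
    # Assumed combination rule: flag the record when more than half
    # of the members flag it.
    votes = sum(member(record) for member in members)
    return 1 if 2 * votes > len(members) else 0


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    # Sensitivity: share of truly unreliable records that were flagged.
    # Specificity: share of truly reliable records that were not flagged.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def evaluate(
    members: Sequence[Member], records: List[Dict], y_true: Sequence[int]
) -> Tuple[float, float]:
    # Run the ensemble over all records and score it against the labels.
    y_pred = [majority_vote(members, r) for r in records]
    return sensitivity_specificity(y_true, y_pred)
```

Calling `evaluate([expert_rule, tree_a, tree_b], records, labels)` on a labelled sample returns the pair (sensitivity, specificity), the two measures quoted in the description.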