Human-Machine Information Extraction Simulator for Biological Collections

Authors:	Andrea Matsunaga, Icaro Alzuru, Jose A. B. Fortes, Aditi Malladi, Mauricio Tsugawa
Year of publication:	2019
Subject:	0106 biological sciences; 010604 marine biology & hydrobiology; 010603 evolutionary biology; 01 natural sciences; Computer science; Optical character recognition; Crowdsourcing; Personalization; Metadata; Information extraction; Workflow; Simulation
Source:	IEEE BigData
DOI:	10.1109/bigdata47090.2019.9005601
Description:	In the last decade, institutions around the world have launched initiatives to digitize biological collections (biocollections) and share their information online. Metadata is transcribed from photographs of specimen labels through human-centered approaches (e.g., crowdsourcing) because fully automated Information Extraction (IE) methods still generate a significant number of errors. The integration of human and machine tasks has been proposed to accelerate IE from the billions of specimens waiting to be digitized. Nevertheless, to conduct research and try new techniques, IE practitioners need to prepare sets of images, set up crowdsourcing experiments, recruit volunteers, process the transcriptions, generate ground truth values, program automated methods, etc. Developing these research resources and processes and assembling them into a functional system takes time and effort. In this paper, we present a simulator intended to accelerate experimentation with workflows for extracting Darwin Core (DC) terms from images of specimens. The HuMaIN Simulator includes the engine, the human-machine IE workflows for three DC terms, the code of the automated IE methods, crowdsourced and ground truth transcriptions of the DC terms of three biocollections, and several experiments that exemplify its potential use. The simulator adds human-in-the-loop capabilities for iterative IE and for research on optimal methods. Its practical design permits the quick definition, customization, and implementation of experimental IE scenarios.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2e8e1aee2cc7224baba2022197b9826e https://doi.org/10.1109/bigdata47090.2019.9005601