Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation
Author: | Barry A. Bunin, Stephan C. Schürer, Nadia K. Litterman, Ubbo Visser, Alex M. Clark |
---|---|
Language: | English |
Year of publication: | 2014 |
Subject: |
Bioinformatics; Computer science; lcsh:Medicine; Ontology (information science); Machine learning; computer.software_genre; 01 natural sciences; Bayesian; Computational Science; General Biochemistry, Genetics and Molecular Biology; Domain (software engineering); Set (abstract data type); 03 medical and health sciences; Annotation; Software; 030304 developmental biology; 0303 health sciences; business.industry; Plain text; Ontology; General Neuroscience; Natural language processing; lcsh:R; General Medicine; computer.file_format; Construct (python library); 0104 chemical sciences; 010404 medicinal & biomolecular chemistry; Semantic curation; Bioassay; Artificial intelligence; User interface; General Agricultural and Biological Sciences; business; computer |
Source: | PeerJ, Vol 2, p e524 (2014) PeerJ |
ISSN: | 2167-8359 |
Description: | Bioinformatics and computer-aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing and a simplified user interface designed to help scientists curate their data with minimal effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers. |
Database: | OpenAIRE |
External link: |
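
The hybrid workflow outlined in the description (a model proposes annotations from assay text, a scientist confirms them, and confirmed annotations feed back into the model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `AnnotationSuggester` class, its `confirm` and `suggest` methods, the toy naive Bayes scoring, and the two placeholder labels are hypothetical and are not the authors' implementation or actual BAO term identifiers.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for real natural language processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class AnnotationSuggester:
    """Toy multinomial naive Bayes over assay text (hypothetical, not the paper's model)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of confirmed texts
        self.vocab = set()

    def confirm(self, text, label):
        """Record a user-approved annotation so later suggestions improve."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def suggest(self, text):
        """Return known labels ranked by log-probability, best first."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # Laplace smoothing
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)


# Placeholder labels and texts, invented for this example.
suggester = AnnotationSuggester()
suggester.confirm("Cell viability measured by luminescence after compound treatment",
                  "cell-based format")
suggester.confirm("Purified enzyme incubated with substrate and inhibitor",
                  "biochemical format")
ranked = suggester.suggest("Inhibitor added to purified kinase enzyme with substrate")
print(ranked[0])
```

In such a setup the top-ranked label would be shown to the user for one-click approval, and calling `confirm` with the approved label is what lets later suggestions improve. A label the model has never seen cannot be suggested at all, which is the case the description covers with a search feature in the user interface.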