Auditing subtype inconsistencies among gene ontology concepts
Authors: | Rashmie Abeysinghe, Hunter N. B. Moseley, Licong Cui, Eugene W. Hinderer |
---|---|
Year of publication: | 2017 |
Subject: |
0301 basic medicine; Computer science; business.industry; Gene ontology; Inference; Audit; computer.software_genre; Terminology; 03 medical and health sciences; 030104 developmental biology; Controlled vocabulary; Artificial intelligence; Related gene; Set (psychology); business; computer; Natural language processing; Word (computer architecture) |
Source: | BIBM |
Description: | Gene Ontology (GO) provides a controlled vocabulary for describing genes and related gene products. Quality assurance of GO is a vital aspect of the terminology management lifecycle. In this paper, we introduce a lexical-based inference approach to detecting subtype (or isa) inconsistencies among GO terms (i.e., biological concepts). We first model the name of each concept as a set of words. Then, we generate hierarchically linked and unlinked pairs of concepts (A, B), where A and B have the same number of words and share all words except one. Each linked concept-pair infers a linked term-pair, and each unlinked concept-pair infers an unlinked term-pair. A term-pair that appears as both linked and unlinked is considered a potential inconsistency, which may represent a subtype inconsistency between the original linked and unlinked concept-pairs. Applying this approach to the 03/28/2017 release of GO yielded a total of 3,715 potential subtype inconsistencies. Evaluation of a random sample of these potential inconsistencies revealed two types of potential errors in GO, missing subtype relations and incorrect subtype relations, and the approach achieved an accuracy of 56.33% in detecting such errors. These results indicate that the lexical-based inference approach using the set-of-words model is a promising way to facilitate quality improvement of GO. |
Database: | OpenAIRE |
External link: |
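
The procedure summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. The concept IDs, names, and is-a edges below are hypothetical toy data, not taken from a GO release. The sketch also makes several assumptions the abstract does not spell out: names are tokenized on whitespace, "linked" means one concept is a direct or transitive ancestor of the other, a term-pair is the ordered pair of differing words (subtype side first), and unlinked pairs are recorded in both orders.

```python
from itertools import combinations

# Hypothetical toy concepts and direct is-a edges (child -> parents).
names = {
    "T1": "ion transport",
    "T2": "cation transport",
    "T3": "ion binding",
    "T4": "cation binding",
    "T5": "anion binding",
}
parents = {"T2": {"T1"}, "T5": {"T3"}}


def ancestors(cid):
    """All direct and transitive is-a ancestors of a concept."""
    seen, stack = set(), list(parents.get(cid, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def single_word_difference(a, b):
    """Return the differing words (word_a, word_b) if the two names,
    modeled as sets of words, have equal size, share at least one word,
    and differ in exactly one word each; otherwise return None."""
    wa, wb = set(names[a].split()), set(names[b].split())
    if len(wa) != len(wb) or not (wa & wb):
        return None
    da, db = wa - wb, wb - wa
    if len(da) == 1 and len(db) == 1:
        return next(iter(da)), next(iter(db))
    return None


def potential_inconsistencies():
    """Term-pairs that occur both as linked and as unlinked, mapped to
    the supporting (linked concept-pairs, unlinked concept-pairs)."""
    linked, unlinked = {}, {}
    for a, b in combinations(sorted(names), 2):
        diff = single_word_difference(a, b)
        if diff is None:
            continue
        wa, wb = diff
        if b in ancestors(a):        # a is a subtype of b
            linked.setdefault((wa, wb), []).append((a, b))
        elif a in ancestors(b):      # b is a subtype of a
            linked.setdefault((wb, wa), []).append((b, a))
        else:                        # no hierarchical link either way
            unlinked.setdefault((wa, wb), []).append((a, b))
            unlinked.setdefault((wb, wa), []).append((b, a))
    return {tp: (linked[tp], unlinked[tp]) for tp in linked if tp in unlinked}


result = potential_inconsistencies()
```

In this toy data, the term-pair ("cation", "ion") is linked through T2 and T1 but unlinked through T4 and T3, so it is flagged. A flagged pair is only a candidate for review: it could point to a missing subtype relation between the unlinked concepts or to an incorrect relation between the linked ones.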