BioCreAtIvE Task1A: entity identification with a stochastic tagger

Author:	Kinoshita Shuhei, Cohen K Bretonnel, Ogren Philip V, Hunter Lawrence
Language:	English
Year of publication:	2005
Subject:	Computer applications to medicine. Medical informatics (R858-859.7); Biology (General) (QH301-705.5)
Source:	BMC Bioinformatics, Vol 6, Iss Suppl 1, p S4 (2005)
Document type:	article
ISSN:	1471-2105
DOI:	10.1186/1471-2105-6-S1-S4
Description:	Abstract. Background: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1,2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI. Results: Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5%, respectively, giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed divisions. Conclusion: Our results show that a part-of-speech tagger can be augmented with post-processing rules, resulting in an entity identification system that competes well with other approaches.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/8681f755c8524c10b3a71b3a8e8b0fe1
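
The F-measures quoted in the abstract can be checked from the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall); the record does not state the formula, but the reported figures are consistent with it. The function name `f_measure` is illustrative and not taken from the article.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall.
    Inputs and output share the same units (percent here).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (percent).
base = f_measure(68.0, 77.2)   # base system, no post-processing
full = f_measure(80.3, 80.5)   # full system, with post-processing

print(f"base system F-measure: {base:.1f}")
print(f"full system F-measure: {full:.1f}")
```

Rounded to one decimal place, both values agree with the 72.3% and 80.4% given in the abstract. The open-division figure (80.9%) cannot be checked the same way, because the record does not give its precision and recall.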