Combining automatic acquisition of knowledge with machine learning approaches for multilingual temporal recognition and normalization

Authors: Rafael Muñoz, Óscar Ferrández, Sergio Ferrández, Estela Saquete, Patricio Martínez-Barco
Year of publication: 2008
Subject:
Source: Information Sciences. 178:3319-3332
ISSN: 0020-0255
DOI: 10.1016/j.ins.2008.05.002
Description: This paper presents an improvement in the temporal expression (TE) recognition phase of a knowledge-based system at a multilingual level. For this purpose, the combination of different approaches to the recognition of temporal expressions is studied. For the recognition task, a knowledge-based system that recognizes temporal expressions and has been automatically extended to other languages (the TERSEO system) was combined with a system that recognizes temporal expressions using machine learning techniques. In particular, two techniques were applied, a maximum entropy model (ME) and a hidden Markov model (HMM), using two types of tagging of the training corpus: (1) BIO model tagging of literal temporal expressions and (2) BIO model tagging of simple patterns of temporal expressions. Each system was first evaluated independently and then combined in order to (a) determine whether the combination gives better results without a proportional increase in the number of erroneous expressions and (b) decide which machine learning approach performs this task better. The best F-measure (89%) is obtained when the TERSEO system is combined with the maximum entropy approach, improving TERSEO recognition by 4.5 points and ME recognition by 7 points.
Database: OpenAIRE
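
The BIO tagging of literal temporal expressions mentioned in the description can be illustrated with a minimal sketch. It is not taken from the paper: the function name `bio_tags`, the bare label names `B`, `I` and `O`, the example sentence, and the token-span input format are all assumptions made for illustration. The sketch covers only the first tagging scheme (literal expressions), not the pattern-based one.

```python
def bio_tags(tokens, spans):
    """Assign BIO labels to tokens.

    tokens: list of token strings.
    spans:  list of (start, end) token-index pairs marking temporal
            expressions, with `end` exclusive (hypothetical input format).
    """
    tags = ["O"] * len(tokens)          # O = outside any temporal expression
    for start, end in spans:
        tags[start] = "B"               # B = first token of an expression
        for i in range(start + 1, end):
            tags[i] = "I"               # I = token inside the expression
    return tags


# Hypothetical example sentence with one temporal expression.
tokens = "The meeting was moved to next Monday morning".split()
spans = [(5, 8)]                        # "next Monday morning"
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeler such as a maximum entropy model or an HMM would then be trained to predict one such label per token; that training step is outside the scope of this sketch.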