Combining automatic acquisition of knowledge with machine learning approaches for multilingual temporal recognition and normalization
Authors: | Rafael Muñoz, Óscar Ferrández, Sergio Ferrández, Estela Saquete, Patricio Martínez-Barco |
---|---|
Year of publication: | 2008 |
Subject: | Normalization (statistics); Information Systems and Management; Computer science; Principle of maximum entropy; Pattern recognition; Machine learning; Expression (mathematics); Computer Science Applications; Theoretical Computer Science; Knowledge-based systems; Artificial intelligence; Control and Systems Engineering; Hidden Markov model; Software |
Source: | Information Sciences. 178:3319-3332 |
ISSN: | 0020-0255 |
DOI: | 10.1016/j.ins.2008.05.002 |
Description: | This paper presents an improvement in the temporal expression (TE) recognition phase of a knowledge-based system at a multilingual level. For this purpose, the combination of different approaches to the recognition of temporal expressions is studied. For the recognition task, a knowledge-based system that recognizes temporal expressions and has been automatically extended to other languages (the TERSEO system) was combined with a system that recognizes temporal expressions using machine learning techniques. In particular, two techniques were applied, a maximum entropy model (ME) and a hidden Markov model (HMM), using two types of tagging of the training corpus: (1) BIO model tagging of literal temporal expressions and (2) BIO model tagging of simple patterns of temporal expressions. Each system was first evaluated independently and then combined in order to: (a) analyze whether the combination gives better results without increasing the number of erroneous expressions by the same percentage and (b) decide which machine learning approach performs this task better. The best F-measure (89%) is obtained when the TERSEO system is combined with the maximum entropy approach, improving TERSEO recognition by 4.5 points and ME recognition by 7 points. |
Database: | OpenAIRE |
External link: |
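
The description mentions BIO model tagging and an F-measure comparison of individual and combined recognizers. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names, the example sentence, the spans, and the use of a plain set union to combine the two systems are all assumptions made for illustration; the record does not state how TERSEO and the machine learning output were actually merged.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_tags(n_tokens: int, spans: Set[Span]) -> List[str]:
    """Encode expression spans as one B/I/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end in sorted(spans):
        tags[start] = "B"              # first token of an expression
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining tokens of the expression
    return tags


def spans_from_bio(tags: List[str]) -> Set[Span]:
    """Decode B/I/O tags back into spans; a stray I without a B is ignored."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag != "I" and start is not None:
            spans.add((start, i))      # the open span ends before token i
            start = None
        if tag == "B":
            start = i
    if start is not None:
        spans.add((start, len(tags)))  # span running to the end of the text
    return spans


def f_measure(predicted: Set[Span], gold: Set[Span]) -> float:
    """Balanced F-measure over exactly matching spans."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data (not taken from the paper).
tokens = "The meeting is on 12 May 2008 and ends next week".split()
gold = {(4, 7), (9, 11)}            # "12 May 2008", "next week"
rule_based = {(4, 7)}               # stand-in for a knowledge-based recognizer
learned = {(9, 11), (0, 1)}         # stand-in for an ME or HMM tagger
combined = rule_based | learned     # assumed combination: simple union

gold_tags = bio_tags(len(tokens), gold)
print(list(zip(tokens, gold_tags)))
print(f_measure(rule_based, gold), f_measure(learned, gold), f_measure(combined, gold))
```

In this toy case the union recovers both gold expressions but also keeps one spurious span, which illustrates the trade-off the description refers to: a combination can raise recall while adding erroneous expressions.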