Applying Machine Learning for High‐Performance Named‐Entity Extraction

Author:	Rahul Sukthankar, Shumeet Baluja, Vibhu Mittal
Year of publication:	2000
Subject:	Named entity; Computational mathematics; Artificial intelligence; Computer science; Spotting; Machine learning; Natural language processing; Task (project management)
Source:	Computational Intelligence. 16:586-595
ISSN:	1467-8640 0824-7935
DOI:	10.1111/0824-7935.00129
Description:	This paper describes a machine learning approach to build an efficient, accurate and fast name spotting system. Finding names in free text is an important task in addressing real-world text-based applications. Most previous approaches have been based on carefully hand-crafted modules encoding linguistic knowledge specific to the language and document genre. Such approaches have two drawbacks: they require large amounts of time and linguistic expertise to develop, and they are not easily portable to new languages and genres. This paper describes an extensible system which automatically combines weak evidence for name extraction. This evidence is gathered from easily available sources: part-of-speech tagging, dictionary lookups, and textual information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1c04513ef34f4912947d035543a86412 https://doi.org/10.1111/0824-7935.00129