An automatic filtering method for field association words by deleting unnecessary words
Author: | Elmarhomy Ghada, Masao Fuketa, El-Sayed Atlam, Kazuhiro Morita, Jun-ichi Aoe |
---|---|
Year of publication: | 2006 |
Subject: | Stop words, Automatic filtering, Computer science, Applied Mathematics, Speech recognition, Document classification, Automatic summarization, Field (computer science), Computer Science Applications, Computational Theory and Mathematics, Artificial intelligence, Association (psychology), Precision and recall, Word (group theory), Natural language processing |
Source: | International Journal of Computer Mathematics. 83:247-261 |
ISSN: | 1029-0265, 0020-7160 |
DOI: | 10.1080/00207160600875234 |
Description: | Document classification and summarization are very important for document text retrieval. Generally, humans can recognize fields such as ⟨Sports⟩ or ⟨Politics⟩ from specific words, called Field Association (FA) words, that appear in documents of those fields. The traditional method registers misleading redundant words (unnecessary words) because the quality of the resulting FA words depends on learning data pre-classified by hand. Therefore, recall and precision of document classification degrade if the fields classified by hand are ambiguous. We propose two criteria: deleting unnecessary words with low frequencies, and deleting unnecessary words using category information. Moreover, using the proposed criteria, unnecessary words can be deleted from the FA word dictionary created by the traditional method. Experimental results showed that 25% of 38,372 FA word candidates were identified as unnecessary and deleted automatically when the presented method was used. Furthermore, precision and F... |
Database: | OpenAIRE |
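The two filtering criteria named in the description can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so both tests here are assumptions for illustration only: the low-frequency criterion is modeled as a minimum total frequency across fields, and the category-information criterion as requiring that most of a candidate's occurrences fall in a single higher-level category (e.g. ⟨Sports⟩ above baseball and soccer). The thresholds, the function names `category_share`, `find_unnecessary`, and `filter_dictionary`, and the sample data are all hypothetical and are not the authors' method.

```python
from collections import Counter
from typing import Dict, Mapping, Set

# Hypothetical thresholds: the abstract gives no concrete values.
MIN_FREQUENCY = 3          # assumed form of criterion 1: minimum total frequency
MIN_CATEGORY_SHARE = 0.8   # assumed form of criterion 2: minimum share of the dominant category

FieldCounts = Mapping[str, int]  # field name -> occurrences of the candidate in that field


def category_share(field_counts: FieldCounts, field_to_category: Mapping[str, str]) -> float:
    """Fraction of a candidate's occurrences that fall in its most frequent category."""
    per_category: Counter = Counter()
    for field, count in field_counts.items():
        # Fields without a known parent category are treated as their own category.
        per_category[field_to_category.get(field, field)] += count
    total = sum(per_category.values())
    return max(per_category.values()) / total if total else 0.0


def find_unnecessary(
    candidates: Mapping[str, FieldCounts],
    field_to_category: Mapping[str, str],
    min_freq: int = MIN_FREQUENCY,
    min_share: float = MIN_CATEGORY_SHARE,
) -> Set[str]:
    """Return the candidate words flagged by either assumed criterion."""
    unnecessary: Set[str] = set()
    for word, field_counts in candidates.items():
        if sum(field_counts.values()) < min_freq:
            unnecessary.add(word)  # too rare to be a reliable FA word
        elif category_share(field_counts, field_to_category) < min_share:
            unnecessary.add(word)  # spread over several categories, so weakly associated
    return unnecessary


def filter_dictionary(candidates, field_to_category, **thresholds) -> Dict[str, FieldCounts]:
    """Return a copy of the FA word dictionary with the flagged words removed."""
    removed = find_unnecessary(candidates, field_to_category, **thresholds)
    return {w: c for w, c in candidates.items() if w not in removed}


# Hypothetical sample data for illustration only.
field_to_category = {
    "Baseball": "Sports",
    "Soccer": "Sports",
    "Election": "Politics",
    "Parliament": "Politics",
}
candidates = {
    "home run": {"Baseball": 12},
    "goalkeeper": {"Soccer": 9, "Baseball": 1},
    "ballot": {"Election": 7, "Parliament": 3},
    "today": {"Baseball": 4, "Election": 5},
    "umpire": {"Baseball": 2},
}

print(sorted(find_unnecessary(candidates, field_to_category)))  # ['today', 'umpire']
```

In this toy data, "umpire" is dropped by the frequency test (2 occurrences), and "today" is dropped by the category test because only 5 of its 9 occurrences fall in its dominant category. Any real use would need thresholds tuned on data, which the abstract does not describe.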