Effective and Robust Query-Based Stemming
Author: | Jiaul H. Paik, Dipasree Pal, Swapan K. Parui, Stephen Robertson |
---|---|
Year of publication: | 2013 |
Subject: | Information retrieval; Computer science; Information storage and retrieval; General Business, Management and Accounting; Computer Science Applications; Query expansion; Robustness (computer science); Fully automatic; Vocabulary mismatch; Thematic coherence; Artificial intelligence; Stemming; Suffix; Natural language processing; Information Systems |
Source: | ACM Transactions on Information Systems. 31:1-29 |
ISSN: | 1558-2868 1046-8188 |
Description: | Stemming is a widely used technique in information retrieval systems to address the vocabulary mismatch problem arising from morphological phenomena. The major shortcoming of commonly used stemmers is that they accept the morphological variants of the query words without considering their thematic coherence with the given query, which leads to poor performance. Moreover, for many queries, such approaches produce retrieval performance that is poorer than with no stemming at all, thereby degrading robustness. The main goal of this article is to present corpus-based, fully automatic stemming algorithms that address these issues. A set of experiments on six TREC collections and three other non-English collections containing news and web documents shows that the proposed query-based stemming algorithms consistently and significantly outperform four strong state-of-the-art stemmers based on entirely different principles. Our experiments also confirm that the robustness of the proposed query-based stemming algorithms is remarkably better than that of the existing strong baselines. |
Database: | OpenAIRE |