Multi-Step Iterative Algorithm for Feature Selection on Dynamic Documents
Author: | Prafulla Bafna, Shailaja Shirwaikar, Dhanya Pramod |
---|---|
Publication year: | 2016 |
Subject: | |
Source: | International Journal of Information Retrieval Research. 6:24-40 |
ISSN: | 2155-6385 2155-6377 |
DOI: | 10.4018/ijirr.2016040102 |
Description: | The authors propose a clustering-based multi-step iterative algorithm. Its key step groups terms by synonymy, using a semantic relativity measure between the terms. The term frequency of a synonym group is computed by weighting each term that appears in the document by its relativity measure with respect to the group's parent term. This raises the importance of terms that individually appear infrequently but together show a strong presence. The authors ran experiments on real and artificial datasets, including NEWS 20, Reuters, emails, and research papers on different topics. The resulting entropy values show that the algorithm improves results on well-articulated documents such as research papers. The improvement is marginal on documents where the message is emphasized by repeating terms, specifically rapidly generated documents such as emails. The authors also observed that newly arrived documents are mapped appropriately based on their proximity to a semantic group. |
Database: | OpenAIRE |
External link: |
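
The group-level term frequency described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the simplest reading: each occurrence of a member term contributes its relativity weight (relative to the group's parent term) to the group's score. The function name `group_term_frequency`, the `SYNONYM_GROUPS` table, and all weights are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical synonym groups: parent term -> {member term: relativity to parent}.
# The weights are illustrative assumptions, not values from the paper.
SYNONYM_GROUPS = {
    "car": {"car": 1.0, "automobile": 0.75, "vehicle": 0.5},
    "doctor": {"doctor": 1.0, "physician": 0.75},
}


def group_term_frequency(tokens, groups):
    """Return a relativity-weighted frequency for each synonym group.

    Each occurrence of a member term adds that term's weight to the
    score of its group, so rare synonyms accumulate into one signal.
    """
    counts = Counter(tokens)  # missing terms count as 0
    return {
        parent: sum(weight * counts[term] for term, weight in members.items())
        for parent, members in groups.items()
    }


doc = "the physician drove an automobile while the doctor took a vehicle".split()
scores = group_term_frequency(doc, SYNONYM_GROUPS)
print(scores)
```

In this toy document no term from the "car" group appears more than once, and the parent term "car" does not appear at all, yet the group still receives a nonzero score from "automobile" and "vehicle". This is the effect the abstract describes: terms that are individually infrequent gain importance when counted together.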