Frequent itemset-based feature selection and Rider Moth Search Algorithm for document clustering

Author: Madhulika Yarlagadda, K. Gangadhara Rao, A. Srikrishna
Language: English
Year of publication: 2022
Subject:
Source: Journal of King Saud University: Computer and Information Sciences, Vol 34, Iss 4, Pp 1098-1109 (2022)
Document type: article
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2019.09.002
Description: Document clustering has recently received considerable attention for the retrieval, navigation, and summarization of large document collections. With a better clustering approach, computers can automatically organize a document corpus into meaningful clusters, enabling efficient navigation and browsing of the corpus. Document navigation and browsing are a valuable complement to information retrieval technologies and help compensate for their deficiencies. This paper introduces a Modsup-based frequent itemset approach and the Rider Optimization-based Moth Search Algorithm (Rn-MSA) for clustering documents. First, the input documents are pre-processed, and features are then extracted based on TF-IDF and WordNet. Next, feature selection is carried out based on frequent itemsets to establish feature knowledge. Finally, the documents are clustered using the proposed Rn-MSA, which combines the Rider Optimization Algorithm (ROA) and the Moth Search Algorithm (MSA). The performance of document clustering based on the proposed Modsup + Rn-MSA is evaluated in terms of precision, recall, F-Measure, and accuracy. The developed method achieves a maximal precision of 95.90%, a maximal recall of 96.41%, a maximal F-Measure of 96.41%, and a maximal accuracy of 95.12%, which indicates its superiority.
Database: Directory of Open Access Journals
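
The feature extraction and selection steps named in the abstract (TF-IDF weighting followed by frequent-itemset-based selection) can be illustrated with a minimal sketch. This is not the paper's method: the Modsup measure, the WordNet features, and the Rn-MSA clustering step are not specified in this record and are not reproduced here. The sketch assumes plain document-frequency support, a standard TF * log(N/df) weighting, and an Apriori-style search limited to term pairs; the function names, the toy corpus, and the 0.5 support threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (assumption: no stemming or stop-word removal)."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(docs):
    """Return one {term: weight} dict per document using TF * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def frequent_itemsets(doc_sets, min_support, max_size=2):
    """Apriori-style mining of term sets whose document support >= min_support.

    Support here is the plain fraction of documents containing the set,
    not the Modsup measure used in the paper.
    """
    n = len(doc_sets)
    items = sorted(set().union(*doc_sets))
    frequent = {}
    for size in range(1, max_size + 1):
        level = {}
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(candidate <= s for s in doc_sets) / n
            if support >= min_support:
                level[candidate] = support
        if not level:
            break
        frequent.update(level)
        # Only terms that survived this level can appear in larger itemsets.
        items = sorted({t for itemset in level for t in itemset})
    return frequent


# Hypothetical toy corpus, for illustration only.
corpus = [
    "document clustering groups similar documents",
    "clustering documents with moth search",
    "rider optimization guides moth search",
    "document retrieval and clustering",
]
docs = [tokenize(text) for text in corpus]

itemsets = frequent_itemsets([set(d) for d in docs], min_support=0.5)

# Keep every term that takes part in at least one frequent itemset.
selected = sorted({t for itemset in itemsets for t in itemset})

# Project the TF-IDF vectors onto the selected terms.
vectors = [[w.get(t, 0.0) for t in selected] for w in tf_idf(docs)]

print(selected)
```

The resulting `vectors` would be the input to whatever clustering algorithm follows; in the paper that role is played by Rn-MSA, which this sketch leaves out.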