Document Classification Using Enhanced Grid Based Clustering Algorithm
Authors: | Mohamed Waleed Fakhr, Mohamed Ahmed Rashad, Hesham El-Deeb |
---|---|
Publication year: | 2014 |
Subject: | Computer science; Computer Science::Information Retrieval; Document classification; Correlation clustering; k-means clustering; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); Pattern recognition; Document clustering; Data stream clustering; CURE data clustering algorithm; Canopy clustering algorithm; Artificial intelligence; Cluster analysis |
Source: | Lecture Notes in Electrical Engineering ISBN: 9783319067636 |
DOI: | 10.1007/978-3-319-06764-3_27 |
Description: | Automated document clustering is an important text mining task, especially given the rapid growth in the number of online documents in Arabic. Text clustering aims to automatically assign a text to a predefined cluster based on linguistic features. This research proposes an enhanced grid-based clustering algorithm. Its main purpose is to divide the data space into clusters of arbitrary shape. These clusters are treated as dense regions of points in the data space, separated by low-density regions that represent noise. The algorithm also handles data sets with multiple densities and assigns noise and outliers to the closest category. This reduces the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which reduces the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is reported in terms of time consumption and the percentage of successfully clustered instances. Experiments carried out on an in-house collection of Arabic text demonstrate the effectiveness of the enhanced clustering algorithm, with an average accuracy of 89%. |
Database: | OpenAIRE |
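
The description mentions word-frequency vectors and a grid-based clustering step in which dense regions form clusters and noise is assigned to the closest category. The record does not give the authors' actual algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. The function names, the English `STOP_WORDS` set, the single density threshold `min_pts`, and the nearest-centroid rule for noise are all illustrative choices, not taken from the paper. In particular, it does not handle multiple densities and it omits root extraction.

```python
from collections import Counter, deque
from itertools import product

# Hypothetical stop-word list for illustration; the paper works on Arabic text.
STOP_WORDS = {"the", "a", "of", "and", "in"}


def term_frequencies(text):
    """Represent a document as word counts after dropping stop words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def grid_cluster(points, cell_size, min_pts):
    """Cluster points by joining adjacent dense grid cells.

    Returns one cluster label per point. Points that fall in sparse cells
    are treated as noise and assigned to the nearest cluster centroid
    (an assumed stand-in for "closest category"). If no cell is dense,
    every label stays -1.
    """
    dims = len(points[0])

    # 1. Bin each point into a grid cell.
    cells = {}
    for i, p in enumerate(points):
        key = tuple(int(x // cell_size) for x in p)
        cells.setdefault(key, []).append(i)

    # 2. A cell is dense if it holds at least min_pts points.
    dense = {k for k, idx in cells.items() if len(idx) >= min_pts}

    # 3. Flood-fill across neighbouring dense cells to form clusters.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    cell_label = {}
    n_clusters = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    labels = [-1] * len(points)
    for key, idx in cells.items():
        if key in cell_label:
            for i in idx:
                labels[i] = cell_label[key]

    # 4. Assign noise points to the cluster with the closest centroid.
    if n_clusters:
        centroids = []
        for c in range(n_clusters):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        for i, lab in enumerate(labels):
            if lab == -1:
                labels[i] = min(
                    range(n_clusters),
                    key=lambda c: sum(
                        (a - b) ** 2 for a, b in zip(points[i], centroids[c])
                    ),
                )
    return labels
```

Because the neighbour search visits 3^d - 1 adjacent cells, this sketch is only practical for low-dimensional points. Real term-frequency vectors would need to be projected to a few dimensions first, or the neighbourhood scheme would need to change.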