Term Weighting Schemes for Slovak Text Document Clustering
Authors: | ZLACKÝ Daniel, STAŠ Ján, JUHÁR Jozef, CIŽMÁR Anton |
---|---|
Language: | English |
Year of publication: | 2013 |
Source: | Journal of Electrical and Electronics Engineering, Vol 6, Iss 1, Pp 163-166 (2013) |
Document type: | article |
ISSN: | 1844-6035 2067-2128 |
Description: | Text representation is the task of transforming textual data into a multidimensional space with corresponding weights for every word. We tested several widely used term weighting methods on a manually created database of Slovak Wikipedia articles. The resulting vector space models were used as input to unsupervised clustering algorithms, which cluster text documents based on these models. We tested nine different weighting schemes with the K-means clustering algorithm. The best results were obtained with the TF-RIDF weighting scheme. However, subsequent experiments with different clustering techniques did not confirm these results. |
Database: | Directory of Open Access Journals |
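
The pipeline outlined in the description (weight terms, build a vector space model, cluster the vectors with K-means) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain TF-IDF rather than TF-RIDF, whose exact definition is not given in this record, and the toy English documents, the function names `tfidf_vectors`, `sq_dist` and `kmeans`, and the fixed seed indices are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to L2-normalised TF-IDF dicts (term -> weight)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times inverse document frequency.
        vec = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, seeds, iterations=10):
    """Lloyd's K-means; `seeds` are indices of the documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if not members:
                continue  # keep the old centroid if the cluster is empty
            mean = Counter()
            for v in members:
                for t, w in v.items():
                    mean[t] += w / len(members)
            centroids[k] = dict(mean)
    return labels


# Hypothetical toy corpus: two sports documents and two geography documents.
docs = [
    ["hockey", "team", "goal", "ice"],
    ["hockey", "goal", "player", "team"],
    ["river", "mountain", "lake", "valley"],
    ["mountain", "lake", "river", "forest"],
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=(0, 2))
print(labels)  # [0, 0, 1, 1]
```

Swapping in another weighting scheme only requires changing the expression that computes `vec`; the clustering step is unaffected, which is what makes it possible to compare schemes in the way the description reports.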