A Novel Short Text Clustering Model Based on Grey System Theory

Authors: Mehmet Erkan Yuksel, Hüseyin Fidan
Publication year: 2019
Subject:
Source: Arabian Journal for Science and Engineering. 45:2865-2882
ISSN: 2191-4281, 2193-567X
DOI: 10.1007/s13369-019-04191-0
Description: Short text clustering is challenging for structural reasons, especially when applied to small datasets. The limited number of words leads to poor-quality feature vectors, low clustering accuracy, and failed analyses. Although several approaches have been proposed in the related literature, there is still no agreement on an efficient solution. Grey system theory, which gives better results in numerical analyses with insufficient data, has not yet been applied to short text clustering. The purpose of our study is to develop a short text clustering model, based on Grey system theory, that is applicable to small datasets. To measure the efficiency of our method, book reviews labeled as negative or positive were obtained from Amazon.com dataset collections, and small datasets were created from them. Grey relational clustering, as well as hierarchical and partitional algorithms, was applied to the small datasets separately. According to the results, our model achieves higher accuracy than the other algorithms when clustering small datasets containing short text. Consequently, we demonstrate that applying Grey relational clustering to short text clustering yields much better results.
Database: OpenAIRE
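
The description names Grey relational clustering without showing the similarity measure behind it. As a rough illustration only, the sketch below computes the standard grey relational grade (Deng's formulation) between a reference vector and a set of comparison vectors. It is not the authors' model. The function name `grey_relational_grades`, the toy term-count vectors, and the distinguishing coefficient `zeta = 0.5` (a conventional default) are assumptions made for this example, and the inputs are assumed to be already scaled to a comparable range.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Return the grey relational grade of each candidate against the reference.

    Hypothetical helper for illustration; assumes all vectors have the same
    length and are already normalized to a comparable scale.
    """
    # Absolute differences between the reference and every candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]

    # Global minimum and maximum difference across all candidates and positions.
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)

    # If every candidate equals the reference, all grades are 1 by definition.
    if d_max == 0:
        return [1.0 for _ in candidates]

    grades = []
    for row in deltas:
        # Grey relational coefficient at each position.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        # The grade is the mean coefficient over all positions.
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Toy term-presence vectors for one reference text and three other texts.
reference = [1, 0, 1, 0]
candidates = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]
grades = grey_relational_grades(reference, candidates)
```

A higher grade means the candidate is closer to the reference. One plausible way to cluster with it, again an assumption rather than the paper's procedure, is to group texts whose pairwise grades exceed a chosen threshold.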