Short text classification using feature enrichment from credible texts

Authors: Alsmadi, Issa M.; Gan, Keng Hoon
Source: International Journal of Web Engineering and Technology; 2020, Vol. 15, Issue 1, pp. 59-80, 22 pp.
Abstract: Classifying the content of tweets can serve as a useful feature for other application tasks. However, such classification is challenging because tweets are short and their content is sparse. Although individual tweets are limited in length, collectively they span many different topics, so achieving good coverage of content features remains difficult. In this research we adopt a keyword expansion technique and study the enrichment of tweet content using text from credible sources, such as news sites. For evaluation, we conduct experiments on two Twitter datasets using four standard classifiers. The proposed approach improves classification performance, with accuracy gains ranging from +0.05% to +3.54% across both datasets. The experimental results demonstrate that the proposed feature enrichment method can overcome the sparseness limitation of short text and improve classification performance across various classifiers.
Database: Supplemental Index
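
The keyword expansion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how expansion terms are selected, so the co-occurrence counting below, the stopword list, and the names `tokenize`, `enrich`, and `top_k` are all assumptions made for illustration. The sketch appends to a tweet's tokens the terms that most often appear in news texts sharing at least one keyword with the tweet.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "for", "as", "after"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def enrich(tweet, news_docs, top_k=3):
    """Return the tweet's tokens plus up to top_k expansion terms.

    Expansion terms are the words that occur most often in news documents
    sharing at least one keyword with the tweet. This selection rule is an
    assumed stand-in, not the method from the paper.
    """
    tweet_tokens = tokenize(tweet)
    keywords = set(tweet_tokens)
    counts = Counter()
    for doc in news_docs:
        doc_tokens = tokenize(doc)
        if keywords & set(doc_tokens):
            # Count only terms the tweet does not already contain.
            counts.update(t for t in doc_tokens if t not in keywords)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return tweet_tokens + [term for term, _ in ranked[:top_k]]
```

The enriched token list could then be passed to any standard bag-of-words classifier in place of the raw tweet tokens; documents with no keyword overlap contribute nothing, which keeps unrelated news text out of the feature set.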