Authors: |
Jia, Quanye; Liu, Rui; Zhang, He; You, Lu |
Source: |
IOP Conference Series: Materials Science and Engineering; October 2018, Vol. 435, Issue 1, p. 012055, 1 p. |
Abstract: |
Topic distributions in text generally exhibit a long-tail effect, but little research has addressed how to extract long-tail topics with matrix factorization. We therefore propose Non-negative Matrix Factorization with Long-tail Constraint (LTNMF). Building on non-negative matrix factorization, LTNMF adds soft orthogonality constraints to the feature matrix to keep the topics independent. Sparsity and long-tail constraints are added to the topic-document matrix to improve the robustness of the model and to capture the long-tail character of the topic distribution. Combining the soft orthogonality, sparsity, and long-tail constraints enables the model to extract long-tail topic information from the data while preserving topic quality. Experiments on the Sougou and 20newsgroup datasets show that LTNMF extracts more topic words and improves clustering accuracy and normalized mutual information in text classification. |
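The factorization outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no objective function or update rules, so the objective below, ||X - WH||_F^2 + alpha * ||W^T W - I||_F^2 + beta * ||H||_1, is an assumption, as are the heuristic multiplicative updates, the function name `nmf_soft_orth_sparse`, and the parameters `alpha` and `beta`. The long-tail constraint is left out entirely because its form is not stated in the abstract.

```python
import numpy as np


def nmf_soft_orth_sparse(X, k, alpha=0.1, beta=0.01, n_iter=200, seed=0, eps=1e-9):
    """Illustrative NMF with a soft orthogonality penalty on W and an L1 penalty on H.

    X : non-negative (terms x documents) matrix
    W : (terms x k) feature matrix, H : (k x documents) topic-document matrix
    Assumed objective (not taken from the paper):
        ||X - WH||_F^2 + alpha * ||W^T W - I||_F^2 + beta * sum(H)
    The long-tail constraint from the abstract is NOT modeled here.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps

    # Track the plain reconstruction error, starting from the random initialization.
    errors = [np.linalg.norm(X - W @ H)]
    for _ in range(n_iter):
        # H update: the L1 penalty contributes a constant to the denominator,
        # which shrinks small entries toward zero.
        H *= (W.T @ X) / (W.T @ W @ H + 0.5 * beta + eps)

        # W update: the orthogonality penalty adds 2*alpha*W to the numerator
        # and 2*alpha*W(W^T W) to the denominator (heuristic gradient split).
        numer = X @ H.T + 2 * alpha * W
        denom = W @ (H @ H.T) + 2 * alpha * W @ (W.T @ W) + eps
        W *= numer / denom

        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. With `alpha = 0` and `beta = 0` the rules reduce to the standard multiplicative updates for Frobenius-norm NMF.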
Database: |
Supplemental Index |