Constructing Personalized Document Clustering Based on User-Defined Labels
Author: | YANG JIA FONG, 楊佳鳳 |
---|---|
Publication year: | 2014 |
Document type: | 學位論文 ; thesis |
Description: | 102 With the rapid development of the Internet, people increasingly rely on the network to find all kinds of information, and the Internet has become an important source of information. How to manage a large number of documents efficiently has become an important issue. The traditional way to manage files is manual sorting, but manual sorting is time-consuming and labor-intensive, and its standards differ from person to person. Techniques are needed to achieve effective document management. Document clustering is one way to help manage files, but traditional document clustering algorithms have two drawbacks. First, users cannot understand the meaning of each cluster. Second, traditional document clustering cannot give appropriate clustering results for different users. This study proposes the LHC (Label Hierarchical Cluster) algorithm to improve on traditional document clustering algorithms. LHC not only achieves personalized document clustering but also assigns appropriate labels to individual clusters, and finally generates a hierarchical structure. In the experiment, the subjects were 32 students of information management at CCU. The data were the literature for the students' papers, with 10 to 25 documents in each data set. Experimental results show that the LHC algorithm performs significantly better than well-known traditional algorithms. Cluster Recall, however, was lower than expected: because of parameter estimation problems in LHC, the cluster labels may have led to fewer clusters. Furthermore, the document clustering hierarchies show that most data sets conform to the users' notion of levels, although a small part do not. Even so, the hierarchical structure is still meaningful: users can regard the leaf nodes as a multi-label classification of documents. Overall, the LHC algorithm can indeed improve on traditional clustering algorithms and achieve personalized document clustering. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |