Document cluster detection on latent projections
Author: | Hugo Hidalgo-Silva, Dora Alvarez-Medina |
---|---|
Year of publication: | 2009 |
Subject: | Computer science, Computer Science::Information Retrieval, Matrix representation, Probabilistic logic, Pattern recognition, Statistical model, Data modeling, Visualization, ComputingMethodologies_PATTERNRECOGNITION, Data visualization, Multinomial distribution, Data mining, Artificial intelligence, Event (probability theory) |
Source: | ICDIM |
DOI: | 10.1109/icdim.2009.5356765 |
Description: | Probabilistic modeling of text data usually relies on Bernoulli or multinomial event models. A central difficulty in text mining is the large number of zero counts in the matrix representation. Recently, a document visualization technique that incorporates the zero-inflated Poisson model into the Generative Topographic Mapping algorithm has been proposed. This probabilistic model can be applied as a text document visualization tool. In this work, an algorithm for automatically extracting the clusters in the visualization results is presented. Combining the visualization and cluster extraction algorithms makes it possible to extract and evaluate clusters in document collections. Several results are presented for the 20-Newsgroups and Reuters data sets. |
Database: | OpenAIRE |
External link: |
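
The description refers to the zero-inflated Poisson model as a way to handle the many zero counts in a term-document matrix. As a rough illustration only, the standard textbook form of that distribution can be sketched as below. The function name `zip_pmf` and the parameter names `lam` (Poisson rate) and `pi` (extra-zero probability) are assumptions made for this sketch and are not taken from the paper, which may parameterize the model differently.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson probability of observing the count k.

    Hypothetical helper for illustration:
    lam is the Poisson rate, pi is the probability of a structural zero.
    """
    if k < 0:
        return 0.0
    # Ordinary Poisson probability of k.
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero arises either structurally (pi) or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Compare the probability of a zero count with and without inflation.
p_zero_plain = zip_pmf(0, lam=2.0, pi=0.0)
p_zero_inflated = zip_pmf(0, lam=2.0, pi=0.7)
```

With `pi = 0` the function reduces to the ordinary Poisson distribution, and any `pi > 0` shifts probability mass toward zero, which is the property that makes the model attractive for sparse word counts.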