A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data
Author: | Mingmin Chi, Jinsheng Shen |
---|---|
Year of publication: | 2018 |
Subject: | Topic model; Computer science; Probabilistic logic; 02 engineering and technology; Construct (python library); Latent Dirichlet allocation; Computer Science Applications; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); Business, Management and Accounting (miscellaneous); Probability distribution; 020201 artificial intelligence & image processing; Point (geometry); The Internet; Data mining; Statistics, Probability and Uncertainty |
Source: | Annals of Data Science. 5:9-19 |
ISSN: | 2198-5812 2198-5804 |
Description: | With the rapid development of Internet technologies and sensor techniques, it has become much easier to acquire data from different sources at different dates and times. However, computing the correlation of such heterogeneous data remains a major challenge for data mining and information retrieval. Here, the features of a data point obtained from one source are called a view, and the multiview features describe the same data point. In this paper, the hidden correlation of two-view features is proposed as the basis for constructing a Heterogeneous (multiview) Topic Model (HTM). In particular, a probabilistic topic model is used for the different views because generative models usually provide much richer features when handling high-dimensional data such as text. However, most existing probabilistic topic models, such as latent Dirichlet allocation, require the form of the probability distribution to be known. To avoid this limitation, the HTM is reduced to a non-negative matrix tri-factorization problem with certain constraints, so that the proposed approach can be used with an arbitrary model. |
Database: | OpenAIRE |
External link: |
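
The description mentions reducing the model to a non-negative matrix tri-factorization problem. As a rough illustration of that general technique only, the sketch below factorizes a non-negative matrix X as F S Gᵀ using standard multiplicative updates for the squared Frobenius error. It is not the HTM itself: the record does not state the paper's constraints or update rules, so the function names `nmtf` and `reconstruction_error`, the ranks `k1` and `k2`, and the choice of an unconstrained objective are all assumptions made for this example.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic non-negative matrix tri-factorization X ~= F @ S @ G.T.

    Illustrative sketch only: plain multiplicative updates for the squared
    Frobenius error, without the (unspecified) constraints of the HTM.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialisation keeps every factor non-negative,
    # because the updates below only multiply by non-negative ratios.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled element-wise by the ratio of the negative
        # to the positive part of its gradient; eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G


def reconstruction_error(X, F, S, G):
    """Frobenius norm of the residual X - F S G^T."""
    return np.linalg.norm(X - F @ S @ G.T)


if __name__ == "__main__":
    # Hypothetical toy data: a small non-negative low-rank matrix.
    rng = np.random.default_rng(1)
    X = rng.random((20, 3)) @ rng.random((3, 15))
    F, S, G = nmtf(X, k1=3, k2=3)
    print("residual norm:", reconstruction_error(X, F, S, G))
```

In a two-view setting, one could read F and G as non-negative factors associated with the two views and S as the matrix coupling them, but that interpretation is an assumption here rather than something the record states.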