Interactive System Using LDA for Exploratory Visualization to Extract Data Association in a Data Lake
Author: | Takaki Yamada, Yuko Kato, Tomoe Tomiyama, Yuki Maekawa |
---|---|
Year of publication: | 2018 |
Subject: | Jaccard index; Computer science; Big data; Function (mathematics); Complex network; Latent Dirichlet allocation; Visualization; Categorization; Noise (video); Data mining; Engineering and technology; Electrical & electronic engineering; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing |
Source: | SMC |
DOI: | 10.1109/smc.2018.00040 |
Description: | An interactive system previously developed for exploratory visualization of data associations in a data lake using a self-organizing structure of schemas has been improved by incorporating a machine learning function for latent Dirichlet allocation (LDA) and a categorization function. A topic (i.e., a list of data values and their corresponding appearance probabilities) estimated by LDA can be used as a recommendation that indicates a latent data association among co-occurrences in a complex network structure. Results of experiments using random data demonstrated that a latent data association with a signal strength of 0.20 (Jaccard coefficient) can be detected over noise with a strength of up to 0.24. The detected recommendation can potentially help the user form a hypothesis about a useful pattern in big data. |
Database: | OpenAIRE |
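The description reports signal and noise strengths as Jaccard coefficients. The record does not say which sets the coefficient is computed over, so the following is only a minimal sketch of the standard definition, |A ∩ B| / |A ∪ B|. It assumes, hypothetically, that each data value is represented by the set of record IDs in which it appears. The function name `jaccard` and the example sets are illustrative and do not come from the paper.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets have coefficient 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical record IDs in which two data values appear (not data from the paper).
records_with_x = {1, 2, 3, 4, 5, 6}
records_with_y = {5, 6, 7, 8, 9, 10}

# Two shared records out of ten distinct records in total.
strength = jaccard(records_with_x, records_with_y)
print(strength)
```

With these made-up sets, two of the ten distinct records are shared, so the coefficient is 2/10, the same magnitude as the 0.20 signal strength quoted in the description.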