Interactive System Using LDA for Exploratory Visualization to Extract Data Association in a Data Lake

Authors: Takaki Yamada, Yuko Kato, Tomoe Tomiyama, Yuki Maekawa
Publication year: 2018
Subject:
Source: SMC
DOI: 10.1109/smc.2018.00040
Description: An interactive system previously developed for exploratory visualization of data associations in a data lake, based on a self-organizing structure of schemas, has been improved by incorporating a machine learning function for latent Dirichlet allocation (LDA) and a categorization function. A topic estimated by LDA (i.e., a list of data values and their corresponding appearance probabilities) can be used as a recommendation that indicates a latent data association among co-occurrences in a complex network structure. Experiments using random data demonstrated that a latent data association with a signal strength of 0.20 (Jaccard coefficient) can be detected over noise with a strength of up to 0.24. The detected recommendation can potentially help the user form a hypothesis about a useful pattern in big data.
Database: OpenAIRE
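
Note on the strength measure: the abstract reports signal and noise strength as a Jaccard coefficient, which for two sets is the size of their intersection divided by the size of their union. The sketch below is illustrative only. The function name, the record IDs, and the convention for two empty sets are assumptions made here, and this record does not say how the authors apply the coefficient to co-occurrences, so this should not be read as their implementation.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets have coefficient 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical IDs of records in which two data values appear.
records_with_x = {1, 2, 3, 4, 5, 6}
records_with_y = {5, 6, 7, 8, 9, 10}

# 2 shared records out of 10 distinct records.
strength = jaccard(records_with_x, records_with_y)
print(f"Jaccard coefficient: {strength:.2f}")
```

With these made-up sets the two values share 2 of 10 distinct records, which gives a coefficient of 0.20, the same magnitude as the signal strength quoted in the abstract.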