Large Datasets Visualization with Neural Network Using Clustered Training Data

Authors: Viktor Medvedev, Gintautas Dzemyda, Sergėjus Ivanikovas
Year of publication: 2008
Subject:
Source: Advances in Databases and Information Systems ISBN: 9783540857129
ADBIS
DOI: 10.1007/978-3-540-85713-6_11
Description: This paper presents the visualization of large datasets with the SAMANN algorithm, using clustering methods to reduce the initial dataset before network training. Visualization of multidimensional data is highly important in data mining because recent applications produce large amounts of data that require specific means for knowledge discovery. One way to visualize a multidimensional dataset is to project it onto a plane. This paper analyzes the visualization of multidimensional data using a feed-forward neural network. We investigate an unsupervised backpropagation algorithm to train a multilayer feed-forward neural network (SAMANN) to perform Sammon's nonlinear projection. The SAMANN network offers the generalization ability of projecting new data. Previous investigations showed that it is possible to train SAMANN using only a part of the analyzed dataset without loss of accuracy. It is very important to select a proper vector subset for the neural network training. One way to construct a relevant training subset is to use clustering. This makes it possible to speed up the visualization of large datasets.
Database: OpenAIRE
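
The abstract's two central ideas, measuring projection quality with Sammon's stress and reducing the training set by clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering method, so plain k-means is assumed here, and picking the member closest to each centroid as the representative is likewise an assumption. The function names `dist`, `sammon_stress`, and `kmeans_subset` are hypothetical, and the SAMANN network training itself is omitted.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(high, low):
    """Sammon's stress between original vectors and their projections."""
    total, err = 0.0, 0.0
    for i in range(len(high)):
        for j in range(i + 1, len(high)):
            d_star = dist(high[i], high[j])  # distance in the input space
            if d_star == 0.0:
                continue  # skip coincident points to avoid division by zero
            d = dist(low[i], low[j])  # distance in the projection
            total += d_star
            err += (d_star - d) ** 2 / d_star
    return err / total if total else 0.0


def kmeans_subset(data, k, iters=20, seed=0):
    """Assumed reduction step: one representative vector per k-means cluster."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: dist(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Representative = the data vector closest to each final centroid.
    return [min(data, key=lambda v: dist(v, centroids[c])) for c in range(k)]
```

Under these assumptions, a network would be trained only on the vectors returned by `kmeans_subset`, and `sammon_stress` would then be evaluated on the projection of the full dataset to check how much accuracy the reduction costs.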