Mixing numerical and categorical data in a Self-Organizing Map by means of frequency neurons
Author: | Carmelo del Coso, Bernardino Arcay, José M. Rodríguez-Pedreira, Carlos Dafonte, Francisco J. Novoa, Diego Fustes |
---|---|
Year of publication: | 2015 |
Subject: |
Self-organizing map; Computer science; Big data; Machine learning; ComputingMethodologies_PATTERNRECOGNITION; Transformation (function); Pattern recognition (psychology); Benchmark (computing); Data mining; Artificial intelligence; Categorical variable; Software |
Source: | Applied Soft Computing. 36:246-254 |
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2015.06.058 |
Description: | Highlights: (1) Self-Organizing Maps (SOMs) are powerful tools with many applications, but they cannot deal directly with categorical variables. (2) To present categorical variables to a SOM, they are usually transformed by binarization, which dramatically increases the dimensionality of the dataset. (3) NCSOM was proposed to cope with categorical or mixed data, but it has drawbacks: categorical and numerical variables are not equally balanced, and the method is not convergent. (4) A novel SOM variant, called FMSOM, is presented that handles numerical and categorical variables, gives them the same weight, and ensures convergence. A scalable implementation of the method is fully described. (5) FMSOM is applied to a benchmark of well-known datasets composed of categorical and mixed data. The results show the potential of the method for analyzing this kind of dataset. Abstract: Even though Self-Organizing Maps (SOMs) are a powerful and essential tool for pattern recognition and data mining, the common SOM algorithm is not suited to processing categorical data, which is present in many real datasets. For this reason, categorical values are commonly converted into a binary code, a solution that unfortunately distorts the network training and the subsequent analysis. The present work proposes a SOM architecture that processes categorical values directly, without any prior transformation. This architecture is also capable of properly mixing numerical and categorical data, so that all features carry the same weight. The proposed implementation is scalable, and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets. |
Database: | OpenAIRE |
External link: |
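
The description notes that binarizing categorical variables dramatically increases dataset dimensionality. The following is a minimal sketch of that effect only, not of FMSOM or NCSOM, whose details are not given in this record. The helper `one_hot_encode` and the toy records are hypothetical and were written for this illustration.

```python
def one_hot_encode(rows, categorical_columns):
    """Binarize the given categorical columns; other columns pass through as floats.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Collect the sorted distinct values observed in each categorical column.
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        vector = []
        for col, value in enumerate(row):
            if col in categories:
                # One binary component per distinct category value.
                vector.extend(1.0 if value == c else 0.0 for c in categories[col])
            else:
                vector.append(float(value))
        encoded.append(vector)
    return encoded


# Hypothetical toy records: (age, colour, city).
rows = [
    (34, "red", "Lugo"),
    (51, "green", "Vigo"),
    (27, "blue", "Ferrol"),
    (45, "red", "Ourense"),
]

encoded = one_hot_encode(rows, categorical_columns={1, 2})

# Compare the number of features before and after binarization.
print(len(rows[0]), "->", len(encoded[0]))
```

In this toy case, three original features become eight, because each distinct category value gets its own binary component. With high-cardinality variables the growth is correspondingly larger, which is the drawback the description attributes to binarization.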