Simultaneous Feature Selection and Cluster Analysis Using Genetic Algorithm

Authors: Shreya Chaudhuri, Sujata Ghatak, Sunanda Das, Asit Kumar Das
Year of publication: 2016
Subject:
Source: ICIT
DOI: 10.1109/icit.2016.064
Description: Cluster analysis is one of the important techniques of data mining and is applied in several fields such as bioinformatics, social networks, and computer vision. It is an unsupervised learning technique for exploring the structure of data without class labels. Many clustering algorithms have been proposed to analyze high volumes of data, but very few of them evaluate the quality of the clusters, which is degraded by irrelevant and inconsistent features present in the dataset. Feature selection is therefore an important pre-processing step in data analysis, mainly for high-dimensional datasets. In this paper, we select an optimal subset of features and perform cluster analysis simultaneously using a genetic algorithm. The genetic algorithm selects the optimal subset of features and automatically finds the optimal number of clusters at the end of the process. Optimality of the clusters is measured by calculating various cluster validation indices. The overall performance of the method is investigated on popular UCI datasets, and the experimental results are compared with the Fuzzy C-Means algorithm to demonstrate the effectiveness of the proposed method.
Database: OpenAIRE
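
Illustration: the abstract in this record describes a genetic algorithm that searches over feature subsets and cluster counts together, scoring candidates with cluster validation indices. The Python sketch below shows one way such a search could be set up. It is not the authors' implementation: the chromosome encoding (a feature bitmask plus a cluster count k), the use of k-means, the silhouette width as the only validity index, the genetic operators, the synthetic data, and every parameter value are assumptions made for illustration, since the abstract does not specify them.

```python
import random
from math import dist


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher means tighter, better separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fitness(chrom, data, seed=0):
    """Assumed fitness: silhouette of k-means run on the selected features only."""
    mask, k = chrom
    if not any(mask):
        return -1.0  # an empty feature subset is invalid
    points = [tuple(v for v, keep in zip(row, mask) if keep) for row in data]
    return silhouette(points, kmeans(points, k, random.Random(seed)))


def evolve(data, max_k=5, pop_size=12, generations=15, seed=0):
    """Tiny elitist GA over (feature mask, k) pairs; needs at least 2 features."""
    rng = random.Random(seed)
    n_feat = len(data[0])
    pop = [([rng.randint(0, 1) for _ in range(n_feat)], rng.randint(2, max_k))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            (m1, k1), (m2, k2) = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)  # one-point crossover on the mask
            mask, k = m1[:cut] + m2[cut:], rng.choice((k1, k2))
            if rng.random() < 0.2:
                mask[rng.randrange(n_feat)] ^= 1  # bit-flip mutation
            if rng.random() < 0.2:
                k = rng.randint(2, max_k)  # resample the cluster count
            children.append((mask, k))
        pop = parents + children
    best = max(pop, key=lambda c: fitness(c, data))
    return best, fitness(best, data)


def make_demo_data(seed=1, per_cluster=15):
    """Synthetic data: features 0-1 carry three clusters, features 2-3 are noise."""
    rng = random.Random(seed)
    rows = []
    for cx, cy in [(0, 0), (6, 6), (0, 6)]:
        for _ in range(per_cluster):
            rows.append((cx + rng.gauss(0, 0.4), cy + rng.gauss(0, 0.4),
                         rng.uniform(0, 6), rng.uniform(0, 6)))
    return rows


if __name__ == "__main__":
    (mask, k), score = evolve(make_demo_data())
    print("selected features:", mask, "clusters:", k, "silhouette:", round(score, 3))
```

Encoding k inside the chromosome is one simple way to let the search settle on a cluster count without fixing it in advance. Note that silhouette values computed on different feature subsets are not strictly comparable, so a real study would likely combine several validity indices, as the abstract indicates.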