Unsupervised probabilistic feature selection using ant colony optimization

Authors: Behrouz Zamani Dadaneh, Hossein Yeganeh Markid, Ali Zakerolhosseini
Publication year: 2016
Subject:
Source: Expert Systems with Applications. 53:27-42
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2016.01.021
Description: We propose an unsupervised method to remove redundant and irrelevant features. The algorithm needs neither a learning algorithm nor class labels to select features. Similarity between features is considered in the computation of feature relevance. Feature selection (FS) is one of the most important fields in pattern recognition; it aims to pick a subset of relevant and informative features from an original feature set. There are two kinds of FS algorithms, depending on whether information about the dataset's class labels is available: supervised and unsupervised. Supervised approaches use the class labels of the dataset during feature selection. Unsupervised algorithms, on the other hand, operate without class labels, which makes the task more difficult. In this paper, we propose unsupervised probabilistic feature selection using ant colony optimization (UPFS). The algorithm searches for the optimal feature subset in an iterative process. It uses inter-feature information that captures the similarity between features, which steers the algorithm toward lower redundancy in the final set. In each step of the ACO algorithm, to select the next potential feature, we calculate the amount of redundancy between the current feature and all those that have been selected thus far. In addition, we use a matrix to hold the ant-related pheromone, which reflects the rate of co-presence of every pair of features in solutions. Afterwards, features are ranked based on a probability function extracted from the matrix, and the top m features are returned as the final solution. We compare the performance of UPFS with 15 well-known supervised and unsupervised feature selection methods using different classifiers (support vector machine, naive Bayes, and k-nearest neighbor) on 10 well-known datasets. The experimental results show the efficiency of the proposed method compared to the previous related methods.
Database: OpenAIRE
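
The description above outlines the procedure only in words, so the following is a minimal Python sketch of that general idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the similarity measure (absolute Pearson correlation), the transition weight (pheromone times one minus mean similarity to the features already selected), the evaporation and deposit rules, the final score (a feature's share of the total pairwise pheromone), and all function and parameter names.

```python
import math
import random


def abs_correlation(columns):
    """Pairwise |Pearson correlation| between feature columns (assumed similarity)."""
    centered = []
    for col in columns:
        mean = sum(col) / len(col)
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    n = len(columns)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                # Clamp to 1.0 to absorb floating-point overshoot.
                sim[i][j] = min(1.0, abs(dot / (norms[i] * norms[j])))
            # Constant features keep similarity 0.0 (treated as uncorrelated).
    return sim


def aco_feature_ranking(columns, m, subset_size, n_ants=10, n_iters=20,
                        evaporation=0.2, seed=0):
    """Rank features with a pairwise pheromone matrix; return (top m, scores)."""
    n = len(columns)
    if not 2 <= subset_size <= n:
        raise ValueError("subset_size must be between 2 and the feature count")
    rng = random.Random(seed)
    sim = abs_correlation(columns)
    # pheromone[i][j] tracks how often features i and j co-occur in good subsets.
    pheromone = [[1.0] * n for _ in range(n)]

    for _ in range(n_iters):
        deposits = [[0.0] * n for _ in range(n)]
        for _ in range(n_ants):
            current = rng.randrange(n)
            selected = [current]
            while len(selected) < subset_size:
                candidates = [f for f in range(n) if f not in selected]
                weights = []
                for f in candidates:
                    # Redundancy of f against everything selected so far.
                    redundancy = sum(sim[f][s] for s in selected) / len(selected)
                    weights.append(pheromone[current][f] * (1.0 - redundancy) + 1e-12)
                current = rng.choices(candidates, weights=weights, k=1)[0]
                selected.append(current)
            pairs = [(a, b) for i, a in enumerate(selected) for b in selected[i + 1:]]
            # Subset quality: low mean pairwise similarity is rewarded.
            quality = 1.0 - sum(sim[a][b] for a, b in pairs) / len(pairs)
            for a, b in pairs:
                deposits[a][b] += quality
                deposits[b][a] += quality
        # Evaporate old pheromone, then add this iteration's deposits.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] = (1.0 - evaporation) * pheromone[i][j] + deposits[i][j]

    # Score each feature by its share of the total pairwise pheromone.
    totals = [sum(pheromone[i][j] for j in range(n) if j != i) for i in range(n)]
    grand_total = sum(totals)
    scores = [t / grand_total for t in totals]
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranking[:m], scores
```

Here `columns` is a list of feature columns (one list of numbers per feature). No class labels or classifier appear anywhere in the sketch, which mirrors the unsupervised setting the description emphasizes; the actual formulas used by UPFS would have to be taken from the paper itself.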