Feature Selection for Regression Problems Based on the Morisita Estimator of Intrinsic Dimension
Author: | Jean Golay, Mikhail Kanevski, Michael N. Leuenberger |
---|---|
Publication year: | 2016 |
Subject: |
FOS: Computer and information sciences
Computer science; Estimator; Feature selection; Machine Learning (stat.ML); 02 engineering and technology; Filter (signal processing); computer.software_genre; Machine Learning (cs.LG); Computer Science - Learning; Artificial Intelligence; Sample size determination; Statistics - Machine Learning; 020204 information systems; Signal Processing; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Noise (video); Data mining; Intrinsic dimension; Representation (mathematics); computer; Software; Selection (genetic algorithm) |
DOI: | 10.48550/arxiv.1602.00216 |
Description: | Data acquisition, storage and management have been improved, while the key factors of many phenomena are not well known. Consequently, irrelevant and redundant features artificially increase the size of datasets, which complicates learning tasks, such as regression. To address this problem, feature selection methods have been proposed. This paper introduces a new supervised filter based on the Morisita estimator of intrinsic dimension. It can identify relevant features and distinguish between redundant and irrelevant information. Besides, it offers a clear graphical representation of the results, and it can be easily implemented in different programming languages. Comprehensive numerical experiments are conducted using simulated datasets characterized by different levels of complexity, sample size and noise. The suggested algorithm is also successfully tested on a selection of real world applications and compared with RReliefF using extreme learning machine. In addition, a new measure of feature relevance is presented and discussed. |
Database: | OpenAIRE |