A new evolutionary algorithm for mining top-k discriminative patterns in high dimensional data
Author: | Tarcísio Lucas, Teresa B. Ludermir, Renato Vimieiro, Túlio C. P. B. Silva |
---|---|
Publication year: | 2017 |
Subject: |
Clustering high-dimensional data; Computer science; Population size; Crossover; Evolutionary algorithm; Context (language use); Engineering and technology; Machine learning; Data set; Discriminative model; Information systems; Mutation (genetic algorithm); Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; The Internet; Artificial intelligence; Data mining; Heuristics; Software |
Source: | Applied Soft Computing. 59:487-499 |
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2017.05.048 |
Description: | This paper presents an evolutionary algorithm for Discriminative Pattern (DP) mining that focuses on high-dimensional data sets. DP mining aims to identify the sets of characteristics that best differentiate a target group from the others (e.g. successful vs. unsuccessful medical treatments). Extracting information from high-dimensional data sets becomes more natural as the volume of data stored in the world increases (30 GB/s on the Internet alone). There are several evolutionary approaches for DP mining, but none focuses on high-dimensional data. We propose an evolutionary approach with features that reduce memory and processing costs in the context of high-dimensional data. The new algorithm seeks the best (top-k) patterns and hides from the user many parameters common in other evolutionary heuristics, such as population size, mutation and crossover rates, and the number of evaluations. We carried out experiments with real-world high-dimensional data sets and traditional low-dimensional ones. The results showed that the proposed algorithm was superior to other approaches from the literature on high-dimensional data sets and competitive on the traditional data sets. |
Database: | OpenAIRE |
External link: |
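
To make the idea of top-k discriminative patterns in the description concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it enumerates small candidate patterns exhaustively instead of evolving them, and it scores them with weighted relative accuracy (WRAcc), a common quality measure assumed here because the record does not say which measure the authors use. The names `wracc`, `top_k_patterns`, `rows`, and `labels` are hypothetical.

```python
from heapq import nlargest
from itertools import combinations


def wracc(pattern, rows, labels, target=1):
    """Weighted relative accuracy of `pattern` (a set of items).

    Assumed quality measure: coverage * (precision - base rate).
    """
    n = len(rows)
    # Labels of the rows that contain every item of the pattern.
    covered = [lab for row, lab in zip(rows, labels) if pattern <= row]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    precision = sum(lab == target for lab in covered) / len(covered)
    base_rate = sum(lab == target for lab in labels) / n
    return coverage * (precision - base_rate)


def top_k_patterns(rows, labels, k=3, max_size=2):
    """Return the k best (score, pattern) pairs among small item sets.

    Exhaustive enumeration stands in for the evolutionary search,
    so this only scales to toy data.
    """
    items = sorted(set().union(*rows))
    candidates = (
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(items, size)
    )
    scored = ((wracc(p, rows, labels), sorted(p)) for p in candidates)
    return nlargest(k, scored)


# Toy data: each row is a set of items, each label marks the target group (1).
rows = [frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("c")]
labels = [1, 1, 0, 0]
best = top_k_patterns(rows, labels, k=3)
```

In this toy data set the item `a` appears only in target rows, so it receives the highest score. In high-dimensional data the number of candidate patterns grows exponentially with the number of items, which is why a heuristic search such as the evolutionary approach described above replaces the exhaustive loop.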