Trends of Evolutionary Machine Learning to Address Big Data Mining
Author: | Ghita Benjelloun, Sana Ben Hamida, Hmida Hmida |
---|---|
Contributors: | Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision (LAMSADE), Université Paris Dauphine-PSL, Université Paris sciences et lettres (PSL), Centre National de la Recherche Scientifique (CNRS), Université de Tunis El Manar (UTM), Inès Saad, Camille Rosenthal-Sabroux, Faiez Gargouri, Pierre-Emmanuel Arduin |
Language: | English |
Publication year: | 2021 |
Subject: | Active learning (machine learning); Computer science; Big data; Genetic programming; Machine learning; Data sampling; Sampling (statistics); Active data; Information system; Big data mining; Evolutionary data mining; Horizontal parallelization; Higgs boson classification; Pulsar detection; Artificial intelligence; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; 01 natural sciences; 0103 physical sciences; 010303 astronomy & astrophysics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing |
Source: | Inès Saad; Camille Rosenthal-Sabroux; Faiez Gargouri; Pierre-Emmanuel Arduin. Information and Knowledge Systems. Digital Technologies, Artificial Intelligence and Decision Making. Lecture Notes in Business Information Processing, 425, Springer International Publishing, pp. 85-99, 2021. ISBN: 978-3-030-85976-3. ICIKS |
DOI: | 10.1007/978-3-030-85977-0_7 |
Description: | International audience; Improving decisions by better mining the data available in an information system is a common goal in many decision-making environments. However, the complexity and sheer size of the data collected in modern systems make this goal a challenge for mining methods. Evolutionary Data Mining Algorithms (EDMA), such as Genetic Programming (GP), are powerful meta-heuristics with empirically proven efficiency on complex machine learning problems. They are expected to be applied to real-world big data tasks and applications in daily life. Thus, like all machine learning techniques, they need to scale to big databases. This paper reviews some solutions that could help EDMA deal with Big Data challenges. Two solutions are then selected and explained. The first is an algorithmic manipulation: it introduces the active learning paradigm through active data sampling. The second is a processing manipulation: it applies horizontal scaling by distributing the processing over networked nodes. This work explains how each solution is introduced into GP. As preliminary experiments, the extended GP is applied to two complex machine learning problems: the Higgs boson classification problem and the pulsar detection problem. Experimental results are then discussed and compared to assess the efficiency of each solution. |
Database: | OpenAIRE |