Parallelization of Algorithms for Mining Data from Distributed Sources
Authors: | Maria Efimova, Ivan Kholod, Sergei Gorlatch, Andrey Shorov |
---|---|
Year of publication: | 2019 |
Subject: |
020203 distributed computing;
Naive Bayes classifier; Data access; Computer science; Distributed algorithm; Computation; 0202 electrical engineering, electronic engineering, information engineering; Parallel algorithm; Context (language use); 02 engineering and technology; Representation (mathematics); Algorithm; 020202 computer hardware & architecture |
Source: | Lecture Notes in Computer Science; ISBN: 9783030256357; PaCT |
DOI: | 10.1007/978-3-030-25636-4_23 |
Description: | We suggest an approach to optimize data mining in modern applications that work on distributed data. We formally transform a high-level functional representation of a data-mining algorithm into a parallel implementation that performs as many computations as possible locally at the data sources, rather than accumulating all data for processing at a central location as in the traditional MapReduce approach. Our approach avoids the main disadvantages of state-of-the-art MapReduce frameworks in the context of distributed data: increased run time, high network traffic, and unauthorized access to data. We use the popular data-mining algorithm Naive Bayes to illustrate our approach and evaluate it experimentally. Our experiments confirm that the implementation of Naive Bayes developed using our approach significantly outperforms the traditional MapReduce-based implementation in terms of run time and network traffic. |
Database: | OpenAIRE |
External link: |
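
The description's central idea, computing locally at each data source and sending only small summaries over the network, can be illustrated for Naive Bayes with a minimal Python sketch. This is not the authors' implementation or their formal transformation: the function names (`local_counts`, `merge_counts`, `predict`), the data layout, and the `alpha` and `n_values` smoothing parameters are all assumptions made for illustration.

```python
import math
from collections import Counter


def local_counts(rows):
    """Run at one data source: summarize its rows as counts only.

    `rows` is an iterable of (features, label) pairs, where `features`
    is a dict of categorical feature name -> value (hypothetical layout).
    """
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(name, value, label)] += 1
    return class_counts, feature_counts


def merge_counts(partials):
    """Run at the coordinator: add up the per-source summaries.

    Only the counts travel over the network, never the raw rows.
    """
    total_class = Counter()
    total_feature = Counter()
    for class_counts, feature_counts in partials:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def predict(features, class_counts, feature_counts, alpha=1.0, n_values=2):
    """Pick the most probable label from the merged counts.

    `alpha` is a Laplace smoothing constant and `n_values` the assumed
    number of distinct values per feature (both illustrative choices).
    """
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        for name, value in features.items():
            count = feature_counts[(name, value, label)]
            score += math.log((count + alpha) / (n + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because the counts are additive, merging the per-source summaries yields the same model as counting over the pooled data, which is why the raw rows never need to leave their sources in this sketch.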