Parallelization of Algorithms for Mining Data from Distributed Sources

Authors:	Maria Efimova, Ivan Kholod, Sergei Gorlatch, Andrey Shorov
Publication year:	2019
Subject:	020203 distributed computing; Naive Bayes classifier; Data access; Computer science; Distributed algorithm; Computation; 0202 electrical engineering, electronic engineering, information engineering; Parallel algorithm; Context (language use); 02 engineering and technology; Representation (mathematics); Algorithm; 020202 computer hardware & architecture
Source:	Lecture Notes in Computer Science; ISBN: 9783030256357; PaCT
DOI:	10.1007/978-3-030-25636-4_23
Description:	We suggest an approach to optimizing data mining in modern applications that work on distributed data. We formally transform a high-level functional representation of a data-mining algorithm into a parallel implementation that performs as many computations as possible locally at the data sources, rather than accumulating all data for processing at a central location as in the traditional MapReduce approach. Our approach avoids the main disadvantages of state-of-the-art MapReduce frameworks in the context of distributed data: increased run time, high network traffic, and unauthorized access to data. We use a popular data-mining algorithm, Naive Bayes, to illustrate our approach and evaluate it experimentally. Our experiments confirm that the Naive Bayes implementation developed using our approach significantly outperforms the traditional MapReduce-based implementation in terms of run time and network traffic.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::f29d297732445967d6106efb3631a74f https://doi.org/10.1007/978-3-030-25636-4_23
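
The idea in the description, computing locally at each data source and sending only small summaries to a central location, can be illustrated with a minimal Python sketch. This is not the authors' implementation or their formal transformation; it only assumes that Naive Bayes training reduces to counting, so per-source counts can be added together. All names here (`local_counts`, `merge_counts`, `predict`, the `outlook` feature, the smoothing parameters) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def local_counts(rows):
    """Run at one data source. rows is a list of (features_dict, label)."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name, value)] += 1
    # Only these counters leave the source, never the raw rows.
    return class_counts, feature_counts


def merge_counts(partials):
    """Run centrally. Adds up the per-source counters."""
    total_class = Counter()
    total_feature = Counter()
    for class_counts, feature_counts in partials:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def predict(model, features, alpha=1.0, values_per_feature=2):
    """Pick the most probable label using Laplace-smoothed log-probabilities.

    values_per_feature is an assumed, fixed number of distinct values
    per feature; a real model would track this per feature.
    """
    class_counts, feature_counts = model
    n = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / n)
        for name, value in features.items():
            num = feature_counts[(label, name, value)] + alpha
            den = count + alpha * values_per_feature
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Two hypothetical data sources that never exchange raw rows.
source_a = [({"outlook": "sunny"}, "no"), ({"outlook": "rainy"}, "yes")]
source_b = [({"outlook": "sunny"}, "no"), ({"outlook": "rainy"}, "yes"),
            ({"outlook": "rainy"}, "yes")]

model = merge_counts([local_counts(source_a), local_counts(source_b)])
print(predict(model, {"outlook": "sunny"}))
print(predict(model, {"outlook": "rainy"}))
```

Because counting is additive, the merged counters equal those obtained by counting over the pooled data, while the traffic per source is bounded by the number of distinct (label, feature, value) combinations rather than by the number of rows.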