Cross-MapReduce: Data transfer reduction in geo-distributed MapReduce
Author: | Abdorreza Savadi, Mahmoud Naghibzadeh, Adel Nadjaran Toosi, Seyed Saeed Mirpour Marzuni |
---|---|
Year of publication: | 2021 |
Subject: |
Computer Networks and Communications; business.industry; Computer science; Distributed computing; Big data; Testbed; 020206 networking & telecommunications; 02 engineering and technology; Hardware and Architecture; 0202 electrical engineering, electronic engineering, information engineering; Graph (abstract data type); 020201 artificial intelligence & image processing; Data center; The Internet; business; Software; Data transmission |
Source: | Future Generation Computer Systems. 115:188-200 |
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2020.09.009 |
Description: | The MapReduce model is widely used to store and process big data in a distributed manner. MapReduce was originally developed for a single, tightly coupled cluster of computers. Approaches such as Hierarchical and Geo-Hadoop were designed to support geo-distributed MapReduce processing. However, these methods still incur high inter-cluster data transfer over the Internet, which is prohibitive for processing today's globally distributed big data. Based on the observation that the entire intermediate results need not be transferred to a single global reducer, we propose Cross-MapReduce, a framework for geo-distributed MapReduce processing. Before any massive data transfer takes place, the proposed method finds a set of best global reducers that minimizes the transferred data volume. We propose a graph, called the Global Reduction Graph (GRG), to determine the number and locations of the global reducers. We conducted extensive experimental evaluations on a real testbed to demonstrate the effectiveness of Cross-MapReduce. The experimental results show that Cross-MapReduce significantly outperforms the Hierarchical and Geo-Hadoop approaches and reduces the amount of data transferred over the Internet by 40%. |
Database: | OpenAIRE |
External link: |
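
The abstract's central idea, that intermediate results need not all travel to one global reducer, can be illustrated with a small sketch. This is not the Global Reduction Graph algorithm, which the abstract does not specify; it is a simple greedy stand-in that assumes each cluster knows the size of its intermediate data per key partition. The cluster names, partition names, sizes, and all function names below are hypothetical.

```python
from typing import Dict

# Hypothetical sizes (in MB) of intermediate data per cluster and key partition.
Sizes = Dict[str, Dict[str, int]]

sizes: Sizes = {
    "A": {"p0": 80, "p1": 10, "p2": 5},
    "B": {"p0": 10, "p1": 70, "p2": 5},
    "C": {"p0": 5, "p1": 10, "p2": 60},
}


def single_reducer_transfer(sizes: Sizes, reducer: str) -> int:
    """Data sent over the Internet when every other cluster ships all of its
    intermediate data to one global reducer."""
    return sum(
        sum(parts.values())
        for cluster, parts in sizes.items()
        if cluster != reducer
    )


def best_single_reducer(sizes: Sizes) -> str:
    """The single reducer location that minimizes inter-cluster transfer."""
    return min(sorted(sizes), key=lambda c: single_reducer_transfer(sizes, c))


def per_partition_reducers(sizes: Sizes) -> Dict[str, str]:
    """Greedy assumption: reduce each partition at the cluster that already
    holds the largest share of it (ties broken by cluster name)."""
    partitions = sorted({p for parts in sizes.values() for p in parts})
    return {
        p: max(sorted(sizes), key=lambda c: sizes[c].get(p, 0))
        for p in partitions
    }


def multi_reducer_transfer(sizes: Sizes, placement: Dict[str, str]) -> int:
    """Data sent over the Internet when each partition goes to its own reducer."""
    return sum(
        size
        for cluster, parts in sizes.items()
        for p, size in parts.items()
        if placement[p] != cluster
    )


placement = per_partition_reducers(sizes)
single_cost = single_reducer_transfer(sizes, best_single_reducer(sizes))
multi_cost = multi_reducer_transfer(sizes, placement)
print(f"single reducer: {single_cost} MB, per-partition reducers: {multi_cost} MB")
```

In this toy data set the per-partition placement moves far less data than even the best single reducer, because most of each partition already resides in one cluster. Real workloads may be much less skewed, and the savings reported in the abstract come from the authors' own method and testbed, not from this sketch.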