VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IMPLEMENTED ON DIFFERENT HADOOP PLATFORMS

Authors: Purvi Parmar, MaryEtta Morris, John R. Talburt, and Huzaifa F. Syed
Publication year: 2020
DOI: 10.5281/zenodo.4030173
Description: This paper describes the outcome of an attempt to implement the same transitive closure (TC) algorithm for Apache MapReduce on different Apache Hadoop distributions. Apache MapReduce is a software framework used with Apache Hadoop, which has become the de facto standard platform for storing and processing large amounts of data in a distributed computing environment. The research presented here focuses on the variations observed among the results of an efficient iterative transitive closure algorithm when run in different distributed environments. The results from these comparisons were validated against benchmark results from OYSTER, an open source entity resolution system. The experimental results highlight the inconsistencies that can occur when the same codebase is used with different implementations of MapReduce.
Keywords: Entity Resolution; Hadoop; MapReduce; Transitive Closure; HDFS; Cloudera; Talend
Database: OpenAIRE