MapReduce Programs Simplification using a Query Criteria API

Authors: Boulchahoub Hassan, Amina Rachiq, Benabbou Fouzia, Khalil Namir, Labriji Elhoussin
Year of publication: 2018
Subject:
Source: International Journal of Advanced Computer Science and Applications. 9
ISSN: 2156-5570, 2158-107X
DOI: 10.14569/ijacsa.2018.090607
Description: The Hadoop Distributed File System (HDFS) is an organized, distributed collection of files. It is designed to store very large volumes of data and to retrieve and analyze that data efficiently in less time. To retrieve and analyze data from HDFS, MapReduce jobs must be created either directly, in a programming language such as Java, or indirectly, in a high-level language such as HiveQL or Pig Latin. Creating MapReduce programs in a programming language is a difficult task that demands considerable effort, both to write the programs and to maintain them. Writing MapReduce code by hand takes a lot of time, introduces bugs, harms readability, and impedes optimization. Practitioners working in the field of big data try to avoid long, complex programs and look for simpler alternatives such as graphical interfaces, short scripts in Pig Latin, or SQL queries. This article proposes a MapReduce Query API, inspired by Hibernate Criteria, to simplify the code of MapReduce programs. The API offers a set of predefined methods for expressing restrictions, projections, logical conditions, and so on. The paper illustrates an implementation of the Word Count example using the Query Criteria API.
Database: OpenAIRE
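
Illustration: the record does not reproduce the paper's API, so the following is only a minimal sketch of what a criteria-style Word Count might look like. It runs in memory on a list of lines and does not use Hadoop. The class `CriteriaSketch` and the methods `forLines`, `splitBy`, `add`, and `countByKey` are hypothetical names chosen for this illustration, not the authors' method names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class CriteriaSketch {

    /** Hypothetical fluent builder; an assumption, not the API from the paper. */
    static final class Criteria {
        private final List<String> lines;
        // By default each line is treated as a single token.
        private Function<String, List<String>> tokenizer = line -> List.of(line);
        // By default every token is accepted.
        private Predicate<String> restriction = token -> true;

        private Criteria(List<String> lines) {
            this.lines = lines;
        }

        static Criteria forLines(List<String> lines) {
            return new Criteria(lines);
        }

        /** "Map" side: split each input line into tokens using a regex. */
        Criteria splitBy(String regex) {
            this.tokenizer = line -> Arrays.asList(line.trim().split(regex));
            return this;
        }

        /** Restriction: keep only tokens matching the condition (ANDed together). */
        Criteria add(Predicate<String> condition) {
            this.restriction = this.restriction.and(condition);
            return this;
        }

        /** "Reduce" side: group identical tokens and count them, sorted by key. */
        Map<String, Long> countByKey() {
            return lines.stream()
                    .flatMap(line -> tokenizer.apply(line).stream())
                    .filter(token -> !token.isEmpty())
                    .filter(restriction)
                    .collect(Collectors.groupingBy(
                            Function.identity(), TreeMap::new, Collectors.counting()));
        }
    }

    /** Word Count expressed as one criteria chain instead of mapper/reducer classes. */
    static Map<String, Long> wordCount(List<String> lines) {
        return Criteria.forLines(lines).splitBy("\\s+").countByKey();
    }

    public static void main(String[] args) {
        List<String> input = List.of("hello hadoop", "hello mapreduce");
        System.out.println(wordCount(input)); // prints {hadoop=1, hello=2, mapreduce=1}

        // A restriction narrows the result, here to words longer than five letters.
        System.out.println(Criteria.forLines(input)
                .splitBy("\\s+")
                .add(word -> word.length() > 5)
                .countByKey()); // prints {hadoop=1, mapreduce=1}
    }
}
```

The sketch keeps tokenizing, filtering, and aggregation as separate chained calls, which is the kind of declarative composition the description attributes to a Hibernate-Criteria-style API. A real implementation would translate the chain into a MapReduce job rather than a stream pipeline.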