Comparative Performance Evaluation Using Hadoop Ecosystem – PIG and HIVE Through Rendering of Duplicates
Author: | C. S. Satsangi, Pragya Pandey |
---|---|
Year of publication: | 2018 |
Subject: | Database; Computer science; Big data; Cloud computing; Unstructured data; Data warehouse; Rendering (computer graphics); Streaming data; Batch processing; Internet of Things; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 020201 artificial intelligence & image processing |
Source: | International Conference on Advanced Computing, Networking and Informatics. ISBN: 9789811326721 |
Description: | Traditionally, preprocessed data have been stored in a data warehouse, where various operations are performed on them for analysis and decision making. With the rapid growth of cloud applications and IoT-based systems, data are generated at high velocity and in ever-increasing volume. Big data produced by a variety of structured and unstructured sources are therefore heterogeneous, and there is a need to integrate these varied data and analyze them at large scale. Hadoop provides a solution for such processing needs: it is inherently designed for high-throughput batch processing jobs and for handling complex queries over streaming data. This paper presents the MapReduce model of the Hadoop framework together with two analytical ecosystem tools, Pig and Hive. We also present a performance evaluation for each category, such as the processing time of selected queries executed on Pig and Hive while combining two healthcare datasets gathered from different data sources. A comparative analysis of the results is also presented. |
Database: | OpenAIRE |
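The description mentions combining two datasets from different sources with the MapReduce model, and the title refers to rendering duplicates. The sketch below is a minimal, in-memory illustration of how a union-with-deduplication job is commonly expressed as map, shuffle, and reduce phases. It is not the authors' code. The record schema, the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `union_distinct`) and the toy records are hypothetical. In Pig the same idea is usually written with the `UNION` and `DISTINCT` operators, and in HiveQL with `UNION DISTINCT`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record schema for illustration only: (patient_id, age).
Record = Tuple[str, int]


def map_phase(datasets: Iterable[Iterable[Record]]) -> List[Tuple[Record, int]]:
    """Emit (record, 1) for every record of every input dataset.

    Using the whole record as the key means identical records from
    different sources end up in the same group.
    """
    pairs: List[Tuple[Record, int]] = []
    for dataset in datasets:
        for record in dataset:
            pairs.append((record, 1))
    return pairs


def shuffle_phase(pairs: List[Tuple[Record, int]]) -> Dict[Record, List[int]]:
    """Group emitted values by key, as the framework's shuffle/sort step would."""
    groups: Dict[Record, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[Record, List[int]]) -> Dict[Record, int]:
    """Emit each distinct record once, with the number of copies that were merged."""
    return {key: sum(values) for key, values in groups.items()}


def union_distinct(*datasets: Iterable[Record]) -> Dict[Record, int]:
    """Combine any number of datasets and drop duplicate records."""
    return reduce_phase(shuffle_phase(map_phase(datasets)))


if __name__ == "__main__":
    # Two toy "sources" that share one record.
    source_a = [("p1", 34), ("p2", 51)]
    source_b = [("p2", 51), ("p3", 27)]
    for record, copies in sorted(union_distinct(source_a, source_b).items()):
        print(record, copies)
```

Keying on the entire record treats only exact copies as duplicates. Real healthcare data from different sources would usually need a chosen key (for example a normalized patient identifier) instead, which is a design decision the sketch leaves open.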