Integration and Optimization Technologies for Multiple Big Data Processing Platforms

Author: Yun-Che Tsai, 蔡允哲
Year of publication: 2014
Document type: thesis (學位論文)
Description: 102
The objective of this study is to build a high-performance, highly available platform that integrates multiple big data processing systems. Integrating Apache Hive, Cloudera Impala, and BDAS Shark lets the platform support SQL queries in a big data environment. Users access a single interface, and the optimizer proposed in this research automatically selects the best-performing big data warehouse platform. Query results are cached using the distributed in-memory storage system Memcached together with the distributed file system Apache Hadoop HDFS. When a user later issues the same SQL query, the result is returned quickly from the high-performance cache, avoiding the longer retrieval time of repeating the search in the big data warehouse platform. The proposed approach significantly improves overall performance; in particular, for workloads with highly repetitive SQL commands in multi-user mode, it dramatically reduces query response time.
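The flow described in the abstract (check the cache first, otherwise let an optimizer pick an engine and store the result) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `CachedQueryRouter`, the function `cache_key`, and the `estimate_cost` callback are hypothetical names, a plain dictionary stands in for Memcached and HDFS, and the engines are arbitrary callables rather than real Hive, Impala, or Shark clients.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical engine type: takes SQL text, returns result rows.
Engine = Callable[[str], List[tuple]]


def cache_key(sql: str) -> str:
    """Hash the query text (trimmed, without a trailing semicolon) into a cache key."""
    trimmed = sql.strip().rstrip(";").strip()
    return hashlib.sha256(trimmed.encode("utf-8")).hexdigest()


class CachedQueryRouter:
    """Single entry point: serve repeated queries from cache, route new ones to one engine."""

    def __init__(self, engines: Dict[str, Engine],
                 estimate_cost: Callable[[str, str], float]):
        self.engines = engines
        # Assumed optimizer interface: (engine name, sql) -> estimated cost.
        self.estimate_cost = estimate_cost
        # In-memory stand-in for the Memcached / HDFS result cache.
        self.cache: Dict[str, List[tuple]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql: str) -> List[tuple]:
        key = cache_key(sql)
        if key in self.cache:
            # Same query seen before: skip the warehouse entirely.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        # Pick the engine with the lowest estimated cost for this query.
        best = min(self.engines, key=lambda name: self.estimate_cost(name, sql))
        rows = self.engines[best](sql)
        self.cache[key] = rows
        return rows
```

In this sketch only exact repeats of the query text (ignoring surrounding whitespace and a trailing semicolon) hit the cache, which mirrors the "same SQL query command" case in the abstract. Cache invalidation when the underlying tables change is left out.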
Database: Networked Digital Library of Theses & Dissertations