Towards a Novel Framework for Automatic Big Data Detection

Authors: Hameeza Ahmed, Muhammad Ali Ismail
Language: English
Publication year: 2020
Subject:
Source: IEEE Access, Vol 8, Pp 186304-186322 (2020)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3030562
Description: Big data is a "relative" concept. It is the combination of data, application, and platform properties. Recently, big data-specific technologies have emerged, including software frameworks, databases, hardware accelerators, and storage technologies. However, the automatic selection of these solutions for big data computations remains a non-trivial task. Presently, big data tools are selected by analyzing the problem manually or by using performance prediction techniques. Manual identification is based on the data properties only, whereas performance predictors only estimate basic execution metrics without linking them to big data (3Vs) thresholds. Hence, both ways of identification are mostly incorrect, which can lead to inefficient use of 3Vs optimizations, resulting in global inefficiency, reduced system performance, increased power consumption, greater effort on the part of the programming team, and misallocation of the hardware resources required for the task. In this regard, a novel framework has been proposed for automatic detection of the 3Vs (Volume, Velocity, Variety) of big data using machine learning. The detection is done through static code features, data, and platform properties, leading to relevant tool selection and code generation with minimal overhead, less programmer intervention, higher usability, and portability. Instead of handling each application with specialized big data solutions, or manually identifying the 3Vs, the framework can automatically detect the 3Vs and link them to the relevant optimizations. Several standard applications have been tested using the proposed framework. In the case of volume, the average detection accuracy is up to 97.8% for seen and 95.9% for unseen applications. In the case of velocity, the average detection accuracy is up to 97.3% for seen and 92.6% for unseen applications. There is no margin of error in variety detection, as it relies on straightforward computations without any predictions. Furthermore, an airline recommendation system case study strengthens the effectiveness of the proposed approach.
Database: Directory of Open Access Journals