A graph-based big data optimization approach using hidden Markov model and constraint satisfaction problem

Authors:	Abdelkrim Bekkhoucha, Samir Anter, Imad Sassi
Language:	English
Publication year:	2021
Subject:	Optimization; Graphical modeling; Computer engineering. Computer hardware; Information Systems and Management; Computer Networks and Communications; Computer science; Big data; Metaheuristics; Information technology; Machine learning; computer.software_genre; Big data analytics; TK7885-7895; Hidden Markov model; Metaheuristic; Constraint satisfaction problem; business.industry; QA75.5-76.95; Solver; T58.5-58.64; Mean absolute percentage error; Hardware and Architecture; Electronic computers. Computer science; Time series forecasting; Benchmark (computing); Graph (abstract data type); Artificial intelligence; business; computer; Information Systems
Source:	Journal of Big Data, Vol 8, Iss 1, Pp 1-29 (2021)
ISSN:	2196-1115
Description:	To address the challenges of big data analytics, several works have focused on big data optimization using metaheuristics. The constraint satisfaction problem (CSP) is a fundamental concept of metaheuristics that has shown great efficiency in several fields. Hidden Markov models (HMMs) are powerful machine learning algorithms that are applied especially frequently in time series analysis. However, one issue in forecasting time series using HMMs is how to reduce the search space (state and observation space). To address this issue, we propose a graph-based big data optimization approach using a CSP to enhance the results of learning and prediction tasks of HMMs. This approach takes full advantage of both HMMs, with the richness of their algorithms, and CSPs, with their many powerful and efficient solver algorithms. To verify the validity of the model, the proposed approach is evaluated on real-world data using the mean absolute percentage error (MAPE) and other metrics as measures of the prediction accuracy. The conducted experiments show that the proposed model outperforms the conventional model. It reduces the MAPE by 0.71% and offers a particularly good trade-off between computational costs and the quality of results for large datasets. It is also competitive with benchmark models in terms of the running time and prediction accuracy. Further comparisons substantiate these experimental findings.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5f30d633b63da9916c07f4ea5210f05a https://doaj.org/article/e4f6212e22b84053849a8b684392fe07
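
The description reports prediction accuracy as mean absolute percentage error (MAPE). As a reminder of what that metric measures, here is a minimal sketch of the standard definition, 100/n times the sum of |(actual - predicted) / actual|. The function name `mape` and the sample values are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean absolute percentage error, in percent.

    Hypothetical helper for illustration only: it applies the textbook
    MAPE formula and is not the authors' evaluation code.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")

    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # MAPE is undefined when an actual value is zero.
            raise ValueError("actual values must be non-zero")
        total += abs((a - p) / a)

    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Made-up series; each forecast is off by 10% of the actual value.
    observed = [100.0, 200.0]
    forecast = [110.0, 180.0]
    print(f"MAPE: {mape(observed, forecast):.2f}%")
```

Because the result is already a percentage, a reported "reduction by 0.71%" can mean either percentage points or a relative change; the record does not say which.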