Towards the Overcome of Performance Pitfalls in Data Stream Mining Tools

Authors: Marco Antonio Alves Zanata, Jean Paul Barddal, Lucca Portes Cavalheiro
Year of publication: 2021
Subject:
Source: IJCNN (International Joint Conference on Neural Networks)
DOI: 10.1109/ijcnn52387.2021.9533375
Abstract: Data stream mining is an essential task in today's scientific community: it allows machine learning models to be updated over time as new data becomes available. Three pillars should be considered when selecting an algorithm for data stream mining: accuracy, processing time, and memory consumption. Several tools have been developed to build and assess machine learning models in streaming scenarios; among them, Massive Online Analysis (MOA), written in Java, and scikit-multiflow, written in Python, are in the spotlight. Although both tools are easy to use, neither is focused on performance, which jeopardizes the efficient use of computational resources. In this paper, we show that, with the right tools, Python libraries can reach performance comparable to C/C++. More specifically, we show how optimized scikit-multiflow implementations written in low-level languages (C++, C++ with Intel Intrinsics, and Rust) and exposed to Python through bindings vastly outperform existing tools in computational resource usage while keeping predictive performance intact.
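The abstract names three pillars for choosing a stream learner: accuracy, processing time, and memory consumption. The following is a minimal sketch, not the authors' benchmark, of how all three might be measured together in a test-then-train (prequential) loop using only Python's standard library. The learner class `MajorityClassLearner` and its `predict_one`/`learn_one` methods are hypothetical and are not the MOA or scikit-multiflow API. Note that `tracemalloc` only traces allocations made through Python's allocator, so memory held directly by native C++ or Rust extensions would not be fully captured by this measurement.

```python
import time
import tracemalloc
from collections import Counter


class MajorityClassLearner:
    """Hypothetical incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Before any training there is no label to predict.
        if not self.counts:
            return None
        # Ties resolve to the label encountered first.
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_evaluate(learner, stream):
    """Test-then-train loop returning (accuracy, elapsed seconds, peak traced bytes)."""
    correct = 0
    seen = 0
    tracemalloc.start()
    start = time.perf_counter()
    for x, y in stream:
        # Test first: predict before the model sees this label.
        if learner.predict_one(x) == y:
            correct += 1
        # Then train on the same instance.
        learner.learn_one(x, y)
        seen += 1
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    accuracy = correct / seen if seen else 0.0
    return accuracy, elapsed, peak


if __name__ == "__main__":
    # Synthetic stream: every fourth instance is "spam", the rest "ham".
    stream = [((float(i),), "spam" if i % 4 == 0 else "ham") for i in range(100)]
    acc, secs, peak_bytes = prequential_evaluate(MajorityClassLearner(), stream)
    print(f"accuracy={acc:.2f} time={secs:.6f}s peak_memory={peak_bytes} B")
```

Testing before training on each instance keeps the accuracy estimate honest, since the model never predicts a label it has already learned from; timing and memory are taken over the same pass so the three pillars refer to identical work.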
Database: OpenAIRE