Robust Incremental Logistic Regression for Detection of Anomaly Using Big Data

Authors: Topon Paul, Ken Ueno
Publication year: 2020
Subject:
Source: ICMLA
DOI: 10.1109/icmla51294.2020.00187
Description: Large volumes of data are now collected continuously or incrementally in many fields through IoT (Internet of Things) devices and sensors. To extract useful patterns from these data, pattern recognition and machine learning techniques are typically applied in a single pass: a model is built from all the data at once and is not updated until its performance deteriorates significantly. This traditional approach may be infeasible in many applications because collecting the data can incur high communication and storage costs, building the model can take a very long time, and a model built once may not automatically learn patterns that change over time. To overcome some of these limitations, we propose a new method called Robust Incremental Logistic Regression (RILR), which learns and updates model parameters as new batches of training data arrive. We demonstrate the effectiveness of the proposed method through experiments on 10 publicly available data sets, evaluating it in terms of AUC (area under the receiver operating characteristic curve) on test data, robustness, execution time, and storage requirements. Experimental results suggest that the proposed method achieves the desired performance on most of the data sets.
Database: OpenAIRE
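
Illustrative sketch: the abstract describes updating model parameters as new batches of training data arrive and evaluating by AUC on test data, but it does not give the RILR update rule. The Python below is therefore not RILR. It is a plain L2-regularized logistic regression trained by batch gradient descent, shown only to make the batch-wise workflow concrete. The class name IncrementalLogReg, the partial_fit method, the make_batch helper, the learning rate and regularization values, and the synthetic two-feature data are all assumptions introduced for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class IncrementalLogReg:
    """Plain logistic regression updated one batch at a time (not RILR itself)."""

    def __init__(self, n_features, lr=0.1, l2=1e-3):
        self.w = [0.0] * n_features  # weights carried over between batches
        self.b = 0.0
        self.lr = lr                 # assumed learning rate
        self.l2 = l2                 # assumed L2 penalty strength

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, X, y, epochs=5):
        """Update parameters from one batch; the batch can be discarded afterwards."""
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # gradient of log loss w.r.t. the logit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            for j in range(len(self.w)):
                self.w[j] -= self.lr * (grad_w[j] / n + self.l2 * self.w[j])
            self.b -= self.lr * grad_b / n
        return self


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_batch(rng, n):
    """Synthetic data: anomalies (label 1) are shifted away from normal points."""
    X, y = [], []
    for _ in range(n):
        t = 1 if rng.random() < 0.2 else 0
        center = 2.0 if t else 0.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(t)
    return X, y


rng = random.Random(0)
model = IncrementalLogReg(n_features=2)
for _ in range(10):                      # ten batches arriving over time
    X_batch, y_batch = make_batch(rng, 200)
    model.partial_fit(X_batch, y_batch)  # only the current batch is held in memory

X_test, y_test = make_batch(rng, 500)
test_auc = auc([model.predict_proba(x) for x in X_test], y_test)
print(f"test AUC: {test_auc:.3f}")
```

Only the weight vector and bias persist between batches, which is what keeps storage independent of the total amount of data seen. Any robustness mechanism that distinguishes RILR from this baseline would have to be taken from the paper itself.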