Robust Incremental Logistic Regression for Detection of Anomaly Using Big Data
Author: | Topon Paul, Ken Ueno |
---|---|
Year of publication: | 2020 |
Subject: | Training set; Receiver operating characteristic; Computer science; Big data; Engineering and technology; Logistic regression; Data modeling; Data set; Support vector machine; Robustness (computer science); Information systems; Pattern recognition (psychology); Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Data mining; Test data |
Source: | ICMLA |
DOI: | 10.1109/icmla51294.2020.00187 |
Description: | Large volumes of data are now collected continuously or incrementally in various fields through IoT (Internet of Things) devices and sensors. To extract useful patterns from these data, pattern recognition and machine learning techniques are applied. These techniques typically build a model from all the data at once, and the model is not updated until its performance deteriorates significantly. This traditional approach may not be feasible in many applications: collecting the data may incur a huge communication and storage cost, building the model may take a very long time, and a model built once may not be able to learn automatically the patterns that change over time. To overcome some of these limitations, we propose a new method called Robust Incremental Logistic Regression (RILR), which learns and updates model parameters as new batches of training data arrive. We show the effectiveness of the proposed method through experiments on 10 publicly available data sets, evaluating it in terms of AUC (Area Under the receiver operating characteristic Curve) on test data, robustness, execution time, and storage requirement. Experimental results suggest that the proposed method is able to achieve the desired performance on most of the data sets. |
Database: | OpenAIRE |
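
The record describes a model whose parameters are updated as new batches of training data arrive and which is evaluated by AUC on test data, but it does not give the RILR algorithm itself. As a rough illustration of that general idea only, the sketch below updates an ordinary L2-regularised logistic regression by gradient descent on one batch at a time, without storing earlier batches. Everything in it is an assumption made for illustration: the class name `IncrementalLogReg`, the `partial_fit` method, the learning rate, the regularisation weight, and the synthetic two-feature data. It is not the authors' method and includes none of their robustness mechanisms.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class IncrementalLogReg:
    """Hypothetical batch-wise logistic regression (NOT the paper's RILR)."""

    def __init__(self, n_features, lr=0.1, l2=1e-3):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # assumed learning rate
        self.l2 = l2  # assumed L2 penalty weight

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, X, y, epochs=5):
        # Update parameters from the current batch only; old batches are not kept.
        n = len(X)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi
                for j, xij in enumerate(xi):
                    grad_w[j] += err * xij
                grad_b += err
            for j in range(len(self.w)):
                self.w[j] -= self.lr * (grad_w[j] / n + self.l2 * self.w[j])
            self.b -= self.lr * grad_b / n
        return self


def auc(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_batch(rng, n):
    # Synthetic data: anomalies (label 1) are shifted away from normal points.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        centre = 2.0 if label else 0.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
model = IncrementalLogReg(n_features=2)
X_test, y_test = make_batch(rng, 500)
for _ in range(10):  # ten batches arriving over time
    X_batch, y_batch = make_batch(rng, 200)
    model.partial_fit(X_batch, y_batch)
test_auc = auc([model.predict_proba(x) for x in X_test], y_test)
print(f"test AUC after 10 batches: {test_auc:.3f}")
```

Because only the weight vector and bias persist between calls to `partial_fit`, storage stays constant in the number of batches, which is the kind of saving the description attributes to incremental learning.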