Unsupervised Labeling for Supervised Anomaly Detection in Enterprise and Cloud Networks
Authors: | Ikkyun Kim, Sang C. Suh, Hyunjoo Kim, Donghwoon Kwon, Jinoh Kim, Sunhee Baek |
---|---|
Publication year: | 2017 |
Subject: | Computer science; Supervised learning; Stability (learning theory); 020206 networking & telecommunications; 02 engineering and technology; Semi-supervised learning; Machine learning; Data modeling; Set (abstract data type); ComputingMethodologies_PATTERNRECOGNITION; 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Anomaly detection; Artificial intelligence; Data mining; Cluster analysis |
Source: | CSCloud |
DOI: | 10.1109/cscloud.2017.26 |
Description: | Identifying anomalous events in the network is one of the vital functions in enterprises, ISPs, and datacenters to protect internal resources. Given its importance, there is a substantial body of work on network anomaly detection using supervised and unsupervised machine learning techniques, each with its own strengths and weaknesses. In this work, we take advantage of both worlds by combining unsupervised and supervised learning methods. The basic process model we present in this paper includes (i) clustering the training data set to create referential labels, (ii) building a supervised learning model with the automatically produced labels, and (iii) testing individual data points in question using the established learning model. By doing so, it is possible to construct a supervised learning model without the provision of the associated labels, which are often not available in practice. To enable this process, we define a new property that characterizes anomalies in the context of clustering, based on our observations of anomalous events in networks, from which the referential labels can be obtained. Through extensive experiments with a public data set (NSL-KDD), we show that the presented method performs very well, yielding performance fairly comparable to that of the traditional method run with the original labels provided in the data set, with respect to the accuracy of anomaly detection. |
Database: | OpenAIRE |
External link: |
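
The three-step process model in the description (cluster to obtain referential labels, train a supervised model on those labels, then test individual points) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the record does not name the clustering or classification algorithms, so plain k-means and k-nearest-neighbours stand in for them; the rule "a cluster holding a small share of the data is anomalous" is a hypothetical placeholder for the paper's anomaly property, which the abstract does not spell out; and the toy two-dimensional points replace NSL-KDD.

```python
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Final assignment against the last centroid positions.
    return [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]


def referential_labels(assign, max_share=0.2):
    """ASSUMED rule (not from the paper): clusters under max_share of the data are anomalous (1)."""
    sizes = Counter(assign)
    n = len(assign)
    return [1 if sizes[a] / n < max_share else 0 for a in assign]


def knn_predict(train_x, train_y, query, k=3):
    """Stand-in supervised model: majority vote among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda xy: dist2(xy[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


# Toy unlabeled training set: a dense block of "normal" points plus a few distant ones.
normal = [((i % 10) * 0.1, (i // 10) * 0.1) for i in range(90)]
odd = [(5.0 + 0.1 * j, 5.0) for j in range(5)]
train = normal + odd

assign = kmeans(train, k=2)              # (i) cluster the training data
labels = referential_labels(assign)      # (i) derive referential labels from clusters
pred_a = knn_predict(train, labels, (0.5, 0.5))   # (ii)+(iii) train on labels, test a point
pred_b = knn_predict(train, labels, (5.1, 4.9))
print(pred_a, pred_b)
```

The point of the sketch is only that no ground-truth labels enter the pipeline: the classifier is trained entirely on labels produced by the clustering step. How well this matches training on true labels depends on the labeling rule and the data, which the sketch does not attempt to evaluate.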