usfAD: a robust anomaly detector based on unsupervised stochastic forest.

Authors: Aryal, Sunil; Santosh, K.C.; Dazeley, Richard
Source: International Journal of Machine Learning & Cybernetics; Apr 2021, Vol. 12, Issue 4, pp. 1137-1150 (14 pages)
Abstract: In real-world applications, data can be represented using different units/scales. For example, weight can be expressed in kilograms or pounds, and fuel efficiency in km/l or l/100 km. One unit can be a linear or non-linear scaling of another. The variation in metrics caused by non-linear scaling makes Anomaly Detection (AD) challenging. Most existing AD algorithms rely on distance- or density-based functions, which makes them sensitive to how the data are expressed; that is, they are representation dependent. To avoid this problem, we introduce a new anomaly detection method, which we call 'usfAD: Unsupervised Stochastic Forest-based Anomaly Detector'. Our empirical evaluation on synthetic and real-world cybersecurity datasets (spam detection, malicious URL detection and intrusion detection) shows that our approach is more robust to variation in the units/scales used to express data. It produces more consistent and better results than five state-of-the-art AD methods: local outlier factor; one-class support vector machine; isolation forest; nearest neighbor in a random subsample of data; and a simple histogram-based probabilistic method. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
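
The abstract's claim that distance-based detectors are representation dependent can be illustrated with a small sketch. The example below is not the usfAD algorithm, whose mechanism the abstract does not describe. It only assumes a toy nearest-neighbour distance score and made-up fuel-efficiency values (all names and numbers are hypothetical), and shows that converting km/l to l/100 km changes which point looks most anomalous, while the rank order of the values is preserved up to reversal.

```python
def nn_distance_scores(values):
    """Toy anomaly score: distance to the nearest other point (larger = more anomalous)."""
    return [
        min(abs(v - w) for j, w in enumerate(values) if j != i)
        for i, v in enumerate(values)
    ]


def most_anomalous(values):
    """Index of the point with the largest nearest-neighbour distance."""
    scores = nn_distance_scores(values)
    return max(range(len(values)), key=scores.__getitem__)


def ranks(values):
    """Rank of each value in ascending order (0 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


# Hypothetical fuel-efficiency readings in km/l (illustrative values only).
km_per_l = [5.0, 10.0, 11.0, 12.0, 13.0, 14.0, 40.0]

# The same readings in l/100 km: a non-linear (reciprocal) rescaling.
l_per_100km = [100.0 / v for v in km_per_l]

top_km_per_l = most_anomalous(km_per_l)        # the very efficient vehicle
top_l_per_100km = most_anomalous(l_per_100km)  # the very inefficient vehicle

ranks_km_per_l = ranks(km_per_l)
ranks_l_per_100km = ranks(l_per_100km)

print("Most anomalous index in km/l:    ", top_km_per_l)
print("Most anomalous index in l/100 km:", top_l_per_100km)
print("Ranks in km/l:    ", ranks_km_per_l)
print("Ranks in l/100 km:", ranks_l_per_100km)
```

In this toy data the distance-based score singles out a different point in each representation, whereas the ranks are simply mirrored. This is one way to see why a method that depends only on the ordering of values would be less sensitive to the choice of unit.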