Development And Research Of Isolating Forest Algorithms For Anomaly Detection In Transactional Data
Language: | Russian |
---|---|
Year of publication: | 2022 |
Subject: |
isolating forest
isolation forest, density estimation, distribution density estimation, weighting coefficients, tree algorithms, anomaly detection, anomaly recognition |
DOI: | 10.18720/spbpu/3/2023/vr/vr23-529 |
Description: | The subject of the study is the modification of the existing isolating forest algorithm (hereinafter IF); the goal is to increase the effectiveness of anomaly detection by the isolating forest algorithm through its modification. Methods of mathematical statistics, machine learning, and object-oriented programming were used in the work. The IF algorithm and its modifications were investigated: extended IF, seismic-activity IF, and generalized IF. The author's own modification of the IF, the weighted isolating forest, was also proposed and studied. The algorithms were implemented in C++20 without third-party libraries.
The test data set contained 16 million transactions collected over approximately 5 months of operation. In anomaly detection tests on depersonalized transactional data, the developed and implemented weighted isolating forest proved to be the most balanced IF model. Identifying suitable ranges for the number of isolating trees and the sample size allows it to achieve greater accuracy than other IF modifications, namely the extended IF and seismic-activity IF models. |
Database: | OpenAIRE |
External link: |