Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection

Authors:	Menghua Luo, Chak Fong Cheang, Yangyang Li, Anfeng Liu, Zhiping Cai, Ke Wang
Year of publication:	2019
Subject:	business.industry; Computer science; Anomaly (natural sciences); 020206 networking & telecommunications; 02 engineering and technology; Machine learning; computer.software_genre; Imbalanced data; Synthetic data; Computer Science Applications; Biomaterials; Line segment; Mechanics of Materials; Modeling and Simulation; 0202 electrical engineering electronic engineering information engineering; Range (statistics); 020201 artificial intelligence & image processing; Anomaly detection; Artificial intelligence; Electrical and Electronic Engineering; Focus (optics); business; computer; Real world data
Source:	Computers, Materials & Continua, vol. 58, pp. 15-26
ISSN:	1546-2226
DOI:	10.32604/cmc.2019.03708
Description:	Extremely imbalanced data is the core issue in anomaly detection: abnormal samples are so scarce that they provide too little information to analyze. Mainstream methods therefore make full use of the normal data and treat any sample that does not fit the normal data distribution as an anomaly. Taking a data science perspective, we instead concentrate on the abnormal data and generate artificial abnormal samples with machine learning methods. Among such techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and its improved variants are representative milestones; they generate synthetic examples at random points on line segments between selected minority samples. In our work, we remove the restriction to line segments and propose an Imbalanced Triangle Synthetic Data method. In theory, our method covers a wider range of the feature space. In experiments with real-world data, our method outperforms SMOTE and its improved variants.
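The description contrasts SMOTE's line-segment interpolation with generating samples over a triangle. The Python sketch below illustrates both ideas and is not the authors' code. `smote_sample` follows the classic SMOTE rule: pick a minority sample, pick one of its k nearest minority neighbours, and interpolate at a random point on the segment between them. `triangle_sample` is a hypothetical stand-in for the triangle idea. It assumes each synthetic point is drawn uniformly inside the triangle formed by a minority sample and two of its k nearest minority neighbours; the paper's actual Imbalanced Triangle Synthetic Data procedure may choose vertices and weights differently. The function names and the parameter `k` are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of points[i] (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote_sample(points: Sequence[Point], k: int = 5, rng=random) -> Point:
    """SMOTE-style sample: a random point on the segment from a sample to one neighbour."""
    i = rng.randrange(len(points))
    j = rng.choice(_k_nearest(points, i, k))
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))


def triangle_sample(points: Sequence[Point], k: int = 5, rng=random) -> Point:
    """Hypothetical triangle variant: a uniform random point inside the triangle
    spanned by a sample and two of its nearest minority neighbours.
    Requires at least 3 points and k >= 2."""
    i = rng.randrange(len(points))
    j, m = rng.sample(_k_nearest(points, i, k), 2)
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    # Barycentric weights (1 - s, s*(1 - r2), s*r2) are non-negative, sum to 1,
    # and give a point uniformly distributed over the triangle.
    w0, w1, w2 = 1.0 - s, s * (1.0 - r2), s * r2
    return tuple(
        w0 * a + w1 * b + w2 * c
        for a, b, c in zip(points[i], points[j], points[m])
    )


if __name__ == "__main__":
    # Toy minority class in two dimensions (illustrative data only).
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    rng = random.Random(0)
    print(smote_sample(minority, k=3, rng=rng))
    print(triangle_sample(minority, k=3, rng=rng))
```

Taking the square root of one uniform draw is what keeps the triangle samples uniform instead of clustering them near the first vertex. The SMOTE segment from a sample to a neighbour is one edge of such a triangle, which is the sense in which the triangle covers a wider region than a single segment.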
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::5fb4e397be093378090f029bd5db85b8 https://doi.org/10.32604/cmc.2019.03708