Distributed Synthetic Minority Oversampling Technique

Author:	Sakshi Hooda, Suman Mann
Language:	English
Subject:	SMOTE; Apache Spark; prediction; machine learning; imbalanced classification; Electronic computers. Computer science; QA75.5-76.95
Source:	International Journal of Computational Intelligence Systems
Document type:	article
ISSN:	1875-6883
DOI:	10.2991/ijcis.d.190719.001
Description:	Real-world prediction problems often involve predicting rare occurrences. Because of the resulting data imbalance, standard classification algorithms are biased against these rare events. Typical approaches to this imbalance involve oversampling the “rare events” or undersampling the majority events. Synthetic Minority Oversampling Technique (SMOTE) is one technique that addresses this class imbalance effectively. However, existing implementations of SMOTE fail when the data grows and can no longer be stored on a single machine. In this paper, we present our solution to the “big data challenge.” We provide a distributed version of SMOTE that uses scalable k-means++ and M-Trees. With this implementation of SMOTE, we were able to oversample the “rare events” and achieve results that are better than those of the existing Python version of SMOTE.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/2b78e3483b2348c4a0186e4f5e95cd93
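
The oversampling idea named in the description can be illustrated with a minimal single-machine sketch of the classic SMOTE interpolation step: each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the distributed method of the article (which the description says relies on scalable k-means++ and M-Trees); the function name `smote_sketch`, its parameters, and the brute-force neighbour search are assumptions made only for illustration.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    Hypothetical helper: brute-force neighbour search, small data only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, find the indices of its k nearest neighbours.
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = smote_sketch(minority, n_synthetic=6, k=2)
```

The quadratic neighbour search is the part that stops scaling once the data no longer fits on one machine, which is the limitation the description attributes to existing implementations.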