A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem
Author: | Seref Sagiroglu, Duygu Sinanc Terzi |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Computer science
Big data; 02 engineering and technology; imbalanced data; computer.software_genre; imbalanced big data classification; QA76.75-76.765; Class imbalance; big data; Resampling; 0202 electrical engineering, electronic engineering, information engineering; Computer software; Project management; cluster-based resampling; business.industry; Software development; Information technology; 020206 networking & telecommunications; General Medicine; ComputingMethodologies_PATTERNRECOGNITION; 020201 artificial intelligence & image processing; Data mining; business; computer; Cluster based |
Source: | Applied Computer Systems, Vol 24, Iss 2, Pp 104-110 (2019) |
Description: | The class imbalance problem, one of the most common data irregularities, leads to models in which the minority class is under-represented. To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design aims to modify the existing dataset in order to increase classification success. Within the study, DIBID has been implemented on public datasets under two strategies. The first strategy presents the success of the model on datasets with different imbalance ratios. The second strategy compares the success of the model with other imbalanced big data solutions in the literature. According to the results, DIBID outperformed other imbalanced big data solutions in the literature and increased area under the curve (AUC) values by between 10 % and 24 % in the case study. |
Database: | OpenAIRE |