An Improved Synthetic Minority Over-sampling Technique for Imbalanced Data Set Learning

Author: Chen, Shih-Cheng, 陳世承
Year of publication: 2017
Document type: 學位論文 ; thesis
Description: 105
When the minority classes of a data set have far fewer instances than the other classes, the data set suffers from class imbalance: a trained classifier tends to have a low recognition rate for minority-class instances and to misclassify them as majority-class instances. One solution is to balance the distribution between the majority and minority classes by generating synthetic minority-class instances, and a variety of algorithms have been designed around this idea. This study proposes a novel algorithm, ISMOTE, to address class imbalance. ISMOTE differs from previous algorithms in that it does not consider the distribution of the minority class alone; instead, it measures the relative density of the minority and majority classes and uses that as the basis for weighting. In addition, the approach generates synthetic instances using a minority-class instance and its nearest majority-class instance as reference instances. This reduces the cases in which erroneous synthetic instances make learning harder for the classifier, so the synthetic instances it produces help the classifier learn more effectively. Each minority-class instance is assigned a weight that represents how difficult that instance is for the classifier to learn; the weighting formula is designed to be proportional to this difficulty. ISMOTE then generates a number of synthetic instances for each minority-class instance according to its weight, gradually shifting the classification decision boundary toward the instances that are harder to learn.
Database: Networked Digital Library of Theses & Dissertations
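
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. This record does not give the ISMOTE formula, so everything specific here is an assumption made for illustration: the weight of a minority-class instance is taken to be the share of majority-class points among its k nearest neighbours, synthetic points are placed on the segment from the minority instance toward its nearest majority instance, and the names `density_weights`, `oversample`, `k`, and `max_step` are hypothetical. It is not the thesis's implementation.

```python
import numpy as np


def density_weights(X_min, X_maj, k=5):
    """Assumed difficulty weight: share of majority points among the
    k nearest neighbours of each minority point, normalised to sum to 1."""
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(len(X_min), dtype=bool),
                   np.ones(len(X_maj), dtype=bool)]
    # Distances from every minority point to every point in the data set.
    d = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    # A point is not its own neighbour.
    idx = np.arange(len(X_min))
    d[idx, idx] = np.inf
    nn = np.argsort(d, axis=1)[:, :k]
    ratio = is_maj[nn].mean(axis=1)
    total = ratio.sum()
    if total == 0:
        # No minority point has majority neighbours: fall back to uniform.
        return np.full(len(X_min), 1.0 / len(X_min))
    return ratio / total


def oversample(X_min, X_maj, n_new, k=5, max_step=0.5, seed=0):
    """Generate n_new synthetic minority points, allocated by weight.

    Each synthetic point lies between a minority instance and a point at
    most max_step of the way toward its nearest majority instance
    (an assumed reading of the 'reference instance' idea)."""
    rng = np.random.default_rng(seed)
    w = density_weights(X_min, X_maj, k)
    # Number of synthetic points per minority instance, proportional to weight.
    counts = rng.multinomial(n_new, w)
    d = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    nearest_maj = X_maj[d.argmin(axis=1)]
    synth = []
    for x, m, c in zip(X_min, nearest_maj, counts):
        steps = rng.uniform(0.0, max_step, size=(c, 1))
        synth.append(x + steps * (m - x))
    return np.vstack(synth), w
```

With `max_step` below 1, each synthetic point stays closer to the minority instance than the nearest majority instance is, which is one simple way to avoid placing synthetic data inside the majority region. Minority instances surrounded only by other minority instances receive zero weight under this assumed formula and produce no synthetic points.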