Improving SVM classification on imbalanced datasets by introducing a new bias

Authors: Luis Gonzalez-Abril, Haydemar Núñez, Cecilio Angulo
Contributors: Universidad de Sevilla. Departamento de Economía Aplicada I, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya. GREC - Grup de Recerca en Enginyeria del Coneixement
Language: English
Year of publication: 2017
Subject:
Optimization problem
Support Vector Machine
Computer science
Algorismes
02 engineering and technology
Minority class
Library and Information Sciences
computer.software_genre
Machine learning
Post-processing Bias
Mathematics (miscellaneous)
Bias
Informàtica [Àrees temàtiques de la UPC]
SMOTE [Cost-sensitive strategy]
020204 information systems
Ranking SVM
Aprenentatge automàtic
0202 electrical engineering
electronic engineering
information engineering

Sensitivity (control systems)
SMOTE
Structured support vector machine
business.industry
Function (mathematics)
Cost-sensitive strategy
Support vector machine
Algorithm
Support vector machines
ComputingMethodologies_PATTERNRECOGNITION
Post-processing
Pattern recognition (psychology)
020201 artificial intelligence & image processing
Psychology (miscellaneous)
Artificial intelligence
Data mining
Statistics
Probability and Uncertainty

business
computer
Source: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
idUS. Depósito de Investigación de la Universidad de Sevilla
instname
Recercat. Dipósit de la Recerca de Catalunya
Description: Like most learning machines, Support Vector Machines (SVMs) trained on imbalanced datasets can perform poorly on the minority class, because SVMs were designed to induce a model based on the overall error. To improve their performance on this kind of problem, a low-cost post-processing strategy is proposed, based on calculating a new bias to adjust the function learned by the SVM. The proposed bias takes into account the relative sizes of the classes in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem for SVM training. Experimental results on 34 datasets with different degrees of imbalance show that the proposed method improves classification on imbalanced datasets, as measured by standardized error measures based on sensitivity and g-means. Furthermore, its performance is comparable to that of well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost.
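Example (illustrative only): the description does not give the formula for the new bias, so the sketch below only shows the general idea of post-processing a bias and scoring the result with sensitivity and g-mean. The toy scores, the starting bias, and the function `shifted_bias` (which places the threshold at the mean score over all samples) are assumptions made for illustration and are not the authors' method.

```python
import math


def predict(raw_scores, bias):
    """Classify with sign(score + bias); raw_scores holds the w.x part only."""
    return [1 if s + bias >= 0 else -1 for s in raw_scores]


def sensitivity_and_gmean(y_true, y_pred):
    """Return (sensitivity, g-mean), treating +1 as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sens = tp / (tp + fn)   # true positive rate
    spec = tn / (tn + fp)   # true negative rate
    return sens, math.sqrt(sens * spec)


def shifted_bias(raw_scores):
    """Hypothetical rule, NOT the paper's formula: put the threshold at the
    mean score over all samples. Because the majority class dominates that
    mean, the boundary moves toward the majority class."""
    return -sum(raw_scores) / len(raw_scores)


# Toy data (assumed): 3 minority (+1) and 9 majority (-1) samples.
y_true = [1, 1, 1] + [-1] * 9
raw_scores = [0.4, 1.0, 1.6,
              -2.0, -1.8, -1.6, -1.4, -1.2, -1.0, -0.9, -0.7, 0.2]

original_bias = -1.2  # assumed bias of a model skewed toward the majority
new_bias = shifted_bias(raw_scores)

sens_old, gmean_old = sensitivity_and_gmean(y_true, predict(raw_scores, original_bias))
sens_new, gmean_new = sensitivity_and_gmean(y_true, predict(raw_scores, new_bias))

print(f"original bias: sensitivity={sens_old:.3f}, g-mean={gmean_old:.3f}")
print(f"shifted bias:  sensitivity={sens_new:.3f}, g-mean={gmean_new:.3f}")
```

Only the bias changes here; the learned direction (the raw scores) is left untouched, which is why such an adjustment needs no retraining and no extra hyperparameters.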
Database: OpenAIRE