Improving SVM classification on imbalanced datasets by introducing a new bias
Author: Luis Gonzalez-Abril, Haydemar Núñez, Cecilio Angulo
Contributors: Universidad de Sevilla. Departamento de Economía Aplicada I; Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial; Universitat Politècnica de Catalunya. GREC - Grup de Recerca en Enginyeria del Coneixement
Language: English
Year of publication: 2017
Subject: Support vector machines; Minority class; Bias; Post-processing; Cost-sensitive strategy; SMOTE; Optimization problem; Sensitivity; Machine learning; Pattern recognition; Artificial intelligence; Data mining; Computer science; Informàtica [Àrees temàtiques de la UPC]; Aprenentatge automàtic; Algorismes
Source: UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC); idUS. Depósito de Investigación de la Universidad de Sevilla; Recercat. Dipòsit de la Recerca de Catalunya
Description: Like most learning machines, Support Vector Machines (SVMs) can perform poorly on the minority class when trained on imbalanced datasets, because the standard SVM induces a model that minimizes the overall error. To improve performance on this kind of problem, a low-cost post-processing strategy is proposed that computes a new bias to adjust the function learned by the SVM. The proposed bias takes the proportional size of the classes into account in order to improve performance on the minority class. This solution avoids both introducing and tuning new parameters and modifying the standard optimization problem used for SVM training. Experimental results on 34 datasets with different degrees of imbalance, evaluated with standard error measures based on sensitivity and g-means, show that the proposed method improves classification on imbalanced datasets. Furthermore, its performance is comparable to well-known cost-sensitive and Synthetic Minority Over-sampling Technique (SMOTE) schemes, without adding complexity or computational cost. (A minimal code sketch of this bias-adjustment idea appears after the record.)
Database: OpenAIRE
External link:
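
The description outlines a post-processing step: keep the trained SVM as-is and only adjust its bias according to the proportional size of the classes. The exact bias computation is not given in the abstract, so the following is a minimal sketch under an assumption: the decision threshold is shifted additively by the log of the class-size ratio (an illustrative choice, not the authors' formula). It uses scikit-learn's SVC and reports sensitivity and g-means, the measures named in the description.

```python
# Minimal sketch: post-process a trained SVM by shifting its decision threshold
# toward the minority class, with the shift derived from the class-size ratio.
# The shift formula below (log of the class ratio) is an assumption for
# illustration, not the paper's exact bias computation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

def sensitivity_gmean(y_true, y_pred):
    # Sensitivity = recall on the minority (positive) class; g-mean combines
    # sensitivity and specificity, as in the evaluation described above.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, np.sqrt(sens * spec)

# Imbalanced toy data, roughly 9:1 majority/minority.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
f = svm.decision_function(X_te)          # f(x) = w·phi(x) + b

# Hypothetical post-processing bias based on the class proportions in the
# training set; no retraining and no new tuned parameter.
n_neg, n_pos = np.bincount(y_tr)
delta = np.log(n_neg / n_pos)

y_plain = (f >= 0).astype(int)           # standard SVM decision
y_bias  = (f + delta >= 0).astype(int)   # decision with the adjusted bias

print("plain SVM  (sensitivity, g-mean):", sensitivity_gmean(y_te, y_plain))
print("biased SVM (sensitivity, g-mean):", sensitivity_gmean(y_te, y_bias))
```

Because SVC decision values are not calibrated, the size of this particular shift is heuristic; the point of the sketch is only the workflow the description implies, namely computing the bias from the class proportions after training instead of modifying the SVM optimization problem or adding tunable parameters.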