Multi-class multi-tag classifier system for StackOverflow questions

Authors: Jose R. Cedeno Gonzalez, Mario Graff Guerrero, Felix Calderon, Juan José Flores Romero
Publication year: 2015
Subject:
Source: 2015 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC).
DOI: 10.1109/ropec.2015.7395121
Description: This work addresses the text document classification problem derived from the Kaggle contest “Identify Keywords and Tags from Millions of Text Questions”. Using data from the StackOverflow website, the task is to predict the tags assigned to questions. The categorization is multi-class and multi-tag: a question can be assigned to different topics and can also carry several tags. To solve this problem, we propose a 5-way multi-class classifier system. The results of this classification scheme are discussed by analysing score metrics of the classifier system. The 5-way classifier system obtained competitive results, with F1 scores ranging from 0.59 to 0.76. The main contribution of this paper lies in the preprocessing (which implements the feature extraction phase) and the multi-tag multi-class classification scheme.
Database: OpenAIRE
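
The description reports F1 scores for predicted tags but does not say how they were computed. As a minimal sketch, assuming a set-based F1 per question that is then averaged over questions, the metric could look like the following. The function names `tag_f1` and `mean_f1` and the example tags are hypothetical and are not taken from the paper.

```python
def tag_f1(true_tags, predicted_tags):
    """F1 score between the true and predicted tag sets of one question."""
    true_set, pred_set = set(true_tags), set(predicted_tags)
    overlap = len(true_set & pred_set)
    if overlap == 0:
        # No correct tag (or an empty set): precision and recall are both 0.
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)


def mean_f1(all_true, all_predicted):
    """Average the per-question F1 over a collection of questions."""
    scores = [tag_f1(t, p) for t, p in zip(all_true, all_predicted)]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative (made-up) tags for two questions.
true_tags = [["python", "pandas"], ["java"]]
predicted = [["python", "numpy"], ["java"]]
print(mean_f1(true_tags, predicted))
```

Averaging per question treats every question equally regardless of how many tags it has; other averaging choices (per tag, or pooled over all tag decisions) would generally give different numbers.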