A Sample Extension Method Based on Wikipedia and Its Application in Text Classification

Author:	Guannan Hu, Zhiguo Lu, Jianyue Ni, Wenhao Zhu, Yiting Liu
Year of publication:	2018
Subject:	Computer science; Supervised learning; Information processing; Sample (statistics); 02 engineering and technology; Semi-supervised learning; Computer Science Applications; Set (abstract data type); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Extension method; 020201 artificial intelligence & image processing; The Internet; Artificial intelligence; Electrical and Electronic Engineering; Classifier (UML); Natural language processing
Source:	Wireless Personal Communications. 102:3851-3867
ISSN:	1572-834X 0929-6212
Description:	Text classification is a topic in natural language processing that is particularly useful for Internet information processing. Methods based on supervised learning require a large number of manually annotated training samples. Annotating training samples is time-consuming, and classifier performance relies heavily on the quality of those samples. This paper presents a text classification method based on sample extension. The extension is based on the correlation between the labeled sample data and the concepts in Wikipedia. Using the rich link relationships between concepts, the authors select appropriate articles from Wikipedia to expand the training sample set. By introducing the many semantically rich concept pages contained in Wikipedia, along with the links that relate different pages, the approach improves the performance and generalization of the classifier. Experiments demonstrate that the proposed method performs better than both supervised and semi-supervised methods.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::dd5a0eed844ac7423d9b8f48867fbf77 https://doi.org/10.1007/s11277-018-5416-z
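
The description above outlines the idea only at a high level: correlate labeled samples with Wikipedia concepts, then use links between concept pages to pick extra training articles. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the bag-of-words cosine similarity, the stopword list, the `threshold` value, the relaxed threshold for linked pages, the function names, and the tiny in-memory `articles` and `links` dictionaries standing in for Wikipedia.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, only large enough for the toy data below.
STOPWORDS = {"the", "a", "is", "in", "for", "and", "of", "are", "where"}

def bag_of_words(text):
    """Lowercased word counts without stopwords (an assumed feature choice)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extend_samples(labeled, articles, links, threshold=0.3):
    """Return extra (text, label) pairs drawn from `articles`.

    labeled:  list of (text, label) pairs
    articles: dict mapping article title -> article text
    links:    dict mapping article title -> list of linked titles
    """
    extra = []
    for label in sorted({lab for _, lab in labeled}):
        # Summarize the labeled samples of this class as one word-count vector.
        centroid = Counter()
        for text, lab in labeled:
            if lab == label:
                centroid.update(bag_of_words(text))

        def score(title):
            return cosine(centroid, bag_of_words(articles[title]))

        # Seed articles: directly similar to the labeled samples.
        seeds = [t for t in articles if score(t) >= threshold]
        chosen = set(seeds)
        # Linked articles: accepted at a relaxed threshold (an assumption),
        # on the grounds that a link from a seed is itself weak evidence.
        for seed in seeds:
            for neighbour in links.get(seed, []):
                if neighbour in articles and score(neighbour) >= threshold / 2:
                    chosen.add(neighbour)
        extra.extend((articles[t], label) for t in sorted(chosen))
    return extra

# Toy data standing in for a labeled sample set and a Wikipedia snapshot.
labeled = [
    ("the team won the football match", "sports"),
    ("stock market prices fell", "finance"),
]
articles = {
    "Football": "football is a team sport played in a match",
    "Goalkeeper": "the goalkeeper guards the goal for the team",
    "Stock exchange": "a stock exchange is a market where stock prices are set",
    "Volcano": "a volcano erupts lava and ash",
}
links = {"Football": ["Goalkeeper", "Volcano"]}

extra = extend_samples(labeled, articles, links)
for text, label in extra:
    print(label, "<-", text)
```

In this toy run, "Goalkeeper" is too dissimilar to be selected on its own but is picked up through its link from "Football", while "Volcano" is linked yet rejected because it shares no vocabulary with the sports samples. The extended set would then be added to the original training data before fitting any classifier.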