Open Text News Benchmark: A Novel Chinese News Benchmark for Text Classification

Authors: Guishen Wang, Xiaoxuan Guo, Junlin Wu, Xiaotang Zhou
Year of publication: 2022
DOI: 10.21203/rs.3.rs-2126654/v1
Description: With the help of deep learning, natural language processing tasks such as summarization, text classification, and sentiment analysis are being studied in increasing depth. However, benchmarks for these tasks, especially for the Chinese language, are in short supply. For this reason, we propose a novel Chinese news text benchmark called the Open Text News benchmark (OTN) to support research in natural language processing. OTNSTANDARD contains 90,000 Chinese news articles from April 2000 to June 2021 in nine categories: finance, estate, education, technology, military, automobiles, sports, games, and entertainment. We evaluated several classical machine learning classifiers and several deep learning models on the OTN benchmark to assess their text classification performance.
Database: OpenAIRE