Breaking news: Unveiling a new dataset for Portuguese news classification and comparative analysis of approaches.

Authors:	Klaifer Garcia, Pedro Shiguihara, Lilian Berton
Language:	English
Publication year:	2024
Subject:	Medicine; Science
Source:	PLoS ONE, Vol 19, Iss 1, p e0296929 (2024)
Document type:	article
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0296929
Description:	Every day, thousands of news articles are published on the web, and filtering tools can be used to extract knowledge on specific topics. Categorizing news into a predefined set of topics is widely studied in the literature; however, most works are restricted to documents in English. In this work, we make two contributions. First, we introduce a Portuguese news dataset collected from WikiNews, an open-source media outlet that provides news from different sources. Because datasets for Portuguese are scarce and an existing one comes from a single news channel, we aim to introduce a dataset drawn from different news channels. The availability of comprehensive datasets plays a key role in advancing research. Second, we compare different architectures for Portuguese news classification, exploring different text representations (BoW, TF-IDF, Embedding) and classification techniques (SVM, CNN, DJINN, BERT), covering classical methods and current technologies. We show the trade-off between accuracy and training time for this application. We aim to show the capabilities of available algorithms and the challenges faced in the area.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/8de226d79a4e481cafc2192c1723a4a4