The RTBF Corpus: a dataset of 750,000 Belgian French news articles published between 2008 and 2021

Author: Escouflaire, Louis; Bogaert, Jérémie; Descampe, Antonin; Fairon, Cédrick
Contributors: UCL - SSH/ILC/PCOM - Pôle de recherche en communication
Language: English
Publication year: 2023
Subject:
Description: In this paper, we introduce the RTBF Corpus, a large diachronic corpus of 767,204 Belgian French news articles published between 2008 and 2021 by the Belgian public service media organization RTBF. We present the contents and structure of the corpus, along with the layers of metadata available for each text. We also describe the three versions of the articles available in the corpus, which differ in the cleaning and preprocessing steps applied to the text. The RTBF Corpus is freely available online in CSV format (https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/PEVSSI), for research and teaching purposes only.
Database: OpenAIRE