Tagsets and Datasets: Some Experiments Based on Portuguese Language

Authors: Luiza F. Trugo, Alexandre Rademaker, Fabricio Chalub, Cláudia Freitas, Guilherme Paulino-Passos
Year of publication: 2018
Source: Lecture Notes in Computer Science, ISBN: 9783319997216
Conference: PROPOR
DOI: 10.1007/978-3-319-99722-3_46
Description: We report the results of two experiments investigating the impact of linguistic variation on PoS tagging. In both cases, we start from a conversion of the MacMorpho corpus [1], which was re-annotated according to the Universal Dependencies PoS tagset. During the conversion, we faced linguistic challenges related to past participle forms. As a result, we created two corpora (MacMorpho-UD and MacMorpho-UD+PCP). We then used all three corpora (the original MacMorpho, MacMorpho-UD, and MacMorpho-UD+PCP) to assess the impact on PoS learning in different scenarios.
Database: OpenAIRE