Tagsets and Datasets: Some Experiments Based on Portuguese Language
Author: | Luiza F. Trugo, Alexandre Rademaker, Fabricio Chalub, Cláudia Freitas, Guilherme Paulino-Passos |
---|---|
Year of publication: | 2018 |
Subject: |
Computer science
Software engineering; Engineering and technology; Process (engineering); Variation (linguistics); Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Portuguese; Participle; Natural language processing; Universal Dependencies |
Source: | Lecture Notes in Computer Science; ISBN: 9783319997216; PROPOR |
DOI: | 10.1007/978-3-319-99722-3_46 |
Description: | We report the results of two experiments aimed at investigating the impact of linguistic variation on PoS tagging. In both cases, we start from the conversion of the MacMorpho corpus [1], which was re-annotated according to the Universal Dependencies PoS tagset. During the conversion process, we faced some linguistic challenges related to past participle forms. As a result, we created two corpora (MacMorpho-UD and MacMorpho-UD+PCP). We used the three corpora (MacMorpho, MacMorpho-UD, and MacMorpho-UD+PCP) to assess the impact on PoS learning in different scenarios. |
Database: | OpenAIRE |
External link: |