The First Annotated Corpus of Historical Basque
Authors: | Ander Soraluze, Manuel Padilla-Moyano, Izaskun Etxeberria, Ricardo Etxepare, Ainara Estarrona |
---|---|
Contributors: | Centre de recherche sur la langue et les textes basques (IKER), Université de Pau et des Pays de l'Adour (UPPA)-Université Bordeaux Montaigne-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Year of publication: | 2021 |
Subject: | 060201 languages & linguistics; Linguistics and Language; History; 06 humanities and the arts; 02 engineering and technology; diachronic syntax; Language and Linguistics; Computer Science Applications; [SHS] Humanities and Social Sciences; historical corpora; 0602 languages and literature; 0202 electrical engineering, electronic engineering, information engineering; natural language processing (NLP); 020201 artificial intelligence & image processing; digital humanities; history of Basque; Information Systems |
Source: | Digital Scholarship in the Humanities, Oxford University Press, 2021, doi:10.1093/llc/fqab066 |
ISSN: | 2055-7671, 2055-768X |
DOI: | 10.1093/llc/fqab066 |
Abstract: | This article presents the creation of a morphosyntactically annotated diachronic corpus of Basque and the first results of processing historical varieties of the language with computational techniques. The corpus comprises around one million words, spans the period from the 15th to the mid-18th century, and covers the most significant written production in all the historical dialects. Morphosyntactic tagging allows systematic searches at different levels of complexity, and a rich set of metadata also enables searches based on sociohistorical criteria. This is not only the first tagged corpus of historical Basque but also a means of improving language processing tools through the analysis of historical varieties that are more or less distant from the present-day standard language. The project also aims to set a model for future work on historical corpora of Basque and to inform similar projects for other languages. |
Database: | OpenAIRE |