Towards the Corpus of Latvian Romani Texts : Deciphering the Manuscripts in Jānis Leimanis' Archive

Authors: Perkova, Natalia; Kozhanov, Kirill
Contributors: Department of Finnish, Finno-Ugrian and Scandinavian Studies
Language: English
Year of publication: 2022
Subject:
Description: Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Latvian Romani is a Northeastern Romani dialect with a limited number of publicly available sources. Two large archival collections of Latvian Romani texts, compiled primarily in the 1930s in Latvia and Estonia, have recently been digitized as images and made available online to a wider public. In our study, we focus on one of these collections, the Latvian Romani folklore texts collected by Jānis Leimanis in interwar Latvia. In this paper, we describe how initial manual transcriptions, most of which were created with the help of a dedicated crowdsourcing platform, were integrated into the handwritten text recognition (HTR) workflow in Transkribus. We present two HTR models trained on Leimanis' collection and discuss various issues related to the work on these texts.
Database: OpenAIRE