Author: |
Pica, Morgane L. |
Language: |
English; Spanish (Castilian); French; Italian |
Year of publication: |
2022 |
Source: |
Studia Linguistica Romanica, Vol 2, Iss 8, Pp 131-154 (2022) |
Document type: |
article |
ISSN: |
2663-9815 |
DOI: |
10.25364/19.2022.8.7 |
Description: |
The corpus compiled for the RIN ConDÉ project consists of twelve reference sources on Norman customary law, from the 13th to the 19th century. Despite dealing with the same subject, the texts in this corpus are very heterogeneous in terms of format and structure. The texts were processed with the HTR tool Transkribus; Python and XSLT languages were employed for automated transformations; lemmatization was performed by AnaLog and the data was encoded using the TEI encoding model. Processing the data required a stage of reflection to identify the best means of restoring the structures and reference systems and to devise a set of lemma and part-of-speech tags that would work for texts covering six centuries of linguistic evolution. To make the texts maximally comparable, it was eventually decided to create a three-level structure (part > chapter > section). |
Database: |
Directory of Open Access Journals |