Data processing and lemmatization in digitized 19th-century Czech texts

Authors: Martin Stluka, Karel Kučera
Year of publication: 2014
Subject:
Source: DATeCH
DOI: 10.1145/2595188.2595220
Description: The paper describes the processing of linguistic data obtained through OCR, specifically their use in building dictionary databases and in subsequent lemmatization. The process is demonstrated on Czech printed texts from the 19th century.
Database: OpenAIRE