Data processing and lemmatization in digitized 19th-century Czech texts
Author: | Martin Stluka, Karel Kučera |
---|---|
Year of publication: | 2014 |
Subject: | |
Source: | DATeCH |
DOI: | 10.1145/2595188.2595220 |
Description: | The paper describes the processing of linguistic data obtained through OCR, specifically its use in building dictionary databases and in subsequent lemmatization. The process is demonstrated on Czech prints from the 19th century. |
Database: | OpenAIRE |