From Digitisation Process to Terminological Digital Resources

Author: Sanja Seljan, Dunder, I., Gaspar, A.
Contributors: Biljanović, P.
Language: English
Publication year: 2013
Subject:
Source: Publons
Description: Monolingual and multilingual terminology and collocation bases are valuable additional electronic resources that can be used in further research, in written communication and in everyday communication. Building such resources can be supported by terminology extraction tools that rely on statistical or linguistic approaches, or on a hybrid model, but it still requires considerable human expertise in evaluation and final compilation. The paper describes the whole process: from digitisation of printed material, OCR techniques, sentence alignment and creation of translation memories, up to terminology extraction and evaluation. The performance of the tools and the applied methodology is assessed through the standard statistical measures of precision, recall and F-measure. Experimental results are presented, deficiencies of the semi-automatic statistical and linguistic system are highlighted, and recommendations for further research are suggested.
Database: OpenAIRE
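
Note: the precision, recall and F-measure named in the description can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1) and exact string matching between extracted term candidates and a reference term list; the record does not say which variant or matching rule the paper uses. The function name and the toy term lists are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(extracted, reference):
    """Score extracted term candidates against a reference term list.

    Assumption: exact string match and balanced F-measure (F1).
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)

    # Precision: share of extracted candidates that are correct terms.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference terms that the tool found.
    recall = true_positives / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data for illustration only.
candidates = ["translation memory", "sentence alignment",
              "printed material", "the whole"]
gold = ["translation memory", "sentence alignment",
        "terminology extraction"]

p, r, f = precision_recall_f1(candidates, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy case two of the four candidates appear in the reference list, so precision is lower than recall. A real evaluation would normally normalise terms (case, inflection) before matching.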