Finding Multiword Term Candidates in Croatian

Author:	Tadić, Marko; Šojat, Krešimir
Language:	English
Year of publication:	2003
Subject:	Croatian language; multiword terms; term candidates; statistical processing; mutual information
Description:	The paper presents research on the statistical processing of a corpus of Croatian texts, with the primary aim of finding statistically significant co-occurrences of token n-grams (digrams, trigrams and tetragrams). The collocations found with this method form a list of candidates for multiword terminological units, which is submitted to terminologists for further processing, i.e. manual selection of the “real terms”. The statistical measure of co-occurrence used is mutual information (MI3), accompanied by linguistic filters: stop words and POS. The results on non-lemmatized material of a highly inflected language such as Croatian show that the MI measure alone is not sufficient to find a satisfactory number of multiword term candidates. In this case, absolute frequency combined with linguistic filtering techniques gives a broader list of candidates for real terms.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=57a035e5b1ae::4b1f95df56596126942b81fa7c7f303a https://www.bib.irb.hr/126566
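
Illustration of the method named in the abstract: the record does not give the MI3 formula or the stop-word list, so the following is only a minimal sketch under stated assumptions. It assumes the commonly cited cubic form MI3(x, y) = log2(P(x,y)^3 / (P(x) P(y))), estimates probabilities as raw counts divided by the number of tokens, handles digrams only, and omits the POS filter. The function name `mi3_bigrams` and the tiny `STOP_WORDS` set are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only; not the paper's list.
STOP_WORDS = {"i", "u", "na", "je", "se"}


def mi3_bigrams(tokens, stop_words=STOP_WORDS, min_freq=1):
    """Rank adjacent token pairs (digrams) by cubic mutual information.

    Assumed formula: MI3 = log2(P(x,y)^3 / (P(x) * P(y))),
    with each probability estimated as count / len(tokens).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (x, y), f_xy in bigrams.items():
        # Frequency threshold and stop-word filter: drop rare pairs and
        # pairs that contain a stop word in either position.
        if f_xy < min_freq or x in stop_words or y in stop_words:
            continue
        p_xy = f_xy / n
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy ** 3 / (p_x * p_y))

    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = "hrvatski jezik je hrvatski jezik u korpusu".split()
    for pair, score in mi3_bigrams(sample):
        print(pair, round(score, 3))
```

Raising `min_freq` approximates the frequency-based selection that the abstract reports as giving a broader candidate list on non-lemmatized text; the threshold value itself is left to the user, since the record does not state one.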