Popis: |
Terminology bases are widely used by translators, terminologists, lexicographers and everyday users. Labour-intensive terminological work encompasses the identification of term candidates and their ranking according to domain representativity. To measure domain representativity, a sample of Croatian–English parallel corpora is used and a hybrid statistical and linguistic approach is adopted, as in Frantzi et al. (2000), Nakagawa et al. (1998), Daille (1995) and Alegria et al. (2004). Statistical techniques are used to rank and filter n-gram term candidates, while linguistic techniques are applied to select terms on the basis of their morpho-syntactic properties. The automatically created list is then evaluated, followed by a description of the morpho-syntactic patterns. The multiword term list obtained through the hybrid approach is evaluated by means of precision, recall and F-measure, and analyzed with suggestions for further improvements. The NooJ linguistic engineering tool has been applied to meet the linguistic needs of this research. |