SemR-11: A Multi-Lingual Gold-Standard For Semantic Similarity And Relatedness For Eleven Languages
Authors: | Barzegar, S., Davis, B., Zarrouk, M., Handschuh, S., Freitas, A. |
---|---|
Contributors: | Science Foundation Ireland, Horizon 2020 |
Language: | English |
Year of publication: | 2018 |
Subject: | |
Source: | Scopus-Elsevier 11th edition of the Language Resources and Evaluation Conference (LREC 2018) |
DOI: | 10.5281/zenodo.1228904 |
Description: | This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness for 11 languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold standards were initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge resources and distributional semantic models. SemR-11 builds upon the English gold standards of Miller & Charles (MC), Rubenstein & Goodenough (RG), WordSimilarity-353 (WS-353), and SimLex-999, providing a canonical translation for them. The final dataset consists of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional semantic models. As a case study, the SemR-11 test collections were used to investigate how distributional semantic models built from corpora in different languages and of different sizes perform on semantic similarity and relatedness tasks. This publication has emanated from research funded in part by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645425 SSIX and by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289. We would like in particular to thank Alexandros Poulis and Juha Vilhunen from the Global Services for Machine Intelligence Group, Lionbridge Finland, for ensuring the production of high-quality translations for our similarity datasets. peer-reviewed |
Database: | OpenAIRE |
External link: |