Autor: |
Zhou, Shijia, Shan, Huangyan, Plank, Barbara, Litschko, Robert |
Rok vydání: |
2024 |
Předmět: |
|
Druh dokumentu: |
Working Paper |
Popis: |
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect semantic relatedness of two sentences in a given target language without access to direct supervision (i.e. zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two different pre-trained languages models: XLM-R and Furina. We experiment with 1) single-source transfer and select source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer where we compare selecting on all training languages against languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved the first place in the C8 (Kinyarwanda) test set. |
Databáze: |
arXiv |
Externí odkaz: |
|