Cross-language Sentence Selection via Data Augmentation and Rationale Training
Author: | Chen, Yanda; Kedzie, Chris; Nair, Suraj; Galuščáková, Petra; Zhang, Rui; Oard, Douglas W.; McKeown, Kathleen |
---|---|
Year of publication: | 2021 |
Subject: | |
Document type: | Working Paper |
Description: | This paper proposes an approach to cross-language sentence selection in a low-resource setting. It uses data augmentation and negative sampling techniques on noisy parallel sentence data to directly learn a cross-lingual embedding-based query relevance model. Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data. Moreover, when a rationale training secondary objective is applied to encourage the model to match word alignment hints from a phrase-based statistical machine translation model, consistent improvements are seen across three language pairs (English-Somali, English-Swahili and English-Tagalog) over a variety of state-of-the-art baselines. Comment: ACL 2021 main conference |
Database: | arXiv |
External link: |