Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Author: | Claude Roux, Ioan Calapodescu, Jean-Luc Meunier, Alexandre Berard, Vassilina Nikoulina, Marc Dymetman |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Domain adaptation; Computer Science - Computation and Language; Machine translation; Computer science; Sentiment analysis; Machine learning; Robustness (computer science); Artificial intelligence; Computation and Language (cs.CL) |
Source: | NGT@EMNLP-IJCNLP |
Description: | We share a French-English parallel corpus of Foursquare restaurant reviews (https://europe.naverlabs.com/research/natural-language-processing/machine-translation-of-restaurant-reviews), and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words. WNGT 2019 Paper |
Database: | OpenAIRE |
External link: |