Automatically Identifying Gender Issues in Machine Translation using Perturbations
Author: | Kellie Webster, Hila Gonen |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Computation and Language; Machine translation; Downstream (software development); Computer science; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Data science; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); 020201 artificial intelligence & image processing; Quality (business); Compiler; Computation and Language (cs.CL); 0105 earth and related environmental sciences |
Source: | EMNLP (Findings) |
Description: | The successful application of neural methods to machine translation has realized huge quality advances for the community. With these improvements, many have noted outstanding challenges, including the modeling and treatment of gendered language. While previous studies have identified issues using synthetic examples, we develop a novel technique to mine examples from real-world data to explore challenges for deployed systems. We use our method to compile an evaluation benchmark spanning examples for four languages from three language families, which we publicly release to facilitate research. The examples in our benchmark expose where model representations are gendered, and the unintended consequences these gendered representations can have in downstream applications. Comment: Findings of EMNLP 2020 |
Database: | OpenAIRE |
External link: |