The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Authors: | Ashish Vaswani, Zhifeng Chen, Mia Xu Chen, Noam Shazeer, Llion Jones, Jakob Uszkoreit, Ankur Bapna, Mike Schuster, Macduff Hughes, Yonghui Wu, George Foster, Melvin Johnson, Niki Parmar, Lukasz Kaiser, Orhan Firat, Wolfgang Macherey |
---|---|
Year of publication: | 2018 |
Subject: |
Machine translation
Computer science; Machine learning; Artificial intelligence; Transformer (machine learning model); German; Engineering and technology; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Medical and health sciences; Other medical science; Speech-language pathology & audiology |
Source: | ACL (1) |
Description: | The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT’14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets. |
Database: | OpenAIRE |