Showing 1 - 10 of 59 for search: '"Kenneth Heafield"'
Published in:
The Routledge Handbook of Translation and Health, ISBN 9781003167983
Machine translation has enormous potential to improve communication across language barriers in the healthcare setting. We first explain what machine translation (MT) is, and why it has the potential to be useful in the health domain. We provide a brief …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::81d17775c094ab8d5046caca6ba57db8
https://doi.org/10.4324/9781003167983-10
Published in:
ACL/IJCNLP (2)
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as …
Author:
Alham Fikri Aji, Kenneth Heafield
Published in:
Aji, A F & Heafield, K 2020, Compressing Neural Machine Translation Models with 4-bit Precision. In Proceedings of the Fourth Workshop on Neural Generation and Translation. Seattle, pp. 35–42, The 4th Workshop on Neural Generation and Translation, Seattle, Washington, United States, 10/07/20. https://doi.org/10.18653/v1/2020.ngt-1.4
Proceedings of the Fourth Workshop on Neural Generation and Translation
NGT@ACL
Quantization is one way to compress Neural Machine Translation (NMT) models, especially for edge devices. This paper pushes quantization from 8 bits, seen in current work on machine translation, to 4 bits. Instead of fixed-point quantization, we use …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5a3105a7fb4fb4e7ec15e729ebec9d1c
https://hdl.handle.net/20.500.11820/ec6f46c9-625c-4771-8f03-2bcdc6940cf9
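A minimal sketch of what 4-bit compression means in practice: the generic absmax fixed-point quantizer below maps float weights onto 16 signed levels. This illustrates the baseline the abstract contrasts against, not the paper's exact scheme, and the function names are hypothetical.

```python
import numpy as np

def quantize_4bit(weights):
    """Map float weights onto signed 4-bit codes in [-8, 7] (absmax scaling)."""
    levels = 7                                   # largest positive 4-bit code
    scale = np.abs(weights).max() / levels
    codes = np.clip(np.round(weights / scale), -8, 7)
    return codes.astype(np.int8), scale          # int8 is just a container here

def dequantize_4bit(codes, scale):
    return codes.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
codes, scale = quantize_4bit(w)
print("max reconstruction error:", np.abs(w - dequantize_4bit(codes, scale)).max())
```

Packed two codes per byte, this cuts model size roughly 8x relative to 32-bit floats, at the cost of the reconstruction error printed above.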
Published in:
Chen, P, Bogoychev, N, Heafield, K & Kirefu, F 2020, Parallel Sentence Mining by Constrained Decoding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 1672–1678, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.152
ACL
We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::500ab577461549726ee0ceb708cc4a4a
https://www.pure.ed.ac.uk/ws/files/161087114/Parallel_Sentence_CHEN_DOA03042020_VOR_CC_BY.pdf
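To make the constraint concrete, here is a minimal sketch of the prefix-tree idea: a trie built over one corpus limits which next tokens the decoder may score, so every finished hypothesis is guaranteed to be a sentence from that corpus. The trie itself is generic; the paper's integration with NMT beam search is not shown.

```python
def build_prefix_tree(corpus):
    """Build a nested-dict trie over whitespace-tokenized sentences."""
    root = {}
    for sentence in corpus:
        node = root
        for token in sentence.split():
            node = node.setdefault(token, {})
        node["<eos>"] = {}                       # mark a complete sentence
    return root

def allowed_next_tokens(tree, prefix):
    """Tokens that extend `prefix` along a path in the tree, i.e. the only
    continuations a constrained decoder may consider at this step."""
    node = tree
    for token in prefix:
        if token not in node:
            return []
        node = node[token]
    return list(node.keys())

tree = build_prefix_tree(["the cat sat", "the cat ran", "a dog ran"])
print(allowed_next_tokens(tree, ["the", "cat"]))  # ['sat', 'ran']
```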
Author:
Sergio Ortiz Rojas, Marek Strelec, Amir Kamran, Pinzhen Chen, Jaume Zaragoza, William Waites, Kenneth Heafield, Marta Bañón, Philipp Koehn, Hieu Hoang, Leopoldo Pla Sempere, Brian Thompson, Dion Wiggins, Elsa Sarrías, Faheem Kirefu, Gema Ramírez-Sánchez, Mikel L. Forcada, Barry Haddow, Miquel Esplà-Gomis
Published in:
Bañón, M, Chen, P, Haddow, B, Heafield, K, Hoang, H, Esplà-Gomis, M, Forcada, M, Kamran, A, Kirefu, F, Koehn, P, Ortiz-Rojas, S, Pla, L, Ramírez-Sánchez, G, Sarrías, E, Strelec, M, Thompson, B, Waites, W, Wiggins, D & Zaragoza, J 2020, ParaCrawl: Web-Scale Acquisition of Parallel Corpora. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 4555–4567, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.417
ACL
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b9f4c40d019a0f748816f44701b90ba6
https://hdl.handle.net/20.500.11820/aeb1138d-856e-477a-9ea0-f3ee5900cab1
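For a flavor of what sentence pair filtering involves, the sketch below applies simple rule-based checks of the kind such pipelines include as a first pass. Production filtering adds classifier-based scoring on top; this function is a hypothetical illustration, not ParaCrawl's actual code.

```python
def keep_pair(src, tgt, max_ratio=2.0, min_len=1, max_len=200):
    """Toy rule-based filter for mined sentence pairs: drop empty, overlong,
    or length-mismatched pairs, and untranslated copies."""
    s, t = len(src.split()), len(tgt.split())
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False
    if max(s, t) / min(s, t) > max_ratio:        # crude length-ratio check
        return False
    return src.strip() != tgt.strip()            # drop identical src/tgt

pairs = [("hello world", "hallo welt"),
         ("hi", "dies ist ein sehr langer satz"),   # bad length ratio
         ("same", "same")]                          # untranslated copy
print([p for p in pairs if keep_pair(*p)])          # keeps only the first
```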
Published in:
ACL
Aji, A F, Bogoychev, N, Heafield, K & Sennrich, R 2020, In Neural Machine Translation, What Does Transfer Learning Transfer? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 7701–7710, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.688
Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9423758fb59eecbf6119fc004b222f38
https://doi.org/10.5167/uzh-188224
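A minimal sketch of the kind of ablation the abstract describes: initialize a child (low-resource) model from a trained parent, but selectively reset named parameter groups so that only some information transfers. The parameter names and shapes here are made up for illustration and are not the authors' setup.

```python
import numpy as np

def transfer_init(parent_params, child_shapes, reset=frozenset()):
    """Initialize a child model from a parent. Parameters listed in `reset`
    (or missing/mismatched in the parent) are freshly initialized instead of
    copied, limiting what transfers."""
    child = {}
    for name, shape in child_shapes.items():
        parent = parent_params.get(name)
        if name in reset or parent is None or parent.shape != shape:
            child[name] = np.random.normal(0.0, 0.01, shape)  # fresh init
        else:
            child[name] = parent.copy()                        # transferred
    return child

parent = {"encoder.w": np.ones((4, 4)), "src_embed": np.ones((100, 4))}
# Hypothetical ablation: reset source embeddings, keep encoder weights.
child = transfer_init(parent, {"encoder.w": (4, 4), "src_embed": (80, 4)},
                      reset={"src_embed"})
```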
Author:
Maximiliana Behnke, Kenneth Heafield
Published in:
EMNLP (1)
The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be pruned. However, removing them before training a model results in lower …
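A minimal sketch of head pruning as masking: zeroing a head's output before the output projection is equivalent to removing it. Per the abstract's observation, the mask would be tightened gradually during training rather than fixed up front; the shapes and names here are illustrative only.

```python
import numpy as np

def masked_head_output(head_outputs, head_mask):
    """Zero out pruned attention heads before the output projection.

    head_outputs: (num_heads, seq_len, head_dim)
    head_mask:    (num_heads,) of 0/1, tightened over the course of training
    """
    return head_outputs * head_mask[:, None, None]

outputs = np.random.randn(8, 10, 64)
mask = np.array([1, 1, 0, 1, 1, 0, 1, 1])   # heads 2 and 5 pruned
print(masked_head_output(outputs, mask).shape)  # (8, 10, 64)
```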
Author:
Kenneth Heafield, Anna Currey
Published in:
Currey, A & Heafield, K 2019, Zero-Resource Neural Machine Translation with Monolingual Pivot Data. In Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). Hong Kong, pp. 99–107, The 3rd Workshop on Neural Generation and Translation, Hong Kong, Hong Kong, 4/11/19. https://doi.org/10.18653/v1/D19-5610
NGT@EMNLP-IJCNLP
Zero-shot neural machine translation (NMT) is a framework that uses source-pivot and target-pivot parallel data to train a source-target NMT system. An extension to zero-shot NMT is zero-resource NMT, which generates pseudo-parallel corpora using a zero-shot …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2ed554d21fbcfcdd8097d5d256012b55
https://hdl.handle.net/20.500.11820/afb438b5-e18d-4d3f-b62e-59258373e404
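A minimal sketch of generating pseudo-parallel data from monolingual pivot text, the core move behind the title: translate each pivot sentence into both source and target with the available pivot systems, then train source-target NMT on the result. The lambda translators below are toy stand-ins for real MT models.

```python
def make_pseudo_parallel(pivot_sentences, pivot_to_src, pivot_to_tgt):
    """Build a pseudo-parallel source-target corpus from pivot-language text
    by translating each pivot sentence into both languages (generic sketch;
    the arguments are stand-ins for trained pivot->X systems)."""
    return [(pivot_to_src(p), pivot_to_tgt(p)) for p in pivot_sentences]

corpus = make_pseudo_parallel(
    ["ein Beispiel"],
    pivot_to_src=lambda s: f"src({s})",   # would be a real pivot->source model
    pivot_to_tgt=lambda s: f"tgt({s})",   # would be a real pivot->target model
)
print(corpus)  # [('src(ein Beispiel)', 'tgt(ein Beispiel)')]
```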
Published in:
Aji, A F, Heafield, K & Bogoychev, N 2019, Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Hong Kong, pp. 3624–3629, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, Hong Kong, 3/11/19. https://doi.org/10.18653/v1/D19-1373
EMNLP/IJCNLP (1)
One way to reduce network traffic in multi-node data-parallel stochastic gradient descent is to only exchange the largest gradients. However, doing so damages the gradient and degrades the model’s performance. Transformer models degrade dramatically …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6bbec575d61434a17135499985c455bd
https://www.pure.ed.ac.uk/ws/files/129170063/Combining_Global_Sparse_AJI_DOA04112019_VOR_CC_BY.pdf
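A minimal sketch of the top-k gradient exchange the abstract starts from: only the k largest-magnitude entries cross the network, and the remainder becomes a local residual folded into the next step (error feedback). The paper's contribution, combining these global sparse gradients with each worker's local dense gradient, is not reproduced here.

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep the k largest-magnitude gradient entries for exchange; the rest
    becomes a local residual added into the next step's gradient."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of k largest
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    residual = flat - sparse
    return sparse.reshape(grad.shape), residual.reshape(grad.shape)

grad = np.random.randn(4, 4)
sparse, residual = sparsify_topk(grad, k=3)
# `sparse` is all that crosses the network; `residual` stays on the worker
# and is folded into the next gradient before the next top-k selection.
```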
Author:
Kenneth Heafield, Alham Fikri Aji
Published in:
NGT@EMNLP-IJCNLP
Aji, A F & Heafield, K 2019, Making Asynchronous Stochastic Gradient Descent Work for Transformers. In Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). Hong Kong, pp. 80–89, The 3rd Workshop on Neural Generation and Translation, Hong Kong, Hong Kong, 4/11/19. https://doi.org/10.18653/v1/D19-5608
Asynchronous stochastic gradient descent (SGD) is attractive from a speed perspective because workers do not wait for synchronization. However, the Transformer model converges poorly with asynchronous SGD, resulting in substantially lower quality compared …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3f1e8eea080aaf885f4dfd4ed64a0ed6
http://arxiv.org/abs/1906.03496
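To see why asynchrony hurts, consider gradient staleness: a worker's gradient is computed against parameters that other workers have already moved. The toy one-dimensional simulation below, for f(θ) = θ²/2 so the gradient is simply θ, illustrates the failure mode only; it is not the paper's method, and real asynchronous SGD runs workers as separate processes.

```python
import numpy as np

def grad(theta):
    return theta                      # gradient of f(theta) = 0.5 * theta**2

lr = 0.8
theta = np.array([4.0])

# Synchronous baseline would take two sequential steps: 4.0 -> 0.8 -> 0.16.
stale = theta.copy()                  # worker A reads parameters
theta -= lr * grad(theta)             # worker B's update lands: theta = 0.8
theta -= lr * grad(stale)             # A applies grad(4.0): theta = -2.4
print(theta)                          # overshoots the optimum at 0
```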