Showing 1 - 10 of 59 for search: '"Kenneth Heafield"'
Published in:
The Routledge Handbook of Translation and Health, ISBN 9781003167983
Machine translation has enormous potential to improve communication across language barriers in the healthcare setting. We first explain what machine translation (MT) is, and why it has the potential to be useful in the health domain. We provide a brief …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::81d17775c094ab8d5046caca6ba57db8
https://doi.org/10.4324/9781003167983-10
Published in:
ACL/IJCNLP (2)
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as …
Author:
Alham Fikri Aji, Kenneth Heafield
Published in:
Aji, A F & Heafield, K 2020, Compressing Neural Machine Translation Models with 4-bit Precision. In Proceedings of the Fourth Workshop on Neural Generation and Translation. Seattle, pp. 35–42, The 4th Workshop on Neural Generation and Translation, Seattle, Washington, United States, 10/07/20. https://doi.org/10.18653/v1/2020.ngt-1.4
Proceedings of the Fourth Workshop on Neural Generation and Translation
NGT@ACL
Quantization is one way to compress Neural Machine Translation (NMT) models, especially for edge devices. This paper pushes quantization from 8 bits, seen in current work on machine translation, to 4 bits. Instead of fixed-point quantization, we use …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5a3105a7fb4fb4e7ec15e729ebec9d1c
https://hdl.handle.net/20.500.11820/ec6f46c9-625c-4771-8f03-2bcdc6940cf9
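A minimal sketch of what 4-bit compression means in practice: the generic absmax fixed-point quantizer below maps float weights onto 16 signed levels. This illustrates the baseline the abstract contrasts against, not the paper's exact scheme, and the function names are hypothetical.

```python
import numpy as np

def quantize_4bit(weights):
    """Map float weights onto signed 4-bit codes in [-8, 7] (absmax scaling)."""
    levels = 7                                   # largest positive 4-bit code
    scale = np.abs(weights).max() / levels
    codes = np.clip(np.round(weights / scale), -8, 7)
    return codes.astype(np.int8), scale          # int8 is just a container here

def dequantize_4bit(codes, scale):
    return codes.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
codes, scale = quantize_4bit(w)
print("max reconstruction error:", np.abs(w - dequantize_4bit(codes, scale)).max())
```

Packed two codes per byte, this cuts model size roughly 8x relative to 32-bit floats, at the cost of the reconstruction error printed above.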
Published in:
Chen, P, Bogoychev, N, Heafield, K & Kirefu, F 2020, Parallel Sentence Mining by Constrained Decoding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 1672–1678, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.152
ACL
We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::500ab577461549726ee0ceb708cc4a4a
https://www.pure.ed.ac.uk/ws/files/161087114/Parallel_Sentence_CHEN_DOA03042020_VOR_CC_BY.pdf
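To make the constraint concrete, here is a minimal sketch of the prefix-tree idea: a trie built over one corpus limits which next tokens the decoder may score, so every finished hypothesis is guaranteed to be a sentence from that corpus. The trie itself is generic; the paper's integration with NMT beam search is not shown.

```python
def build_prefix_tree(corpus):
    """Build a nested-dict trie over whitespace-tokenized sentences."""
    root = {}
    for sentence in corpus:
        node = root
        for token in sentence.split():
            node = node.setdefault(token, {})
        node["<eos>"] = {}                       # mark a complete sentence
    return root

def allowed_next_tokens(tree, prefix):
    """Tokens that extend `prefix` along a path in the tree, i.e. the only
    continuations a constrained decoder may consider at this step."""
    node = tree
    for token in prefix:
        if token not in node:
            return []
        node = node[token]
    return list(node.keys())

tree = build_prefix_tree(["the cat sat", "the cat ran", "a dog ran"])
print(allowed_next_tokens(tree, ["the", "cat"]))  # ['sat', 'ran']
```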
Author:
Sergio Ortiz Rojas, Marek Strelec, Amir Kamran, Pinzhen Chen, Jaume Zaragoza, William Waites, Kenneth Heafield, Marta Bañón, Philipp Koehn, Hieu Hoang, Leopoldo Pla Sempere, Brian Thompson, Dion Wiggins, Elsa Sarrías, Faheem Kirefu, Gema Ramírez-Sánchez, Mikel L. Forcada, Barry Haddow, Miquel Esplà-Gomis
Published in:
Bañón, M, Chen, P, Haddow, B, Heafield, K, Hoang, H, Esplà-Gomis, M, Forcada, M, Kamran, A, Kirefu, F, Koehn, P, Ortiz-Rojas, S, Pla, L, Ramírez-Sánchez, G, Sarrías, E, Strelec, M, Thompson, B, Waites, W, Wiggins, D & Zaragoza, J 2020, ParaCrawl: Web-Scale Acquisition of Parallel Corpora. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 4555–4567, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.417
ACL
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software. We empirically compare alternative methods and publish benchmark data sets for sentence alignment and sentence pair filtering …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b9f4c40d019a0f748816f44701b90ba6
https://hdl.handle.net/20.500.11820/aeb1138d-856e-477a-9ea0-f3ee5900cab1
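For a flavor of what sentence pair filtering involves, the sketch below applies simple rule-based checks of the kind such pipelines include as a first pass. Production filtering adds classifier-based scoring on top; this function is a hypothetical illustration, not ParaCrawl's actual code.

```python
def keep_pair(src, tgt, max_ratio=2.0, min_len=1, max_len=200):
    """Toy rule-based filter for mined sentence pairs: drop empty, overlong,
    or length-mismatched pairs, and untranslated copies."""
    s, t = len(src.split()), len(tgt.split())
    if not (min_len <= s <= max_len and min_len <= t <= max_len):
        return False
    if max(s, t) / min(s, t) > max_ratio:        # crude length-ratio check
        return False
    return src.strip() != tgt.strip()            # drop identical src/tgt

pairs = [("hello world", "hallo welt"),
         ("hi", "dies ist ein sehr langer satz"),   # bad length ratio
         ("same", "same")]                          # untranslated copy
print([p for p in pairs if keep_pair(*p)])          # keeps only the first
```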
Published in:
ACL
Aji, A F, Bogoychev, N, Heafield, K & Sennrich, R 2020, In Neural Machine Translation, What Does Transfer Learning Transfer? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 7701–7710, 2020 Annual Conference of the Association for Computational Linguistics, Virtual conference, Washington, United States, 5/07/20. https://doi.org/10.18653/v1/2020.acl-main.688
Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9423758fb59eecbf6119fc004b222f38
https://doi.org/10.5167/uzh-188224
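A minimal sketch of the kind of ablation the abstract describes: initialize a child (low-resource) model from a trained parent, but selectively reset named parameter groups so that only some information transfers. The parameter names and shapes here are made up for illustration and are not the authors' setup.

```python
import numpy as np

def transfer_init(parent_params, child_shapes, reset=frozenset()):
    """Initialize a child model from a parent. Parameters listed in `reset`
    (or missing/mismatched in the parent) are freshly initialized instead of
    copied, limiting what transfers."""
    child = {}
    for name, shape in child_shapes.items():
        parent = parent_params.get(name)
        if name in reset or parent is None or parent.shape != shape:
            child[name] = np.random.normal(0.0, 0.01, shape)  # fresh init
        else:
            child[name] = parent.copy()                        # transferred
    return child

parent = {"encoder.w": np.ones((4, 4)), "src_embed": np.ones((100, 4))}
# Hypothetical ablation: reset source embeddings, keep encoder weights.
child = transfer_init(parent, {"encoder.w": (4, 4), "src_embed": (80, 4)},
                      reset={"src_embed"})
```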
Author:
Maximiliana Behnke, Kenneth Heafield
Published in:
EMNLP (1)
The attention mechanism is the crucial component of the transformer architecture. Recent research shows that most attention heads are not confident in their decisions and can be pruned. However, removing them before training a model results in lower …
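A minimal sketch of head pruning as masking: zeroing a head's output before the output projection is equivalent to removing it. Per the abstract's observation, the mask would be tightened gradually during training rather than fixed up front; the shapes and names here are illustrative only.

```python
import numpy as np

def masked_head_output(head_outputs, head_mask):
    """Zero out pruned attention heads before the output projection.

    head_outputs: (num_heads, seq_len, head_dim)
    head_mask:    (num_heads,) of 0/1, tightened over the course of training
    """
    return head_outputs * head_mask[:, None, None]

outputs = np.random.randn(8, 10, 64)
mask = np.array([1, 1, 0, 1, 1, 0, 1, 1])   # heads 2 and 5 pruned
print(masked_head_output(outputs, mask).shape)  # (8, 10, 64)
```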
Author:
Kenneth Heafield, Anna Currey
Published in:
Currey, A & Heafield, K 2019, Zero-Resource Neural Machine Translation with Monolingual Pivot Data. In Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). Hong Kong, pp. 99–107, The 3rd Workshop on Neural Generation and Translation, Hong Kong, Hong Kong, 4/11/19. https://doi.org/10.18653/v1/D19-5610
NGT@EMNLP-IJCNLP
Zero-shot neural machine translation (NMT) is a framework that uses source-pivot and target-pivot parallel data to train a source-target NMT system. An extension to zero-shot NMT is zero-resource NMT, which generates pseudo-parallel corpora using a zero-shot …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2ed554d21fbcfcdd8097d5d256012b55
https://hdl.handle.net/20.500.11820/afb438b5-e18d-4d3f-b62e-59258373e404
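A minimal sketch of generating pseudo-parallel data from monolingual pivot text, the core move behind the title: translate each pivot sentence into both source and target with the available pivot systems, then train source-target NMT on the result. The lambda translators below are toy stand-ins for real MT models.

```python
def make_pseudo_parallel(pivot_sentences, pivot_to_src, pivot_to_tgt):
    """Build a pseudo-parallel source-target corpus from pivot-language text
    by translating each pivot sentence into both languages (generic sketch;
    the arguments are stand-ins for trained pivot->X systems)."""
    return [(pivot_to_src(p), pivot_to_tgt(p)) for p in pivot_sentences]

corpus = make_pseudo_parallel(
    ["ein Beispiel"],
    pivot_to_src=lambda s: f"src({s})",   # would be a real pivot->source model
    pivot_to_tgt=lambda s: f"tgt({s})",   # would be a real pivot->target model
)
print(corpus)  # [('src(ein Beispiel)', 'tgt(ein Beispiel)')]
```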
Published in:
Aji, A F, Heafield, K & Bogoychev, N 2019, Combining Global Sparse Gradients with Local Gradients in Distributed Neural Network Training. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Hong Kong, pp. 3624–3629, 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Hong Kong, Hong Kong, 3/11/19. https://doi.org/10.18653/v1/D19-1373
EMNLP/IJCNLP (1)
One way to reduce network traffic in multi-node data-parallel stochastic gradient descent is to only exchange the largest gradients. However, doing so damages the gradient and degrades the model’s performance. Transformer models degrade dramatically …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6bbec575d61434a17135499985c455bd
https://www.pure.ed.ac.uk/ws/files/129170063/Combining_Global_Sparse_AJI_DOA04112019_VOR_CC_BY.pdf
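A minimal sketch of the top-k gradient exchange the abstract starts from: only the k largest-magnitude entries cross the network, and the remainder becomes a local residual folded into the next step (error feedback). The paper's contribution, combining these global sparse gradients with each worker's local dense gradient, is not reproduced here.

```python
import numpy as np

def sparsify_topk(grad, k):
    """Keep the k largest-magnitude gradient entries for exchange; the rest
    becomes a local residual added into the next step's gradient."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of k largest
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    residual = flat - sparse
    return sparse.reshape(grad.shape), residual.reshape(grad.shape)

grad = np.random.randn(4, 4)
sparse, residual = sparsify_topk(grad, k=3)
# `sparse` is all that crosses the network; `residual` stays on the worker
# and is folded into the next gradient before the next top-k selection.
```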
Author:
Kenneth Heafield, Alham Fikri Aji
Published in:
NGT@EMNLP-IJCNLP
Aji, A F & Heafield, K 2019, Making Asynchronous Stochastic Gradient Descent Work for Transformers. In Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). Hong Kong, pp. 80–89, The 3rd Workshop on Neural Generation and Translation, Hong Kong, Hong Kong, 4/11/19. https://doi.org/10.18653/v1/D19-5608
Asynchronous stochastic gradient descent (SGD) is attractive from a speed perspective because workers do not wait for synchronization. However, the Transformer model converges poorly with asynchronous SGD, resulting in substantially lower quality compared …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3f1e8eea080aaf885f4dfd4ed64a0ed6
http://arxiv.org/abs/1906.03496
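To see why asynchrony hurts, consider gradient staleness: a worker's gradient is computed against parameters that other workers have already moved. The toy one-dimensional simulation below, for f(θ) = θ²/2 so the gradient is simply θ, illustrates the failure mode only; it is not the paper's method, and real asynchronous SGD runs workers as separate processes.

```python
import numpy as np

def grad(theta):
    return theta                      # gradient of f(theta) = 0.5 * theta**2

lr = 0.8
theta = np.array([4.0])

# Synchronous baseline would take two sequential steps: 4.0 -> 0.8 -> 0.16.
stale = theta.copy()                  # worker A reads parameters
theta -= lr * grad(theta)             # worker B's update lands: theta = 0.8
theta -= lr * grad(stale)             # A applies grad(4.0): theta = -2.4
print(theta)                          # overshoots the optimum at 0
```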