Showing 1 - 6 of 6 for search: '"Guillaume Wenzek"'
Published in:
ACL/IJCNLP (1)
We show that margin-based bitext mining in a multilingual sentence space can be successfully scaled to operate on monolingual corpora of billions of sentences. We use 32 snapshots of a curated common crawl corpus (Wenzek et al., 2019) totaling 71 billion…
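As a rough illustration of the margin criterion this abstract refers to, the sketch below scores candidate sentence pairs by cosine similarity normalized by each side's average similarity to its k nearest neighbours; the function names, the value of k, and the random toy embeddings are assumptions for illustration, not details from the paper. Pairs whose score exceeds a chosen threshold would be kept as mined bitext.

import numpy as np

def normalize(v):
    # L2-normalize rows so dot products become cosine similarities.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def margin_scores(src_emb, tgt_emb, k=4):
    # Score every source/target pair by cosine similarity divided by the
    # average similarity to each side's k nearest neighbours (ratio margin).
    src, tgt = normalize(src_emb), normalize(tgt_emb)
    sim = src @ tgt.T                                     # cosine similarity matrix
    src_knn = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # mean sim of each source to its k nearest targets
    tgt_knn = np.sort(sim, axis=0)[-k:, :].mean(axis=0)   # mean sim of each target to its k nearest sources
    denom = (src_knn[:, None] + tgt_knn[None, :]) / 2.0
    return sim / denom

# Toy usage with random embeddings standing in for real encoder output.
rng = np.random.default_rng(0)
scores = margin_scores(rng.normal(size=(5, 16)), rng.normal(size=(7, 16)))
best_tgt_for_each_src = scores.argmax(axis=1)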
Author:
Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, Angela Fan
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::daff96849911589a8d8438c21e71b6b7
Author:
Angela Fan, Sebastian Riedel, Antoine Bordes, Andreas Vlachos, Guillaume Wenzek, Fabio Petroni, Marzieh Saeidi, Aleksandra Piktus
Published in:
EMNLP (1)
Fact checking at scale is difficult -- while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem. However, despite good intentions, contributions from volunteers are often err…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bb24438636fba6e1d2371933e2e79b8d
Author:
Edouard Grave, Vishrav Chaudhary, Guillaume Wenzek, Luke Zettlemoyer, Kartikay Khandelwal, Veselin Stoyanov, Naman Goyal, Myle Ott, Alexis Conneau, Francisco Guzmán
Published in:
ACL
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than…
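For context on what masked language model pretraining involves, here is a minimal Python sketch of the token-masking step only; the 15% masking rate, the mask token id, and the toy vocabulary size are illustrative assumptions, not the paper's configuration.

import numpy as np

def mask_tokens(input_ids, mask_token_id, mlm_prob=0.15, rng=None):
    # Randomly pick positions, replace them with the mask token, and return
    # labels that are ignored (-100) everywhere except the masked positions.
    if rng is None:
        rng = np.random.default_rng()
    masked = rng.random(input_ids.shape) < mlm_prob
    labels = np.where(masked, input_ids, -100)        # loss computed only where masked
    corrupted = np.where(masked, mask_token_id, input_ids)
    return corrupted, labels

# Toy usage: a batch of 2 sequences over a hypothetical 250k-item vocabulary.
rng = np.random.default_rng(0)
batch = rng.integers(0, 250_000, size=(2, 12))
corrupted, labels = mask_tokens(batch, mask_token_id=250_000, rng=rng)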
Author:
Peng-Jen Chen, Ahmed El-Kishky, Matthew Le, Guillaume Wenzek, Jiajun Shen, Myle Ott, Marc'Aurelio Ranzato, Vishrav Chaudhary
Published in:
WAT@EMNLP-IJCNLP
This paper describes Facebook AI’s submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training…
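The self-training the snippet mentions can be sketched, under assumptions, as a loop that trains on real bitext, labels monolingual source text with the model's own translations, and retrains on the union; the train() and translate() helpers below are placeholders, not the submission's actual pipeline.

def self_training_round(parallel_data, monolingual_source, train, translate):
    # Train on real bitext, generate synthetic targets for monolingual source
    # text, then retrain on the combined corpus.
    model = train(parallel_data)
    synthetic = [(src, translate(model, src)) for src in monolingual_source]
    return train(parallel_data + synthetic)

# Toy usage with stand-in helpers, just to show the data flow.
dummy_train = lambda pairs: {"corpus_size": len(pairs)}
dummy_translate = lambda model, src: src[::-1]        # placeholder "translation"
model = self_training_round([("hello", "bonjour")], ["goodbye"], dummy_train, dummy_translate)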
Published in:
EMNLP
We introduce Trans-gram, a simple and computationally-efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute…
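A hedged sketch of the idea: alongside ordinary skip-gram pairs drawn from monolingual text, each source word of an aligned sentence pair is trained against every word of the target sentence, so no word-level alignment is needed; the pair-generation functions below are illustrative, not the authors' implementation.

def skipgram_pairs(sentence, window=2):
    # Standard skip-gram training pairs from a single monolingual sentence.
    pairs = []
    for i, w in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((w, sentence[j]))
    return pairs

def transgram_pairs(src_sentence, tgt_sentence):
    # Cross-lingual pairs: every source word predicts every target word,
    # with no word alignment required.
    return [(w, t) for w in src_sentence for t in tgt_sentence]

mono = skipgram_pairs(["the", "cat", "sat"])
cross = transgram_pairs(["the", "cat"], ["le", "chat"])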