Showing 1 - 6 of 6 for search: '"Guillaume Wenzek"'
Published in:
ACL/IJCNLP (1)
We show that margin-based bitext mining in a multilingual sentence space can be successfully scaled to operate on monolingual corpora of billions of sentences. We use 32 snapshots of a curated common crawl corpus (Wenzek et al., 2019) totaling 71 billion…
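As a rough illustration of the margin criterion this abstract refers to, the sketch below scores candidate sentence pairs by cosine similarity normalized by each side's average similarity to its k nearest neighbours; the function names, the value of k, and the random toy embeddings are assumptions for illustration, not details from the paper. Pairs whose score exceeds a chosen threshold would be kept as mined bitext.

import numpy as np

def normalize(v):
    # L2-normalize rows so dot products become cosine similarities.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def margin_scores(src_emb, tgt_emb, k=4):
    # Score every source/target pair by cosine similarity divided by the
    # average similarity to each side's k nearest neighbours (ratio margin).
    src, tgt = normalize(src_emb), normalize(tgt_emb)
    sim = src @ tgt.T                                     # cosine similarity matrix
    src_knn = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # mean sim of each source to its k nearest targets
    tgt_knn = np.sort(sim, axis=0)[-k:, :].mean(axis=0)   # mean sim of each target to its k nearest sources
    denom = (src_knn[:, None] + tgt_knn[None, :]) / 2.0
    return sim / denom

# Toy usage with random embeddings standing in for real encoder output.
rng = np.random.default_rng(0)
scores = margin_scores(rng.normal(size=(5, 16)), rng.normal(size=(7, 16)))
best_tgt_for_each_src = scores.argmax(axis=1)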
Author:
Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, Angela Fan
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::daff96849911589a8d8438c21e71b6b7
Author:
Angela Fan, Sebastian Riedel, Antoine Bordes, Andreas Vlachos, Guillaume Wenzek, Fabio Petroni, Marzieh Saeidi, Aleksandra Piktus
Published in:
EMNLP (1)
Fact checking at scale is difficult -- while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem. However, despite good intentions, contributions from volunteers are often err…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bb24438636fba6e1d2371933e2e79b8d
Author:
Edouard Grave, Vishrav Chaudhary, Guillaume Wenzek, Luke Zettlemoyer, Kartikay Khandelwal, Veselin Stoyanov, Naman Goyal, Myle Ott, Alexis Conneau, Francisco Guzmán
Published in:
ACL
This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than…
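For context on what masked language model pretraining involves, here is a minimal Python sketch of the token-masking step only; the 15% masking rate, the mask token id, and the toy vocabulary size are illustrative assumptions, not the paper's configuration.

import numpy as np

def mask_tokens(input_ids, mask_token_id, mlm_prob=0.15, rng=None):
    # Randomly pick positions, replace them with the mask token, and return
    # labels that are ignored (-100) everywhere except the masked positions.
    if rng is None:
        rng = np.random.default_rng()
    masked = rng.random(input_ids.shape) < mlm_prob
    labels = np.where(masked, input_ids, -100)        # loss computed only where masked
    corrupted = np.where(masked, mask_token_id, input_ids)
    return corrupted, labels

# Toy usage: a batch of 2 sequences over a hypothetical 250k-item vocabulary.
rng = np.random.default_rng(0)
batch = rng.integers(0, 250_000, size=(2, 12))
corrupted, labels = mask_tokens(batch, mask_token_id=250_000, rng=rng)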
Author:
Peng-Jen Chen, Ahmed El-Kishky, Matthew Le, Guillaume Wenzek, Jiajun Shen, Myle Ott, Marc'Aurelio Ranzato, Vishrav Chaudhary
Published in:
WAT@EMNLP-IJCNLP
This paper describes Facebook AI’s submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training…
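The self-training the snippet mentions can be sketched, under assumptions, as a loop that trains on real bitext, labels monolingual source text with the model's own translations, and retrains on the union; the train() and translate() helpers below are placeholders, not the submission's actual pipeline.

def self_training_round(parallel_data, monolingual_source, train, translate):
    # Train on real bitext, generate synthetic targets for monolingual source
    # text, then retrain on the combined corpus.
    model = train(parallel_data)
    synthetic = [(src, translate(model, src)) for src in monolingual_source]
    return train(parallel_data + synthetic)

# Toy usage with stand-in helpers, just to show the data flow.
dummy_train = lambda pairs: {"corpus_size": len(pairs)}
dummy_translate = lambda model, src: src[::-1]        # placeholder "translation"
model = self_training_round([("hello", "bonjour")], ["goodbye"], dummy_train, dummy_translate)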
Published in:
EMNLP
We introduce Trans-gram, a simple and computationally-efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute…
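A hedged sketch of the idea: alongside ordinary skip-gram pairs drawn from monolingual text, each source word of an aligned sentence pair is trained against every word of the target sentence, so no word-level alignment is needed; the pair-generation functions below are illustrative, not the authors' implementation.

def skipgram_pairs(sentence, window=2):
    # Standard skip-gram training pairs from a single monolingual sentence.
    pairs = []
    for i, w in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((w, sentence[j]))
    return pairs

def transgram_pairs(src_sentence, tgt_sentence):
    # Cross-lingual pairs: every source word predicts every target word,
    # with no word alignment required.
    return [(w, t) for w in src_sentence for t in tgt_sentence]

mono = skipgram_pairs(["the", "cat", "sat"])
cross = transgram_pairs(["the", "cat"], ["le", "chat"])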