Showing 1 - 10 of 20 results for search: "Muzio, Alexandre"
The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoEs) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory …
External link:
http://arxiv.org/abs/2404.05089
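The snippet above describes the core MoE mechanism: a router dynamically sends each input token to a small subset of experts. A minimal sketch of that mechanism, assuming a generic top-k softmax router (names like SimpleMoE and the sizes are illustrative, not from the paper):

```python
# Toy Mixture-of-Experts layer: each token is processed only by its
# top-k experts, so compute is allocated dynamically per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)      # per-token expert choice
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(8, 64)
print(SimpleMoE()(x).shape)  # torch.Size([8, 64])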
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability, which enables dramatic increases in model size without significant increases in computational cost. To achieve …
External link:
http://arxiv.org/abs/2205.14336
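The abstract is cut off before it reaches the method; judging from the linked paper's title (Gating Dropout), the broad idea is to occasionally bypass the router's cross-device expert choice and keep a token on its local expert, saving communication while acting as regularization. A toy, single-process sketch of that reading (function and parameter names are my own, not the paper's):

```python
# Toy sketch: with probability p_drop during training, ignore the gated
# (possibly remote) expert and fall back to the token's local expert.
import random

def route(gated_expert, local_expert, p_drop=0.2, training=True):
    if training and random.random() < p_drop:
        return local_expert   # skip the all-to-all hop; regularizing noise
    return gated_expert       # normal top-1 routing

for t in range(5):
    print(t, route(gated_expert=3, local_expert=0))
```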
Authors:
Yang, Jian; Ma, Shuming; Huang, Haoyang; Zhang, Dongdong; Dong, Li; Huang, Shaohan; Muzio, Alexandre; Singhal, Saksham; Awadalla, Hany Hassan; Song, Xia; Wei, Furu
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks, including the Large Track and two Small Tracks, where the former one is …
External link:
http://arxiv.org/abs/2111.02086
Authors:
Kim, Young Jin; Awan, Ammar Ahmad; Muzio, Alexandre; Salinas, Andres Felipe Cruz; Lu, Liyang; Hendy, Amr; Rajbhandari, Samyam; He, Yuxiong; Awadalla, Hany Hassan
Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities …
External link:
http://arxiv.org/abs/2109.10465
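The "sublinear compute costs with respect to their parameters" claim can be made concrete with back-of-envelope arithmetic: under top-1 routing, adding experts multiplies the parameter count while per-token FLOPs for the expert block stay flat. A sketch with made-up sizes:

```python
# Illustration of sublinear compute vs. parameters in a top-1 MoE FFN.
# All numbers are invented for the example.
d_model, d_ff = 1024, 4096
ffn_params = 2 * d_model * d_ff       # weights of one expert's FFN
flops_per_token = 2 * ffn_params      # multiply-adds for one expert forward

for n_experts in (1, 8, 64):
    params = n_experts * ffn_params   # parameters grow linearly with experts
    flops = flops_per_token           # top-1: exactly one expert runs per token
    print(f"experts={n_experts:3d}  params={params/1e6:7.1f}M  "
          f"FLOPs/token={flops/1e6:.1f}M")
```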
Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low-quality …
External link:
http://arxiv.org/abs/2109.04778
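The "one model, all directions" setup the snippet describes is commonly implemented by prefixing each source sentence with a target-language tag, so an unseen pair reuses the same input format. A minimal sketch of that widely used convention (the tag format and sentences are illustrative, not necessarily this paper's scheme):

```python
# Target-language tagging for multilingual NMT: the same model and input
# format cover supervised and zero-shot directions alike.
def prepare(src_sentence: str, tgt_lang: str) -> str:
    return f"<2{tgt_lang}> {src_sentence}"  # hypothetical tag style

print(prepare("Wie geht es dir?", "fr"))
# -> "<2fr> Wie geht es dir?"  (de->fr may never appear in training data)
```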
Authors:
Fan, Yimin; Liang, Yaobo; Muzio, Alexandre; Hassan, Hany; Li, Houqiang; Zhou, Ming; Duan, Nan
Multilingual pre-trained models have demonstrated their effectiveness in many multilingual NLP tasks and enabled zero-shot or few-shot transfer from high-resource languages to low-resource ones. However, due to significant typological differences and …
External link:
http://arxiv.org/abs/2109.00271
Authors:
Ma, Shuming; Dong, Li; Huang, Shaohan; Zhang, Dongdong; Muzio, Alexandre; Singhal, Saksham; Awadalla, Hany Hassan; Song, Xia; Wei, Furu
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework …
External link:
http://arxiv.org/abs/2106.13736
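One broad way to close the encoder/decoder gap the snippet points at is to initialize the decoder's matching sub-layers from pretrained encoder weights, which is the general direction of the linked paper (DeltaLM). A rough sketch of that idea using PyTorch stand-in modules; the paper's actual interleaved initialization is more involved:

```python
# Sketch: copy pretrained encoder-layer weights into the matching
# sub-layers of a decoder layer (self-attention and feed-forward).
import torch.nn as nn

enc_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model=64, nhead=4, batch_first=True)

dec_layer.self_attn.load_state_dict(enc_layer.self_attn.state_dict())
dec_layer.linear1.load_state_dict(enc_layer.linear1.state_dict())
dec_layer.linear2.load_state_dict(enc_layer.linear2.state_dict())
# Cross-attention has no encoder counterpart, so it keeps its random init.
```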
Authors:
Ma, Shuming; Yang, Jian; Huang, Haoyang; Chi, Zewen; Dong, Li; Zhang, Dongdong; Awadalla, Hany Hassan; Muzio, Alexandre; Eriguchi, Akiko; Singhal, Saksham; Song, Xia; Menezes, Arul; Wei, Furu
Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of …
External link:
http://arxiv.org/abs/2012.15547
Academic article (full record requires login)
Authors:
Bianchini, Ricardo; Fontoura, Marcus; Cortez, Eli; Bonde, Anand; Muzio, Alexandre; Constantin, Ana-Maria; Moscibroda, Thomas; Magalhaes, Gabriel; Bablani, Girish; Russinovich, Mark
Published in:
Communications of the ACM, Feb 2020, Vol. 63, Issue 2, pp. 50-59.