Showing 1 - 10 of 20 results for search: "Muzio, Alexandre"
The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoEs) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory …
External link:
http://arxiv.org/abs/2404.05089
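The snippet above describes the core MoE mechanism: a router dynamically sends each input token to a small subset of experts. A minimal sketch of that mechanism, assuming a generic top-k softmax router (names like SimpleMoE and the sizes are illustrative, not from the paper):

```python
# Toy Mixture-of-Experts layer: each token is processed only by its
# top-k experts, so compute is allocated dynamically per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (num_tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)      # per-token expert choice
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(8, 64)
print(SimpleMoE()(x).shape)  # torch.Size([8, 64])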
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their outrageous scaling capability, which enables dramatic increases in model size without significant increases in computational cost. To achieve …
External link:
http://arxiv.org/abs/2205.14336
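The abstract is cut off before it reaches the method; judging from the linked paper's title (Gating Dropout), the broad idea is to occasionally bypass the router's cross-device expert choice and keep a token on its local expert, saving communication while acting as regularization. A toy, single-process sketch of that reading (function and parameter names are my own, not the paper's):

```python
# Toy sketch: with probability p_drop during training, ignore the gated
# (possibly remote) expert and fall back to the token's local expert.
import random

def route(gated_expert, local_expert, p_drop=0.2, training=True):
    if training and random.random() < p_drop:
        return local_expert   # skip the all-to-all hop; regularizing noise
    return gated_expert       # normal top-1 routing

for t in range(5):
    print(t, route(gated_expert=3, local_expert=0))
```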
Authors:
Yang, Jian; Ma, Shuming; Huang, Haoyang; Zhang, Dongdong; Dong, Li; Huang, Shaohan; Muzio, Alexandre; Singhal, Saksham; Awadalla, Hany Hassan; Song, Xia; Wei, Furu
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks, including the Large Track and two Small Tracks, where the former one is …
External link:
http://arxiv.org/abs/2111.02086
Authors:
Kim, Young Jin; Awan, Ammar Ahmad; Muzio, Alexandre; Salinas, Andres Felipe Cruz; Lu, Liyang; Hendy, Amr; Rajbhandari, Samyam; He, Yuxiong; Awadalla, Hany Hassan
Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities …
External link:
http://arxiv.org/abs/2109.10465
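The "sublinear compute costs with respect to their parameters" claim can be made concrete with back-of-envelope arithmetic: under top-1 routing, adding experts multiplies the parameter count while per-token FLOPs for the expert block stay flat. A sketch with made-up sizes:

```python
# Illustration of sublinear compute vs. parameters in a top-1 MoE FFN.
# All numbers are invented for the example.
d_model, d_ff = 1024, 4096
ffn_params = 2 * d_model * d_ff       # weights of one expert's FFN
flops_per_token = 2 * ffn_params      # multiply-adds for one expert forward

for n_experts in (1, 8, 64):
    params = n_experts * ffn_params   # parameters grow linearly with experts
    flops = flops_per_token           # top-1: exactly one expert runs per token
    print(f"experts={n_experts:3d}  params={params/1e6:7.1f}M  "
          f"FLOPs/token={flops/1e6:.1f}M")
```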
Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low-quality …
External link:
http://arxiv.org/abs/2109.04778
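The "one model, all directions" setup the snippet describes is commonly implemented by prefixing each source sentence with a target-language tag, so an unseen pair reuses the same input format. A minimal sketch of that widely used convention (the tag format and sentences are illustrative, not necessarily this paper's scheme):

```python
# Target-language tagging for multilingual NMT: the same model and input
# format cover supervised and zero-shot directions alike.
def prepare(src_sentence: str, tgt_lang: str) -> str:
    return f"<2{tgt_lang}> {src_sentence}"  # hypothetical tag style

print(prepare("Wie geht es dir?", "fr"))
# -> "<2fr> Wie geht es dir?"  (de->fr may never appear in training data)
```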
Authors:
Fan, Yimin; Liang, Yaobo; Muzio, Alexandre; Hassan, Hany; Li, Houqiang; Zhou, Ming; Duan, Nan
Multilingual pre-trained models have demonstrated their effectiveness in many multilingual NLP tasks and enabled zero-shot or few-shot transfer from high-resource languages to low-resource ones. However, due to significant typological differences and …
External link:
http://arxiv.org/abs/2109.00271
Authors:
Ma, Shuming; Dong, Li; Huang, Shaohan; Zhang, Dongdong; Muzio, Alexandre; Singhal, Saksham; Awadalla, Hany Hassan; Song, Xia; Wei, Furu
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework …
External link:
http://arxiv.org/abs/2106.13736
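One broad way to close the encoder/decoder gap the snippet points at is to initialize the decoder's matching sub-layers from pretrained encoder weights, which is the general direction of the linked paper (DeltaLM). A rough sketch of that idea using PyTorch stand-in modules; the paper's actual interleaved initialization is more involved:

```python
# Sketch: copy pretrained encoder-layer weights into the matching
# sub-layers of a decoder layer (self-attention and feed-forward).
import torch.nn as nn

enc_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
dec_layer = nn.TransformerDecoderLayer(d_model=64, nhead=4, batch_first=True)

dec_layer.self_attn.load_state_dict(enc_layer.self_attn.state_dict())
dec_layer.linear1.load_state_dict(enc_layer.linear1.state_dict())
dec_layer.linear2.load_state_dict(enc_layer.linear2.state_dict())
# Cross-attention has no encoder counterpart, so it keeps its random init.
```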
Authors:
Ma, Shuming; Yang, Jian; Huang, Haoyang; Chi, Zewen; Dong, Li; Zhang, Dongdong; Awadalla, Hany Hassan; Muzio, Alexandre; Eriguchi, Akiko; Singhal, Saksham; Song, Xia; Menezes, Arul; Wei, Furu
Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of …
External link:
http://arxiv.org/abs/2012.15547
Academic article (full record requires login)
Authors:
Bianchini, Ricardo; Fontoura, Marcus; Cortez, Eli; Bonde, Anand; Muzio, Alexandre; Constantin, Ana-Maria; Moscibroda, Thomas; Magalhaes, Gabriel; Bablani, Girish; Russinovich, Mark
Published in:
Communications of the ACM, Feb 2020, Vol. 63, Issue 2, pp. 50-59.