Showing 1 - 10 of 36
for search: '"Awadalla, Hany Hassan"'
Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes…
External link:
http://arxiv.org/abs/2310.15987
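The entry above concerns few-shot prompting for MT. As a minimal illustration of what "selecting few-shot samples for prompting" means in practice, the sketch below assembles demonstration pairs into a translation prompt; the formatting, language pair, and example sentences are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: assembling a few-shot MT prompt from selected demonstrations.
def build_prompt(demos, source_sentence, src_lang="German", tgt_lang="English"):
    """Format k demonstration pairs followed by the sentence to translate."""
    lines = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in demos]
    lines.append(f"{src_lang}: {source_sentence}\n{tgt_lang}:")
    return "\n\n".join(lines)

demos = [
    ("Guten Morgen.", "Good morning."),
    ("Wie geht es dir?", "How are you?"),
]
print(build_prompt(demos, "Das Wetter ist heute schön."))
```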
Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including the machine translation task, thanks to the efficient model scaling capability of expert parallelism. However, it has brought a fundamental…
External link:
http://arxiv.org/abs/2310.02410
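For readers unfamiliar with sparsely activated MoE layers mentioned in the entry above, here is a minimal top-1-routed MoE feed-forward layer, assuming a standard softmax gate; real expert parallelism would shard the expert list across devices, which is omitted here. This is a generic sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        top_p, top_idx = scores.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # weight each routed token's output by its gate probability
                out[mask] = top_p[mask].unsqueeze(1) * expert(x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(16, 512)).shape)         # torch.Size([16, 512])
```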
Generative Large Language Models (LLMs) have achieved remarkable advances on various NLP tasks. However, these advances have not been reflected in the translation task, especially for models of moderate size (i.e., 7B or 13B parameters), which…
External link:
http://arxiv.org/abs/2309.11674
Author:
Pham, Hai, Kim, Young Jin, Mukherjee, Subhabrata, Woodruff, David P., Poczos, Barnabas, Awadalla, Hany Hassan
The Mixture-of-Experts (MoE) architecture has proven to be a powerful method for diverse tasks when training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner…
External link:
http://arxiv.org/abs/2308.15772
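To make the contrast with "task-agnostic" routing concrete, the following sketch shows one generic way a gate could condition on a task label in addition to the token representation. This is only an illustration of the general idea of task-aware routing; it is not the method proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAwareGate(nn.Module):
    """Gate that mixes a learned task embedding into the routing decision (illustrative)."""
    def __init__(self, d_model=512, num_tasks=4, num_experts=8):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x, task_id):               # x: (tokens, d_model), task_id: (tokens,)
        gate_input = x + self.task_emb(task_id)  # condition routing on the task
        return F.softmax(self.gate(gate_input), dim=-1)

gate = TaskAwareGate()
probs = gate(torch.randn(6, 512), torch.tensor([0, 0, 1, 1, 2, 3]))
print(probs.shape)                               # torch.Size([6, 8])
```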
Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high…
External link:
http://arxiv.org/abs/2308.09723
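The entry above motivates reducing LLM memory footprints. As a generic illustration of how quantization cuts weight storage, here is a per-row symmetric int8 post-training quantization sketch; it is not the scheme proposed in the paper.

```python
import torch

def quantize_int8(weight):                       # weight: (out_features, in_features)
    # per-row symmetric scale so that the largest magnitude maps to 127
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_int8(w)
# int8 weights use 1/4 the bytes of fp32, at the cost of a small reconstruction error
print(q.element_size() * q.nelement() / (w.element_size() * w.nelement()))  # 0.25
print((dequantize(q, s) - w).abs().max())
```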
Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot…
External link:
http://arxiv.org/abs/2305.16806
Author:
Xie, Shufang, Zhang, Huishuai, Guo, Junliang, Tan, Xu, Bian, Jiang, Awadalla, Hany Hassan, Menezes, Arul, Qin, Tao, Yan, Rui
Transformer networks have become the preferred architecture for many tasks due to their state-of-the-art performance. However, the optimal way to implement residual connections in the Transformer, which are essential for effective training, is still debated…
External link:
http://arxiv.org/abs/2304.14802
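The debate referenced above is usually framed as Post-LN versus Pre-LN residual placement. The sketch below shows these two standard variants side by side, using only a feed-forward sublayer for brevity; it illustrates the design space being compared, not the specific variant the paper proposes.

```python
import torch
import torch.nn as nn

def ff(d_model=512, d_ff=2048):
    return nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

class PostLNBlock(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.ff, self.norm = ff(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.ff(x))         # normalize after the residual add

class PreLNBlock(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.ff, self.norm = ff(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.ff(self.norm(x))         # normalize before the sublayer

x = torch.randn(2, 10, 512)
print(PostLNBlock()(x).shape, PreLNBlock()(x).shape)
```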
Author:
Hendy, Amr, Abdelrehim, Mohamed, Sharaf, Amr, Raunak, Vikas, Gabr, Mohamed, Matsushita, Hitokazu, Kim, Young Jin, Afify, Mohamed, Awadalla, Hany Hassan
Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities for natural language generation, but their performance on machine translation has not been thoroughly investigated. In this paper, we present a comprehensive evaluation…
External link:
http://arxiv.org/abs/2302.09210
Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters. As a result, these models have achieved significantly better quality on various natural language…
External link:
http://arxiv.org/abs/2211.10017
Author:
He, Pengcheng, Peng, Baolin, Lu, Liyang, Wang, Song, Mei, Jie, Liu, Yang, Xu, Ruochen, Awadalla, Hany Hassan, Shi, Yu, Zhu, Chenguang, Xiong, Wayne, Zeng, Michael, Gao, Jianfeng, Huang, Xuedong
This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve…
External link:
http://arxiv.org/abs/2208.09770