Showing 1 - 10 of 23 results for search: '"Singhal, Saksham"'
Author:
Huang, Shaohan, Dong, Li, Wang, Wenhui, Hao, Yaru, Singhal, Saksham, Ma, Shuming, Lv, Tengchao, Cui, Lei, Mohammed, Owais Khan, Patra, Barun, Liu, Qiang, Aggarwal, Kriti, Chi, Zewen, Bjorck, Johan, Chaudhary, Vishrav, Som, Subhojit, Song, Xia, Wei, Furu
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, …
External link:
http://arxiv.org/abs/2302.14045
Author:
Patra, Barun, Singhal, Saksham, Huang, Shaohan, Chi, Zewen, Dong, Li, Wei, Furu, Chaudhary, Vishrav, Song, Xia
In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but are also more parameter efficient, thereby promoting better adoption in resource-constrained …
External link:
http://arxiv.org/abs/2210.14867
Author:
Wang, Hongyu, Ma, Shuming, Huang, Shaohan, Dong, Li, Wang, Wenhui, Peng, Zhiliang, Wu, Yu, Bajaj, Payal, Singhal, Saksham, Benhaim, Alon, Patra, Barun, Liu, Zhun, Chaudhary, Vishrav, Song, Xia, Wei, Furu
A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and …
External link:
http://arxiv.org/abs/2210.06423
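The entry above contrasts sub-layer normalization placements across Transformer variants (Post-LayerNorm as used in BERT versus the Pre-LayerNorm common in GPT-style models). The sketch below only illustrates that generic difference; it is not the architecture proposed in the paper, and the module and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class FFNSubLayer(nn.Module):
    """One feed-forward sub-layer, switchable between Post-LN and Pre-LN.

    Post-LN (BERT-style): x -> sublayer(x) -> add residual -> LayerNorm
    Pre-LN  (GPT-style):  x -> LayerNorm -> sublayer -> add residual
    """

    def __init__(self, d_model: int = 64, d_ff: int = 256, pre_ln: bool = True):
        super().__init__()
        self.pre_ln = pre_ln
        self.norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_ln:
            return x + self.ffn(self.norm(x))   # Pre-LN: normalize before the sub-layer
        return self.norm(x + self.ffn(x))       # Post-LN: normalize after the residual add

x = torch.randn(2, 8, 64)                       # (batch, sequence, hidden)
print(FFNSubLayer(pre_ln=True)(x).shape, FFNSubLayer(pre_ln=False)(x).shape)
```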
Author:
Wang, Wenhui, Bao, Hangbo, Dong, Li, Bjorck, Johan, Peng, Zhiliang, Liu, Qiang, Aggarwal, Kriti, Mohammed, Owais Khan, Singhal, Saksham, Som, Subhojit, Wei, Furu
A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks …
External link:
http://arxiv.org/abs/2208.10442
Author:
Chi, Zewen, Dong, Li, Huang, Shaohan, Dai, Damai, Ma, Shuming, Patra, Barun, Singhal, Saksham, Bajaj, Payal, Song, Xia, Mao, Xian-Ling, Huang, Heyan, Wei, Furu
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning …
External link:
http://arxiv.org/abs/2204.09179
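The routing mechanism this entry refers to can be pictured with a minimal top-1 router: each token's hidden state is scored against a learnable embedding per expert and dispatched to the highest-scoring expert. This is a generic sketch of sparse MoE routing under assumed names and shapes, not the routing method proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1Router(nn.Module):
    """Minimal top-1 token router for a sparse mixture-of-experts layer."""

    def __init__(self, d_model: int = 32, num_experts: int = 4):
        super().__init__()
        # One learnable embedding per expert; tokens are scored by dot product.
        self.expert_embed = nn.Parameter(torch.randn(num_experts, d_model))
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_experts)])

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (tokens, d_model); scores: (tokens, num_experts)
        scores = hidden @ self.expert_embed.t()
        gate = F.softmax(scores, dim=-1)
        top_expert = scores.argmax(dim=-1)        # best-matched expert per token
        out = torch.zeros_like(hidden)
        for e, expert in enumerate(self.experts):
            mask = top_expert == e
            if mask.any():
                # Scale each expert output by its gate value so routing stays differentiable.
                out[mask] = expert(hidden[mask]) * gate[mask, e].unsqueeze(-1)
        return out

tokens = torch.randn(10, 32)
print(Top1Router()(tokens).shape)  # torch.Size([10, 32])
```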
Author:
Yang, Jian, Ma, Shuming, Huang, Haoyang, Zhang, Dongdong, Dong, Li, Huang, Shaohan, Muzio, Alexandre, Singhal, Saksham, Awadalla, Hany Hassan, Song, Xia, Wei, Furu
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks including Large Track and two Small Tracks where the former one is …
External link:
http://arxiv.org/abs/2111.02086
Author:
Zheng, Bo, Dong, Li, Huang, Shaohan, Singhal, Saksham, Che, Wanxiang, Liu, Ting, Song, Xia, Wei, Furu
Compared to monolingual models, cross-lingual models usually require a more expressive vocabulary to represent all languages adequately. We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary …
External link:
http://arxiv.org/abs/2109.07306
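One way to see the under-representation this entry describes is to measure how many subword pieces a fixed vocabulary needs per word: a language the vocabulary covers poorly fragments into many short pieces. The snippet below uses a toy greedy longest-match tokenizer over a made-up vocabulary purely for illustration; it is not the method of the paper.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword segmentation; unknown characters become single pieces."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):          # try the longest candidate first
            if word[i:j] in vocab or j == i + 1:   # fall back to a single character
                pieces.append(word[i:j])
                i = j
                break
    return pieces

# Toy vocabulary that covers the English words well but not the Czech ones below.
vocab = {"trans", "lation", "model", "jazyk", "ov", "ý"}
for word in ["translation", "model", "jazykový", "vícejazyčný"]:
    pieces = greedy_tokenize(word, vocab)
    print(f"{word:>12} -> {len(pieces)} pieces: {pieces}")
```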
Author:
Chi, Zewen, Huang, Shaohan, Dong, Li, Ma, Shuming, Zheng, Bo, Singhal, Saksham, Bajaj, Payal, Song, Xia, Mao, Xian-Ling, Huang, Heyan, Wei, Furu
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training. Specifically, we present two pre-training tasks, namely multilingual replaced token detection, and translation replaced token detection. Besides, we pretrain …
External link:
http://arxiv.org/abs/2106.16138
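Replaced token detection, as in ELECTRA, trains a discriminator to predict for every position whether the token is original or was substituted by a small generator. The sketch below shows only the discriminator's per-token binary loss on already-corrupted inputs; the corruption step, the multilingual and translation variants from the paper, and all names here are simplifying assumptions.

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 100, 32, 8, 4

# Discriminator: encode the (possibly corrupted) sequence, then score each position.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d_model, 1)

corrupted_ids = torch.randint(0, vocab_size, (batch, seq_len))  # tokens after generator replacement
is_replaced = torch.randint(0, 2, (batch, seq_len)).float()     # 1 where the generator swapped the token

logits = head(encoder(embed(corrupted_ids))).squeeze(-1)        # (batch, seq_len) replacement scores
loss = nn.functional.binary_cross_entropy_with_logits(logits, is_replaced)
loss.backward()
print(float(loss))
```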
Author:
Ma, Shuming, Dong, Li, Huang, Shaohan, Zhang, Dongdong, Muzio, Alexandre, Singhal, Saksham, Awadalla, Hany Hassan, Song, Xia, Wei, Furu
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG). NLG tasks are often based on the encoder-decoder framework, …
External link:
http://arxiv.org/abs/2106.13736
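The encoder-decoder framework this entry contrasts with encoder-only pretraining can be sketched as a source encoder plus an autoregressive decoder trained with teacher forcing. The example below is a generic seq2seq training step under assumed sizes and names, not the model or initialization scheme proposed in the paper.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32

# Minimal encoder-decoder for generation: the encoder reads the source,
# the decoder predicts the target shifted by one position (teacher forcing).
embed = nn.Embedding(vocab_size, d_model)
seq2seq = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                         num_decoder_layers=2, batch_first=True)
lm_head = nn.Linear(d_model, vocab_size)

src = torch.randint(0, vocab_size, (2, 10))     # source sentence token ids
tgt = torch.randint(0, vocab_size, (2, 7))      # target sentence token ids
causal_mask = seq2seq.generate_square_subsequent_mask(tgt.size(1) - 1)

hidden = seq2seq(embed(src), embed(tgt[:, :-1]), tgt_mask=causal_mask)
logits = lm_head(hidden)                         # (batch, tgt_len - 1, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tgt[:, 1:].reshape(-1))
print(float(loss))
```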
Author:
Zheng, Bo, Dong, Li, Huang, Shaohan, Wang, Wenhui, Chi, Zewen, Singhal, Saksham, Che, Wanxiang, Liu, Ting, Song, Xia, Wei, Furu
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example …
External link:
http://arxiv.org/abs/2106.08226
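Consistency regularization, as mentioned in this entry, can be sketched as an extra loss term that penalizes disagreement between the model's predictions on an example and on an augmented (e.g., translated or perturbed) view of it. The symmetric-KL form, the augmentation stand-in, and all names below are assumptions for illustration, not the exact recipe from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between predictions on an example and its augmented counterpart."""
    p, q = F.log_softmax(logits_a, dim=-1), F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

classifier = nn.Linear(16, 3)                     # stand-in for a fine-tuned classifier head
features = torch.randn(4, 16)                     # features of the original examples
augmented = features + 0.1 * torch.randn(4, 16)   # stand-in for a translated/perturbed view
labels = torch.randint(0, 3, (4,))

logits, logits_aug = classifier(features), classifier(augmented)
loss = F.cross_entropy(logits, labels) + consistency_loss(logits, logits_aug)
loss.backward()
print(float(loss))
```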