Výsledky vyhledávání - "Awadalla, Hany"

Report

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Autor: Abdin, Marah, Jacobs, Sam Ade, Awan, Ammar Ahmad, Aneja, Jyoti, Awadallah, Ahmed, Awadalla, Hany, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Qin, Cai, Martin, Mendes, Caio César Teodoro, Chen, Weizhu, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Yen-Chun, Chen, Yi-Ling, Chopra, Parul, Dai, Xiyang, Del Giorno, Allie, de Rosa, Gustavo, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Iter, Dan, Gao, Mei, Gao, Min, Gao, Jianfeng, Garg, Amit, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Huynh, Jamie, Javaheripi, Mojan, Jin, Xin, Kauffmann, Piero, Karampatziakis, Nikos, Kim, Dongwoo, Khademi, Mahoud, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Liu, Ce, Liu, Mengchen, Liu, Weishung, Lin, Eric, Lin, Zeqi, Luo, Chong, Madan, Piyush, Mazzola, Matt, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Wang, Xin, Wang, Lijuan, Wang, Chunyu, Wang, Yu, Ward, Rachel, Wang, Guanhua, Witte, Philipp, Wu, Haiping, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Ziyi, Yang, Yifan, Yu, Donghan, Yuan, Lu, Zhang, Chengruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi

Externí odkaz: http://arxiv.org/abs/2404.14219

Zobrazit plný text záznamu

Report

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges

Externí odkaz: http://arxiv.org/abs/2403.08002

Zobrazit plný text záznamu

Report

SciAgent: Tool-augmented Language Models for Scientific Reasoning

Autor: Ma, Yubo, Gou, Zhibin, Hao, Junheng, Xu, Ruochen, Wang, Shuohang, Pan, Liangming, Yang, Yujiu, Cao, Yixin, Sun, Aixin, Awadalla, Hany, Chen, Weizhu

Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting

Externí odkaz: http://arxiv.org/abs/2402.11451

Zobrazit plný text záznamu

Report

Dissecting In-Context Learning of Translations in GPTs

Autor: Raunak, Vikas, Awadalla, Hany Hassan, Menezes, Arul

Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes

Externí odkaz: http://arxiv.org/abs/2310.15987

Zobrazit plný text záznamu

Report

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness

Autor: Kim, Young Jin, Fahim, Raffy, Awadalla, Hany Hassan

Large Mixture of Experts (MoE) models could achieve state-of-the-art quality on various language tasks, including machine translation task, thanks to the efficient model scaling capability with expert parallelism. However, it has brought a fundamenta

Externí odkaz: http://arxiv.org/abs/2310.02410

Zobrazit plný text záznamu

Report

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

Autor: Xu, Haoran, Kim, Young Jin, Sharaf, Amr, Awadalla, Hany Hassan

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), whic

Externí odkaz: http://arxiv.org/abs/2309.11674

Zobrazit plný text záznamu

Report

Task-Based MoE for Multitask Multilingual Machine Translation

Autor: Pham, Hai, Kim, Young Jin, Mukherjee, Subhabrata, Woodruff, David P., Poczos, Barnabas, Awadalla, Hany Hassan

Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manne

Externí odkaz: http://arxiv.org/abs/2308.15772

Zobrazit plný text záznamu

Report

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

Autor: Kim, Young Jin, Henry, Rawn, Fahim, Raffy, Awadalla, Hany Hassan

Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from hig

Externí odkaz: http://arxiv.org/abs/2308.09723

Zobrazit plný text záznamu

Report

Do GPTs Produce Less Literal Translations?

Autor: Raunak, Vikas, Menezes, Arul, Post, Matt, Awadalla, Hany Hassan

Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-sh

Externí odkaz: http://arxiv.org/abs/2305.16806

Zobrazit plný text záznamu

Report

ResiDual: Transformer with Dual Residual Connections

Autor: Xie, Shufang, Zhang, Huishuai, Guo, Junliang, Tan, Xu, Bian, Jiang, Awadalla, Hany Hassan, Menezes, Arul, Qin, Tao, Yan, Rui

Transformer networks have become the preferred architecture for many tasks due to their state-of-the-art performance. However, the optimal way to implement residual connections in Transformer, which are essential for effective training, is still deba

Externí odkaz: http://arxiv.org/abs/2304.14802

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání