Showing 1 - 10 of 2,474 results for search: '"Bubeck, A."'
We study the efficacy of Small Language Models (SLMs) in facilitating application usage through natural language interactions. Our focus here is on a particular internal application used at Microsoft for cloud supply chain fulfilment. Our experiments …
External link:
http://arxiv.org/abs/2405.20347
Author:
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 …
External link:
http://arxiv.org/abs/2404.14219
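For scale, a back-of-envelope sketch of what the figures in this abstract imply; the bytes-per-parameter storage costs and the ~20 tokens/parameter compute-optimal ("Chinchilla") reference point are general rules of thumb, not numbers from the paper:

# Rough sizing for a 3.8B-parameter model such as phi-3-mini.
PARAMS = 3.8e9   # parameters, from the abstract
TOKENS = 3.3e12  # training tokens, from the abstract

# Standard per-parameter storage costs (rules of thumb, not from the paper).
for fmt, bytes_per_param in {"fp16": 2.0, "int8": 1.0, "int4": 0.5}.items():
    print(f"{fmt}: ~{PARAMS * bytes_per_param / 2**30:.1f} GiB of weights")

# Tokens seen per parameter: ~868, far above the ~20x compute-optimal ratio,
# consistent with the paper's emphasis on heavily curated training data.
print(f"tokens/parameter: ~{TOKENS / PARAMS:.0f}")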
Author:
Liu, Bingbin, Bubeck, Sebastien, Eldan, Ronen, Kulkarni, Janardhan, Li, Yuanzhi, Nguyen, Anh, Ward, Rachel, Zhang, Yi
Small-scale models offer various computational advantages, yet to what extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break …
External link:
http://arxiv.org/abs/2312.09241
Author:
Kelly L. Tomaszewski, Meagan Blanchard, Reuben Olaniyi, Hannah R. Brenton, Samantha Hayes, Farheen Fatma, Gaya K. Amarasinghe, Byoung-Kyu Cho, Young Ah Goo, Andrea C. DeDent, Stephanie A. Fritz, Juliane Bubeck Wardenburg
Published in:
Nature Communications, Vol 15, Iss 1, Pp 1-14 (2024)
Staphylococcus aureus remains a leading global cause of bacterial infection-associated mortality and has eluded prior vaccine development efforts. S. aureus α-toxin (Hla) is an essential virulence factor in disease, impairing the T cell response …
External link:
https://doaj.org/article/707c2b554c8140f281f1b5f75bcc7e3e
Author:
Li, Yuanzhi, Bubeck, Sébastien, Eldan, Ronen, Del Giorno, Allie, Gunasekar, Suriya, Lee, Yin Tat
We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion …
External link:
http://arxiv.org/abs/2309.05463
Author:
Gunasekar, Suriya, Zhang, Yi, Aneja, Jyoti, Mendes, Caio César Teodoro, Del Giorno, Allie, Gopi, Sivakanth, Javaheripi, Mojan, Kauffmann, Piero, de Rosa, Gustavo, Saarikivi, Olli, Salim, Adil, Shah, Shital, Behl, Harkirat Singh, Wang, Xin, Bubeck, Sébastien, Eldan, Ronen, Kalai, Adam Tauman, Lee, Yin Tat, Li, Yuanzhi
We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from …
External link:
http://arxiv.org/abs/2306.11644
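The "4 days on 8 A100s" figure invites a quick compute estimate; the peak-throughput and utilization values below are assumptions (an A100 spec-sheet number plus a guessed efficiency), not figures from the paper:

# Hypothetical training-compute estimate for a 1.3B-parameter run
# lasting 4 days on 8 A100s, as stated in the abstract above.
params = 1.3e9
gpu_seconds = 8 * 4 * 24 * 3600   # 8 GPUs for 4 days
peak_flops = 312e12               # A100 BF16 peak (spec sheet; assumption)
utilization = 0.4                 # assumed fraction of peak actually achieved

total_flops = gpu_seconds * peak_flops * utilization
# Invert the standard C ~ 6*N*D approximation to estimate tokens processed.
tokens = total_flops / (6 * params)
print(f"~{total_flops:.1e} FLOPs, ~{tokens / 1e9:.0f}B tokens")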
Published in:
Natural Hazards and Earth System Sciences, Vol 24, Pp 2837-2856 (2024)
The devastating floods that swept through the Ahr valley in July 2021 left indelible marks on the region's landscape and communities. Beyond the visible damage, experience from other events suggests an increase in mental health issues among those affected …
External link:
https://doaj.org/article/911729863d6847c5b259f007fc724826
Author:
Bubeck, Sébastien, Chandrasekaran, Varun, Eldan, Ronen, Gehrke, Johannes, Horvitz, Eric, Kamar, Ece, Lee, Peter, Lee, Yin Tat, Li, Yuanzhi, Lundberg, Scott, Nori, Harsha, Palangi, Hamid, Ribeiro, Marco Tulio, Zhang, Yi
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model …
External link:
http://arxiv.org/abs/2303.12712
Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This stands in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which …
External link:
http://arxiv.org/abs/2212.07469
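The large-step regime this abstract contrasts with small-learning-rate theory is easiest to see on a one-dimensional quadratic, where gradient descent is stable exactly when the step size stays below 2/curvature; a minimal illustrative sketch (values chosen for the demo, not taken from the paper):

# On f(w) = (lam / 2) * w**2, each gradient step multiplies w by (1 - lr*lam),
# so iterates converge iff lr < 2/lam; the "edge of stability" observation of
# Cohen et al. is that practical training hovers near this threshold.
lam = 10.0               # curvature (sharpness); stability bound is 2/lam = 0.2
for lr in (0.19, 0.21):  # just below and just above the threshold
    w = 1.0
    for _ in range(100):
        w -= lr * lam * w   # gradient step, since f'(w) = lam * w
    print(f"lr={lr}: |w| after 100 steps = {abs(w):.3g}")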
SGD and AdamW are the two most used optimizers for fine-tuning large neural networks in computer vision. When the two methods perform the same, SGD is preferable because it uses less memory (12 bytes/parameter with momentum and 8 bytes/parameter without) …
External link:
http://arxiv.org/abs/2211.09359
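The bytes-per-parameter figures decompose as one fp32 buffer (4 bytes) per tracked quantity; a minimal sketch of that accounting, assuming plain fp32 optimizer state throughout (mixed-precision setups differ):

# Bytes of state per parameter, assuming fp32 (4 bytes) for every buffer.
FP32 = 4
sgd_no_momentum = FP32 * 2   # weight + gradient                    =  8 bytes
sgd_momentum    = FP32 * 3   # weight + gradient + momentum buffer  = 12 bytes
adamw           = FP32 * 4   # weight + gradient + Adam's first and
                             # second moment buffers                = 16 bytes
print(sgd_no_momentum, sgd_momentum, adamw)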