Showing 1 - 10 of 2,607 for search: '"TANAKA, MASAHIRO"'
Author:
Liu, Liyuan, Kim, Young Jin, Wang, Shuohang, Liang, Chen, Shen, Yelong, Cheng, Hao, Liu, Xiaodong, Tanaka, Masahiro, Wu, Xiaoxia, Hu, Wenxiang, Chaudhary, Vishrav, Lin, Zeqi, Zhang, Chenruidong, Xue, Jilong, Awadalla, Hany, Gao, Jianfeng, Chen, Weizhu
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training practices…
External link:
http://arxiv.org/abs/2409.12136
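The routing mechanism this abstract describes is easy to make concrete. Below is a minimal top-k router in Python; the names, shapes, and gate renormalization are illustrative assumptions, not details from the paper, and the discrete top-k selection at its core is precisely the kind of non-differentiable step that complicates standard gradient-based training.

# Illustrative top-k expert routing (a sketch, not the paper's implementation).
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(tokens, gate_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.
    tokens: (n_tokens, d_model); gate_w: (d_model, n_experts);
    experts: list of callables mapping (d_model,) -> (d_model,)."""
    probs = softmax(tokens @ gate_w)               # router scores per token
    topk = np.argsort(-probs, axis=-1)[:, :k]      # discrete, non-differentiable
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        sel = topk[i]
        w = probs[i, sel] / probs[i, sel].sum()    # renormalized gate weights
        # Only k of len(experts) modules run per token: the sparse computation
        # that lets MoE grow parameters without growing per-token FLOPs.
        out[i] = sum(wj * experts[e](tok) for wj, e in zip(w, sel))
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)) / d**0.5: x @ W for _ in range(n_exp)]
y = moe_forward(rng.normal(size=(5, d)), rng.normal(size=(d, n_exp)), experts)
print(y.shape)  # (5, 8)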
Author:
Yao, Jinghan, Jacobs, Sam Ade, Tanaka, Masahiro, Ruwase, Olatunji, Shafi, Aamir, Subramoni, Hari, Panda, Dhabaleswar K.
Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long…
External link:
http://arxiv.org/abs/2408.16978
Author:
LLM-jp, Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, Yoshino, Koichiro
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants…
External link:
http://arxiv.org/abs/2407.03963
Author:
Lian, Xinyu, Jacobs, Sam Ade, Kurilenko, Lev, Tanaka, Masahiro, Bekman, Stas, Ruwase, Olatunji, Zhang, Minjia
Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state across multiple accelerators, a requirement for model scaling. Consolidating distributed model…
External link:
http://arxiv.org/abs/2406.18820
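A hedged sketch of why consolidation is awkward: each rank persists only its shard of every tensor, so any consolidation step must know and invert the sharding scheme, which differs across parallelism strategies. The file layout and the simple 1-D sharding below are hypothetical, not the checkpoint format of the paper's system.

import numpy as np

def save_sharded(params, world_size, prefix="ckpt"):
    """Split each parameter along axis 0 and save one file per rank."""
    for rank in range(world_size):
        shard = {name: np.array_split(p, world_size, axis=0)[rank]
                 for name, p in params.items()}
        np.savez(f"{prefix}_rank{rank}.npz", **shard)

def consolidate(world_size, prefix="ckpt"):
    """Reload every rank's file and re-concatenate the shards."""
    shards = [np.load(f"{prefix}_rank{r}.npz") for r in range(world_size)]
    return {name: np.concatenate([s[name] for s in shards], axis=0)
            for name in shards[0].files}

params = {"w": np.arange(12.0).reshape(6, 2), "b": np.arange(6.0)}
save_sharded(params, world_size=3)
merged = consolidate(world_size=3)
assert all(np.array_equal(merged[n], params[n]) for n in params)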
Author:
Tanaka, Masahiro
In the realm of statistical learning, the increasing volume of accessible data and growing model complexity necessitate robust methodologies. This paper explores two branches of robust Bayesian methods in response to this trend. The first is generalized…
External link:
http://arxiv.org/abs/2405.04845
Author:
Tanaka, Masahiro
As the amount and complexity of available data increase, the need for robust statistical learning becomes more pressing. To enhance resilience against model misspecification, the generalized posterior inference method adjusts the likelihood term by…
External link:
http://arxiv.org/abs/2404.16528
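For background on the sentence truncated above: in generalized posterior inference the likelihood term is typically adjusted by a learning-rate exponent. A standard formulation (general background, not quoted from this paper) is

\pi_\eta(\theta \mid x) \;\propto\; \pi(\theta)\, L(\theta; x)^{\eta}, \qquad \eta \in (0, 1],

where \eta = 1 recovers the ordinary Bayesian posterior. Equivalently, with a generic loss \ell(\theta; x) in place of the negative log-likelihood, the Gibbs posterior reads \pi_\eta(\theta \mid x) \propto \pi(\theta)\, \exp\{-\eta\, \ell(\theta; x)\}; choosing \eta < 1 tempers the influence of a possibly misspecified likelihood.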
Author:
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini…)
External link:
http://arxiv.org/abs/2404.14219
Author:
Tanaka, Masahiro
Orthogonal matrices play an important role in probability and statistics, particularly in high-dimensional statistical models. Parameterizing these models using orthogonal matrices facilitates dimension reduction and parameter identification. However…
External link:
http://arxiv.org/abs/2402.07434
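One common way to parameterize an orthogonal matrix, for readers who want a concrete picture, is the Cayley transform of a skew-symmetric matrix; the paper may use a different construction, so the Python sketch below is illustrative background only. It also shows the dimension-reduction point: an n x n orthogonal matrix takes only n(n-1)/2 free parameters.

import numpy as np

def cayley(a_free):
    """Map n*(n-1)/2 unconstrained reals to an n x n orthogonal matrix
    via Q = (I - A)(I + A)^{-1} with A skew-symmetric."""
    n = int((1 + (1 + 8 * len(a_free)) ** 0.5) / 2)
    A = np.zeros((n, n))
    A[np.tril_indices(n, k=-1)] = a_free           # fill strict lower triangle
    A -= A.T                                       # enforce A^T = -A
    I = np.eye(n)
    return (I - A) @ np.linalg.inv(I + A)          # I + A is always invertible

rng = np.random.default_rng(1)
Q = cayley(rng.normal(size=3))                     # 3 free params -> 3 x 3 Q
print(np.allclose(Q.T @ Q, np.eye(3)))             # True: Q is orthogonal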
Author:
Song, Shuaiwen Leon, Kruft, Bonnie, Zhang, Minjia, Li, Conglong, Chen, Shiyang, Zhang, Chengming, Tanaka, Masahiro, Wu, Xiaoxia, Rasley, Jeff, Awan, Ammar Ahmad, Holmes, Connor, Cai, Martin, Ghanem, Adam, Zhou, Zhongzhu, He, Yuxiong, Luferenko, Pete, Kumar, Divya, Weyn, Jonathan, Zhang, Ruixiong, Klocek, Sylwester, Vragov, Volodymyr, AlQuraishi, Mohammed, Ahdritz, Gustaf, Floristean, Christina, Negri, Cristina, Kotamarthi, Rao, Vishwanath, Venkatram, Ramanathan, Arvind, Foreman, Sam, Hippe, Kyle, Arcomano, Troy, Maulik, Romit, Zvyagin, Maxim, Brace, Alexander, Zhang, Bin, Bohorquez, Cindy Orozco, Clyde, Austin, Kale, Bharat, Perez-Rivera, Danilo, Ma, Heng, Mann, Carla M., Irvin, Michael, Pauloski, J. Gregory, Ward, Logan, Hayot, Valerie, Emani, Murali, Xie, Zhen, Lin, Diangen, Shukla, Maulik, Foster, Ian, Davis, James J., Papka, Michael E., Brettin, Thomas, Balaprakash, Prasanna, Tourassi, Gina, Gounley, John, Hanson, Heidi, Potok, Thomas E, Pasini, Massimiliano Lupo, Evans, Kate, Lu, Dan, Lunga, Dalton, Yin, Junqi, Dash, Sajal, Wang, Feiyi, Shankar, Mallikarjun, Lyngaas, Isaac, Wang, Xiao, Cong, Guojing, Zhang, Pei, Fan, Ming, Liu, Siyan, Hoisie, Adolfy, Yoo, Shinjae, Ren, Yihui, Tang, William, Felker, Kyle, Svyatkovskiy, Alexey, Liu, Hang, Aji, Ashwin, Dalton, Angela, Schulte, Michael, Schulz, Karl, Deng, Yuntian, Nie, Weili, Romero, Josh, Dallago, Christian, Vahdat, Arash, Xiao, Chaowei, Gibbs, Thomas, Anandkumar, Anima, Stevens, Rick
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from…
External link:
http://arxiv.org/abs/2310.04610
Author:
Jacobs, Sam Ade, Tanaka, Masahiro, Zhang, Chengming, Zhang, Minjia, Song, Shuaiwen Leon, Rajbhandari, Samyam, He, Yuxiong
Computation in a typical Transformer-based large language model (LLM) can be characterized by batch size, hidden dimension, number of layers, and sequence length. Until now, systems work for accelerating LLM training has focused on the first three dimensions…
External link:
http://arxiv.org/abs/2309.14509
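To see why the fourth dimension is the hard one, a back-of-the-envelope sketch helps: self-attention cost grows quadratically in sequence length s but only linearly in batch size b. The cost model below is simplified and illustrative, not a measurement from the paper.

def attention_flops(b, s, h, layers):
    """Rough per-step FLOPs: projections O(b*s*h^2) plus attention O(b*s^2*h)."""
    proj = 4 * b * s * h * h        # Q, K, V, and output projections
    attn = 2 * b * s * s * h        # QK^T scores and attention-weighted values
    return layers * (proj + attn)

base = attention_flops(b=1, s=4096, h=4096, layers=32)
for s in (4096, 16384, 65536):
    print(s, round(attention_flops(1, s, 4096, 32) / base, 1))
# Prints ratios 1.0, 8.0, 96.0: growing s by 4x raises cost by far more
# than 4x once the s^2 term dominates, which is what motivates
# parallelizing along the sequence dimension itself.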