Zobrazeno 1 - 10
of 30 181
pro vyhledávání: '"Young Jin, So"'
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemant
Externí odkaz:
http://arxiv.org/abs/2412.04139
DEtection TRansformer (DETR) has emerged as a promising architecture for object detection, offering an end-to-end prediction pipeline. In practice, however, DETR generates hundreds of predictions that far outnumber the actual number of objects presen
Externí odkaz:
http://arxiv.org/abs/2412.01782
The integration of Distributed Energy Resources (DERs) into power distribution systems has made microgrids foundational to grid modernization. These DERs, connected through power electronic inverters, create power electronics dominated grid architect
Externí odkaz:
http://arxiv.org/abs/2411.08761
Full Waveform Inversion (FWI) is a vital technique for reconstructing high-resolution subsurface velocity maps from seismic waveform data, governed by partial differential equations (PDEs) that model wave propagation. Traditional machine learning app
Externí odkaz:
http://arxiv.org/abs/2410.09002
Digital pathology has made significant advances in tumor diagnosis and segmentation, but image variability due to differences in organs, tissue preparation, and acquisition - known as domain shift - limits the effectiveness of current algorithms. The
Externí odkaz:
http://arxiv.org/abs/2409.13246
Autor:
Liu, Liyuan, Kim, Young Jin, Wang, Shuohang, Liang, Chen, Shen, Yelong, Cheng, Hao, Liu, Xiaodong, Tanaka, Masahiro, Wu, Xiaoxia, Hu, Wenxiang, Chaudhary, Vishrav, Lin, Zeqi, Zhang, Chenruidong, Xue, Jilong, Awadalla, Hany, Gao, Jianfeng, Chen, Weizhu
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules. However, sparse computation challenges traditional training pract
Externí odkaz:
http://arxiv.org/abs/2409.12136
Publikováno v:
International Journal of Geometric Methods in Modern Physics, 2024
This article delivers the characterization of a pseudo $B$ symmetric spacetimes and we illustrate that a pseudo $B$ symmetric spacetime admitting Codazzi type of $B$-tensor represents a perfect fluid spacetime and if this spacetime admits the time-li
Externí odkaz:
http://arxiv.org/abs/2408.16976
Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes-visually similar lip gestures that r
Externí odkaz:
http://arxiv.org/abs/2406.12233
On existence of Sadovskii vortex patch: A touching pair of symmetric counter-rotating uniform vortex
The Sadovskii vortex patch is a traveling wave for the two-dimensional incompressible Euler equations consisting of an odd symmetric pair of vortex patches touching the symmetry axis. Its existence was first suggested by numerical computations of Sad
Externí odkaz:
http://arxiv.org/abs/2406.11379
Autor:
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi
Externí odkaz:
http://arxiv.org/abs/2404.14219