Výsledky vyhledávání - "Pang, Ruoming"

Report

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

Autor: Sun, Haotian, Lei, Tao, Zhang, Bowen, Li, Yanghao, Huang, Haoshuo, Pang, Ruoming, Dai, Bo, Du, Nan

Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly

Externí odkaz: http://arxiv.org/abs/2410.02098

Zobrazit plný text záznamu

Report

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Autor: Feng, Shengyu, Kong, Xiang, Ma, Shuang, Zhang, Aonan, Yin, Dong, Wang, Chong, Pang, Ruoming, Yang, Yiming

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification a

Externí odkaz: http://arxiv.org/abs/2410.01920

Zobrazit plný text záznamu

Report

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Autor: Lu, Jiarui, Holleis, Thomas, Zhang, Yizhe, Aumayer, Bernhard, Nan, Feng, Bai, Felix, Ma, Shuang, Ma, Shen, Li, Mengyu, Yin, Guoli, Wang, Zirui, Pang, Ruoming

Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evalua

Externí odkaz: http://arxiv.org/abs/2408.04682

Zobrazit plný text záznamu

Report

Apple Intelligence Foundation Language Models

Autor: Gunter, Tom, Wang, Zirui, Wang, Chong, Pang, Ruoming, Narayanan, Andy, Zhang, Aonan, Zhang, Bowen, Chen, Chen, Chiu, Chung-Cheng, Qiu, David, Gopinath, Deepak, Yap, Dian Ang, Yin, Dong, Nan, Feng, Weers, Floris, Yin, Guoli, Huang, Haoshuo, Wang, Jianyu, Lu, Jiarui, Peebles, John, Ye, Ke, Lee, Mark, Du, Nan, Chen, Qibin, Keunebroek, Quentin, Wiseman, Sam, Evans, Syd, Lei, Tao, Rathod, Vivek, Kong, Xiang, Du, Xianzhi, Li, Yanghao, Wang, Yongqiang, Gao, Yuan, Ahmed, Zaid, Xu, Zhaoyang, Lu, Zhiyun, Rashid, Al, Jose, Albin Madappally, Doane, Alec, Bencomo, Alfredo, Vanderby, Allison, Hansen, Andrew, Jain, Ankur, Anupama, Anupama Mann, Kamal, Areeba, Wu, Bugu, Brum, Carolina, Maalouf, Charlie, Erdenebileg, Chinguun, Dulhanty, Chris, Moritz, Dominik, Kang, Doug, Jimenez, Eduardo, Ladd, Evan, Shi, Fangping, Bai, Felix, Chu, Frank, Hohman, Fred, Kotek, Hadas, Coleman, Hannah Gillis, Li, Jane, Bigham, Jeffrey, Cao, Jeffery, Lai, Jeff, Cheung, Jessica, Shan, Jiulong, Zhou, Joe, Li, John, Qin, Jun, Singh, Karanjeet, Vega, Karla, Zou, Kelvin, Heckman, Laura, Gardiner, Lauren, Bowler, Margit, Cordell, Maria, Cao, Meng, Hay, Nicole, Shahdadpuri, Nilesh, Godwin, Otto, Dighe, Pranay, Rachapudi, Pushyami, Tantawi, Ramsey, Frigg, Roman, Davarnia, Sam, Shah, Sanskruti, Guha, Saptarshi, Sirovica, Sasha, Ma, Shen, Ma, Shuang, Wang, Simon, Kim, Sulgi, Jayaram, Suma, Shankar, Vaishaal, Paidi, Varsha, Kumar, Vivek, Wang, Xin, Zheng, Xin, Cheng, Walker, Shrager, Yael, Ye, Yang, Tanaka, Yasu, Guo, Yihao, Meng, Yunsong, Luo, Zhao Tang, Ouyang, Zhi, Aygar, Alp, Wan, Alvin, Walkingshaw, Andrew, Lin, Antonie, Farooq, Arsalan, Ramerth, Brent, Reed, Colorado, Bartels, Chris, Chaney, Chris, Riazati, David, Yang, Eric Liang, Feldman, Erin, Hochstrasser, Gabriel, Seguin, Guillaume, Belousova, Irina, Pelemans, Joris, Yang, Karen, Vahid, Keivan Alizadeh, Cao, Liangliang, Najibi, Mahyar, Zuliani, Marco, Horton, Max, Cho, Minsik, Bhendawade, Nikhil, Dong, Patrick, Maj, Piotr, Agrawal, Pulkit, Shan, Qi, Fu, Qichen, Poston, Regan, Xu, Sam, Liu, Shuangning, Rao, Sushma, Heeramun, Tashweena, Merth, Thomas, Rayala, Uday, Cui, Victor, Sridhar, Vivek Rangarajan, Zhang, Wencong, Zhang, Wenqi, Wu, Wentao, Zhou, Xingyu, Liu, Xinwen, Zhao, Yang, Xia, Yin, Ren, Zhile, Ren, Zhongzheng

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These mode

Externí odkaz: http://arxiv.org/abs/2407.21075

Zobrazit plný text záznamu

Report

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing

Externí odkaz: http://arxiv.org/abs/2407.18961

Zobrazit plný text záznamu

Report

Large Language Model-guided Document Selection

Autor: Kong, Xiang, Gunter, Tom, Pang, Ruoming

Large Language Model (LLM) pre-training exhausts an ever growing compute budget, yet recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs. Inspired by efforts suggesting t

Externí odkaz: http://arxiv.org/abs/2406.04638

Zobrazit plný text záznamu

Report

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training

Autor: Du, Xianzhi, Gunter, Tom, Kong, Xiang, Lee, Mark, Wang, Zirui, Zhang, Aonan, Du, Nan, Pang, Ruoming

Mixture-of-Experts (MoE) enjoys performance gain by increasing model capacity while keeping computation cost constant. When comparing MoE to dense models, prior work typically adopt the following setting: 1) use FLOPs or activated parameters as a mea

Externí odkaz: http://arxiv.org/abs/2405.15052

Zobrazit plný text záznamu

Report

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the v

Externí odkaz: http://arxiv.org/abs/2403.09611

Zobrazit plný text záznamu

Report

Instruction-Following Speech Recognition

Autor: Lai, Cheng-I Jeff, Lu, Zhiyun, Cao, Liangliang, Pang, Ruoming

Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions. With the advent of Large Language Models (LLMs) in speech processing, more organic, tex

Externí odkaz: http://arxiv.org/abs/2309.09843

Zobrazit plný text záznamu

Report

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

Autor: Daxberger, Erik, Weers, Floris, Zhang, Bowen, Gunter, Tom, Pang, Ruoming, Eichner, Marcin, Emmersberger, Michael, Yang, Yinfei, Toshev, Alexander, Du, Xianzhi

Sparse Mixture-of-Experts models (MoEs) have recently gained popularity due to their ability to decouple model size from inference efficiency by only activating a small subset of the model parameters for any given input token. As such, sparse MoEs ha

Externí odkaz: http://arxiv.org/abs/2309.04354

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání