Výsledky vyhledávání

Report

Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers

Autor: Wen, Yuxin, Cao, Qingqing, Fu, Qichen, Mehta, Sachin, Najibi, Mahyar

Recent advancements in vision-language models (VLMs) have expanded their potential for real-world applications, enabling these models to perform complex reasoning on images. In the widely used fully autoregressive transformer-based models like LLaVA,

Externí odkaz: http://arxiv.org/abs/2410.14072

Zobrazit plný text záznamu

Report

Apple Intelligence Foundation Language Models

Autor: Gunter, Tom, Wang, Zirui, Wang, Chong, Pang, Ruoming, Narayanan, Andy, Zhang, Aonan, Zhang, Bowen, Chen, Chen, Chiu, Chung-Cheng, Qiu, David, Gopinath, Deepak, Yap, Dian Ang, Yin, Dong, Nan, Feng, Weers, Floris, Yin, Guoli, Huang, Haoshuo, Wang, Jianyu, Lu, Jiarui, Peebles, John, Ye, Ke, Lee, Mark, Du, Nan, Chen, Qibin, Keunebroek, Quentin, Wiseman, Sam, Evans, Syd, Lei, Tao, Rathod, Vivek, Kong, Xiang, Du, Xianzhi, Li, Yanghao, Wang, Yongqiang, Gao, Yuan, Ahmed, Zaid, Xu, Zhaoyang, Lu, Zhiyun, Rashid, Al, Jose, Albin Madappally, Doane, Alec, Bencomo, Alfredo, Vanderby, Allison, Hansen, Andrew, Jain, Ankur, Anupama, Anupama Mann, Kamal, Areeba, Wu, Bugu, Brum, Carolina, Maalouf, Charlie, Erdenebileg, Chinguun, Dulhanty, Chris, Moritz, Dominik, Kang, Doug, Jimenez, Eduardo, Ladd, Evan, Shi, Fangping, Bai, Felix, Chu, Frank, Hohman, Fred, Kotek, Hadas, Coleman, Hannah Gillis, Li, Jane, Bigham, Jeffrey, Cao, Jeffery, Lai, Jeff, Cheung, Jessica, Shan, Jiulong, Zhou, Joe, Li, John, Qin, Jun, Singh, Karanjeet, Vega, Karla, Zou, Kelvin, Heckman, Laura, Gardiner, Lauren, Bowler, Margit, Cordell, Maria, Cao, Meng, Hay, Nicole, Shahdadpuri, Nilesh, Godwin, Otto, Dighe, Pranay, Rachapudi, Pushyami, Tantawi, Ramsey, Frigg, Roman, Davarnia, Sam, Shah, Sanskruti, Guha, Saptarshi, Sirovica, Sasha, Ma, Shen, Ma, Shuang, Wang, Simon, Kim, Sulgi, Jayaram, Suma, Shankar, Vaishaal, Paidi, Varsha, Kumar, Vivek, Wang, Xin, Zheng, Xin, Cheng, Walker, Shrager, Yael, Ye, Yang, Tanaka, Yasu, Guo, Yihao, Meng, Yunsong, Luo, Zhao Tang, Ouyang, Zhi, Aygar, Alp, Wan, Alvin, Walkingshaw, Andrew, Lin, Antonie, Farooq, Arsalan, Ramerth, Brent, Reed, Colorado, Bartels, Chris, Chaney, Chris, Riazati, David, Yang, Eric Liang, Feldman, Erin, Hochstrasser, Gabriel, Seguin, Guillaume, Belousova, Irina, Pelemans, Joris, Yang, Karen, Vahid, Keivan Alizadeh, Cao, Liangliang, Najibi, Mahyar, Zuliani, Marco, Horton, Max, Cho, Minsik, Bhendawade, Nikhil, Dong, Patrick, Maj, Piotr, Agrawal, Pulkit, Shan, Qi, Fu, Qichen, Poston, Regan, Xu, Sam, Liu, Shuangning, Rao, Sushma, Heeramun, Tashweena, Merth, Thomas, Rayala, Uday, Cui, Victor, Sridhar, Vivek Rangarajan, Zhang, Wencong, Zhang, Wenqi, Wu, Wentao, Zhou, Xingyu, Liu, Xinwen, Zhao, Yang, Xia, Yin, Ren, Zhile, Ren, Zhongzheng

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These mode

Externí odkaz: http://arxiv.org/abs/2407.21075

Zobrazit plný text záznamu

Report

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

Autor: Fu, Qichen, Cho, Minsik, Merth, Thomas, Mehta, Sachin, Rastegari, Mohammad, Najibi, Mahyar

The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts

Externí odkaz: http://arxiv.org/abs/2407.14057

Zobrazit plný text záznamu

Report

Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

Autor: Merth, Thomas, Fu, Qichen, Rastegari, Mohammad, Najibi, Mahyar

Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in so

Externí odkaz: http://arxiv.org/abs/2404.06910

Zobrazit plný text záznamu

Report

Speculative Streaming: Fast LLM Inference without Auxiliary Models

Autor: Bhendawade, Nikhil, Belousova, Irina, Fu, Qichen, Mason, Henry, Rastegari, Mohammad, Najibi, Mahyar

Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and

Externí odkaz: http://arxiv.org/abs/2402.11131

Zobrazit plný text záznamu

Report

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline

Autor: Lin, Chien-Yu, Fu, Qichen, Merth, Thomas, Yang, Karren, Ranjan, Anurag

Super-resolution (SR) techniques have recently been proposed to upscale the outputs of neural radiance fields (NeRF) and generate high-quality images with enhanced inference speeds. However, existing NeRF+SR methods increase training overhead by usin

Externí odkaz: http://arxiv.org/abs/2312.11537

Zobrazit plný text záznamu

Report

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

Autor: Cho, Minsik, Vahid, Keivan A., Fu, Qichen, Adya, Saurabh, Del Mundo, Carlo C, Rastegari, Mohammad, Naik, Devang, Zatloukal, Peter

Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of

Externí odkaz: http://arxiv.org/abs/2309.00964

Zobrazit plný text záznamu

Report

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

Autor: Fu, Qichen, Liu, Xingyu, Xu, Ran, Niebles, Juan Carlos, Kitani, Kris M.

Accurately estimating 3D hand pose is crucial for understanding how humans interact with the world. Despite remarkable progress, existing methods often struggle to generate plausible hand poses when the hand is heavily occluded or blurred. In videos,

Externí odkaz: http://arxiv.org/abs/2303.04991

Zobrazit plný text záznamu

Report

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

Autor: Ohkawa, Takehiko, Li, Yu-Jhe, Fu, Qichen, Furuta, Ryosuke, Kitani, Kris M., Sato, Yoichi

We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors). In the real wor

Externí odkaz: http://arxiv.org/abs/2203.08344

Zobrazit plný text záznamu

Report

Sequential Voting with Relational Box Fields for Active Object Detection

Autor: Fu, Qichen, Liu, Xingyu, Kitani, Kris M.

A key component of understanding hand-object interactions is the ability to identify the active object -- the object that is being manipulated by the human hand. In order to accurately localize the active object, any method must reason using informat

Externí odkaz: http://arxiv.org/abs/2110.11524

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání