Showing 1 - 10 of 121
for search: '"Krishna, Ranjay"'
Several benchmarks have concluded that our best vision-language models (e.g., CLIP) are lacking in compositionality. Given an image, these benchmarks probe a model's ability to identify its associated caption amongst a set of compositional distractor…
External link:
http://arxiv.org/abs/2409.17958
Author:
Deitke, Matt, Clark, Christopher, Lee, Sangho, Tripathi, Rohun, Yang, Yue, Park, Jae Sung, Salehi, Mohammadreza, Muennighoff, Niklas, Lo, Kyle, Soldaini, Luca, Lu, Jiasen, Anderson, Taira, Bransom, Erin, Ehsani, Kiana, Ngo, Huong, Chen, YenSung, Patel, Ajay, Yatskar, Mark, Callison-Burch, Chris, Head, Andrew, Hendrix, Rose, Bastani, Favyen, VanderBilt, Eli, Lambert, Nathan, Chou, Yvonne, Chheda, Arnavi, Sparks, Jenna, Skjonsberg, Sam, Schmitz, Michael, Sarnat, Aaron, Bischoff, Byron, Walsh, Pete, Newell, Chris, Wolters, Piper, Gupta, Tanmay, Zeng, Kuo-Hao, Borchardt, Jon, Groeneveld, Dirk, Dumas, Jen, Nam, Crystal, Lebrecht, Sophie, Wittlif, Caitlin, Schoenick, Carissa, Michel, Oscar, Krishna, Ranjay, Weihs, Luca, Smith, Noah A., Hajishirzi, Hannaneh, Girshick, Ross, Farhadi, Ali, Kembhavi, Aniruddha
Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the…
External link:
http://arxiv.org/abs/2409.17146
Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that…
External link:
http://arxiv.org/abs/2408.02243
Author:
Liu, Benlin, Dong, Yuhao, Wang, Yiqin, Rao, Yongming, Tang, Yansong, Ma, Wei-Chiu, Krishna, Ranjay
Multimodal language models (MLLMs) are increasingly being implemented in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Despite their potential, current top models within our community…
External link:
http://arxiv.org/abs/2408.00754
Author:
Liu, Zuyan, Liu, Benlin, Wang, Jiahui, Dong, Yuhao, Chen, Guangyi, Rao, Yongming, Krishna, Ranjay, Lu, Jiwen
In the field of instruction-following large vision-language models (LVLMs), the efficient deployment of these models faces challenges, notably due to the high memory demands of their key-value (KV) caches. Conventional cache management strategies for…
External link:
http://arxiv.org/abs/2407.18121
Author:
Hsieh, Yu-Guan, Hsieh, Cheng-Yu, Yeh, Shih-Ying, Béthune, Louis, Pouransari, Hadi, Vasu, Pavan Kumar Anasosalu, Li, Chun-Liang, Krishna, Ranjay, Tuzel, Oncel, Cuturi, Marco
Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflected…
External link:
http://arxiv.org/abs/2407.06723
When asked to summarize articles or answer questions given a passage, large language models (LLMs) can hallucinate details and respond with unsubstantiated answers that are inaccurate with respect to the input context. This paper describes a simple…
External link:
http://arxiv.org/abs/2407.07071
Author:
Duan, Jiafei, Yuan, Wentao, Pumacay, Wilbert, Wang, Yi Ru, Ehsani, Kiana, Fox, Dieter, Krishna, Ranjay
Large-scale endeavors and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot…
External link:
http://arxiv.org/abs/2406.18915
Author:
Hsieh, Cheng-Yu, Chuang, Yung-Sung, Li, Chun-Liang, Wang, Zifeng, Le, Long T., Kumar, Abhishek, Glass, James, Ratner, Alexander, Lee, Chen-Yu, Krishna, Ranjay, Pfister, Tomas
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work…
External link:
http://arxiv.org/abs/2406.16008
Author:
Zhang, Jieyu, Huang, Weikai, Ma, Zixian, Michel, Oscar, He, Dong, Gupta, Tanmay, Ma, Wei-Chiu, Farhadi, Ali, Kembhavi, Aniruddha, Krishna, Ranjay
Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their…
External link:
http://arxiv.org/abs/2406.11775