Search results

Report

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Authors: Awadalla, Anas; Xue, Le; Shu, Manli; Yan, An; Wang, Jun; Purushwalkam, Senthil; Shen, Sheng; Lee, Hannah; Lo, Oscar; Park, Jae Sung; Guha, Etash; Savarese, Silvio; Schmidt, Ludwig; Choi, Yejin; Xiong, Caiming; Xu, Ran

We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text. KALE augments synthetic dense image captions with web-scale alt-text to generate factually

External link: http://arxiv.org/abs/2411.07461

Report

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

Authors: Li, Jierui; Le, Hung; Zhou, Yingbo; Xiong, Caiming; Savarese, Silvio; Sahoo, Doyen

Pre-trained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with capabiliti

External link: http://arxiv.org/abs/2411.04329

Report

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Authors: Chen, Haolin; Feng, Yihao; Liu, Zuxin; Yao, Weiran; Prabhakar, Akshara; Heinecke, Shelby; Ho, Ricky; Mui, Phil; Savarese, Silvio; Xiong, Caiming; Wang, Huan

Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps. While prompt-based methods like Chain-of-Thought (CoT) can improve LLM reasoning at inference time, optimizing

External link: http://arxiv.org/abs/2411.04282

Report

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

Authors: Huang, Kung-Hsiang; Prabhakar, Akshara; Dhawan, Sidharth; Mao, Yixin; Wang, Huan; Savarese, Silvio; Xiong, Caiming; Laban, Philippe; Wu, Chien-Sheng

Customer Relationship Management (CRM) systems are vital for modern enterprises, providing a foundation for managing customer interactions and data. Integrating AI agents into CRM systems can automate routine processes and enhance personalized servic

External link: http://arxiv.org/abs/2411.02305

Report

Impact of extreme ultraviolet radiation on the scintillation of pure and xenon-doped liquid argon

The Xenon-Argon Technology (X-ArT) collaboration presents a study on the dynamics of pure and xenon-doped liquid argon (LAr) scintillation. Using two types of silicon photomultipliers sensitive to different wavelength ranges, we identify a long-lived

External link: http://arxiv.org/abs/2410.22863

Report

Asynchronous Tool Usage for Real-Time Agents

Authors: Ginart, Antonio A.; Kodali, Naveen; Lee, Jason; Xiong, Caiming; Savarese, Silvio; Emmons, John

While frontier large language models (LLMs) are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to passage of time. This synchronous design forces user queries and tool-use to occur sequentially,

External link: http://arxiv.org/abs/2410.21620

Report

PRAct: Optimizing Principled Reasoning and Acting of LLM Agent

Authors: Liu, Zhiwei; Yao, Weiran; Zhang, Jianguo; Murthy, Rithesh; Yang, Liangwei; Liu, Zuxin; Lan, Tian; Zhu, Ming; Tan, Juntao; Kokane, Shirley; Hoang, Thai; Niebles, Juan Carlos; Heinecke, Shelby; Wang, Huan; Savarese, Silvio; Xiong, Caiming

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to de

External link: http://arxiv.org/abs/2410.18528

Report

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Authors: Ryoo, Michael S.; Zhou, Honglu; Kendre, Shrikant; Qin, Can; Xue, Le; Shu, Manli; Savarese, Silvio; Xu, Ran; Xiong, Caiming; Niebles, Juan Carlos

We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames. BLIP-3-Video takes advantage of the 'temporal encoder' in addition to the conventio

External link: http://arxiv.org/abs/2410.16267

Report

Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts

Authors: Liu, Xu; Liu, Juncheng; Woo, Gerald; Aksu, Taha; Liang, Yuxuan; Zimmermann, Roger; Liu, Chenghao; Savarese, Silvio; Xiong, Caiming; Sahoo, Doyen

Time series foundation models have demonstrated impressive performance as zero-shot forecasters. However, achieving effectively unified training on time series remains an open challenge. Existing approaches introduce some level of model specializatio

External link: http://arxiv.org/abs/2410.10469

Report

GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation

Authors: Aksu, Taha; Woo, Gerald; Liu, Juncheng; Liu, Xu; Liu, Chenghao; Savarese, Silvio; Xiong, Caiming; Sahoo, Doyen

Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. However, the advancement of these models has been hindered by the lack of comprehensive benchmarks. To address this gap, we introduce the

External link: http://arxiv.org/abs/2410.10393
