Showing 1 - 10 of 316 for search: '"Xue Le"'
Author:
Qin, Can, Xia, Congying, Ramakrishnan, Krithika, Ryoo, Michael, Tu, Lifu, Feng, Yihao, Shu, Manli, Zhou, Honglu, Awadalla, Anas, Wang, Jun, Purushwalkam, Senthil, Xue, Le, Zhou, Yingbo, Wang, Huan, Savarese, Silvio, Niebles, Juan Carlos, Chen, Zeyuan, Xu, Ran, Xiong, Caiming
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and i…
External link:
http://arxiv.org/abs/2408.12590
Author:
Xue, Le, Shu, Manli, Awadalla, Anas, Wang, Jun, Yan, An, Purushwalkam, Senthil, Zhou, Honglu, Prabhu, Viraj, Dai, Yutong, Ryoo, Michael S, Kendre, Shrikant, Zhang, Jieyu, Qin, Can, Zhang, Shu, Chen, Chia-Chih, Yu, Ning, Tan, Juntao, Awalgaonkar, Tulika Manoj, Heinecke, Shelby, Wang, Huan, Choi, Yejin, Schmidt, Ludwig, Chen, Zeyuan, Savarese, Silvio, Niebles, Juan Carlos, Xiong, Caiming, Xu, Ran
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, s…
External link:
http://arxiv.org/abs/2408.08872
Author:
Awadalla, Anas, Xue, Le, Lo, Oscar, Shu, Manli, Lee, Hannah, Guha, Etash Kumar, Jordan, Matt, Shen, Sheng, Awadalla, Mohamed, Savarese, Silvio, Xiong, Caiming, Xu, Ran, Choi, Yejin, Schmidt, Ludwig
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of l…
External link:
http://arxiv.org/abs/2406.11271
Author:
Panagopoulou, Artemis, Xue, Le, Yu, Ning, Li, Junnan, Li, Dongxu, Joty, Shafiq, Xu, Ran, Savarese, Silvio, Xiong, Caiming, Niebles, Juan Carlos
Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and e…
External link:
http://arxiv.org/abs/2311.18799
Author:
Liu, Zhiwei, Yao, Weiran, Zhang, Jianguo, Xue, Le, Heinecke, Shelby, Murthy, Rithesh, Feng, Yihao, Chen, Zeyuan, Niebles, Juan Carlos, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to…
External link:
http://arxiv.org/abs/2308.05960
Author:
Yao, Weiran, Heinecke, Shelby, Niebles, Juan Carlos, Liu, Zhiwei, Feng, Yihao, Xue, Le, Murthy, Rithesh, Chen, Zeyuan, Zhang, Jianguo, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely respondi…
External link:
http://arxiv.org/abs/2308.02151
Author:
Murthy, Rithesh, Heinecke, Shelby, Niebles, Juan Carlos, Liu, Zhiwei, Xue, Le, Yao, Weiran, Feng, Yihao, Chen, Zeyuan, Gokul, Akash, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the…
External link:
http://arxiv.org/abs/2307.08962
Author:
Xue, Le, Yu, Ning, Zhang, Shu, Panagopoulou, Artemis, Li, Junnan, Martín-Martín, Roberto, Wu, Jiajun, Xiong, Caiming, Xu, Ran, Niebles, Juan Carlos, Savarese, Silvio
Published in:
CVPR2024
Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frame…
External link:
http://arxiv.org/abs/2305.08275
Author:
Shu, Manli, Xue, Le, Yu, Ning, Martín-Martín, Roberto, Xiong, Caiming, Goldstein, Tom, Niebles, Juan Carlos, Xu, Ran
3D object detection is an essential vision technique for various robotic systems, such as augmented reality and domestic robots. Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. Howe…
External link:
http://arxiv.org/abs/2301.02650
Author:
Xue, Le, Gao, Mingfei, Xing, Chen, Martín-Martín, Roberto, Wu, Jiajun, Xiong, Caiming, Xu, Ran, Niebles, Juan Carlos, Savarese, Silvio
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be signi…
External link:
http://arxiv.org/abs/2212.05171