Showing 1 - 10 of 316 for search: '"Xue Le"'
Author:
Qin, Can, Xia, Congying, Ramakrishnan, Krithika, Ryoo, Michael, Tu, Lifu, Feng, Yihao, Shu, Manli, Zhou, Honglu, Awadalla, Anas, Wang, Jun, Purushwalkam, Senthil, Xue, Le, Zhou, Yingbo, Wang, Huan, Savarese, Silvio, Niebles, Juan Carlos, Chen, Zeyuan, Xu, Ran, Xiong, Caiming
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and i…
External link:
http://arxiv.org/abs/2408.12590
Author:
Xue, Le, Shu, Manli, Awadalla, Anas, Wang, Jun, Yan, An, Purushwalkam, Senthil, Zhou, Honglu, Prabhu, Viraj, Dai, Yutong, Ryoo, Michael S, Kendre, Shrikant, Zhang, Jieyu, Qin, Can, Zhang, Shu, Chen, Chia-Chih, Yu, Ning, Tan, Juntao, Awalgaonkar, Tulika Manoj, Heinecke, Shelby, Wang, Huan, Choi, Yejin, Schmidt, Ludwig, Chen, Zeyuan, Savarese, Silvio, Niebles, Juan Carlos, Xiong, Caiming, Xu, Ran
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM, s…
External link:
http://arxiv.org/abs/2408.08872
Author:
Awadalla, Anas, Xue, Le, Lo, Oscar, Shu, Manli, Lee, Hannah, Guha, Etash Kumar, Jordan, Matt, Shen, Sheng, Awadalla, Mohamed, Savarese, Silvio, Xiong, Caiming, Xu, Ran, Choi, Yejin, Schmidt, Ludwig
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs). Despite the rapid progression of open-source LMMs, there remains a pronounced scarcity of l…
External link:
http://arxiv.org/abs/2406.11271
Author:
Panagopoulou, Artemis, Xue, Le, Yu, Ning, Li, Junnan, Li, Dongxu, Joty, Shafiq, Xu, Ran, Savarese, Silvio, Xiong, Caiming, Niebles, Juan Carlos
Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and e…
External link:
http://arxiv.org/abs/2311.18799
Author:
Liu, Zhiwei, Yao, Weiran, Zhang, Jianguo, Xue, Le, Heinecke, Shelby, Murthy, Rithesh, Feng, Yihao, Chen, Zeyuan, Niebles, Juan Carlos, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to…
External link:
http://arxiv.org/abs/2308.05960
Author:
Yao, Weiran, Heinecke, Shelby, Niebles, Juan Carlos, Liu, Zhiwei, Feng, Yihao, Xue, Le, Murthy, Rithesh, Chen, Zeyuan, Zhang, Jianguo, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
Recent months have seen the emergence of a powerful new trend in which large language models (LLMs) are augmented to become autonomous language agents capable of performing objective-oriented multi-step tasks on their own, rather than merely respondi…
External link:
http://arxiv.org/abs/2308.02151
Author:
Murthy, Rithesh, Heinecke, Shelby, Niebles, Juan Carlos, Liu, Zhiwei, Xue, Le, Yao, Weiran, Feng, Yihao, Chen, Zeyuan, Gokul, Akash, Arpit, Devansh, Xu, Ran, Mui, Phil, Wang, Huan, Xiong, Caiming, Savarese, Silvio
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the…
External link:
http://arxiv.org/abs/2307.08962
Author:
Xue, Le, Yu, Ning, Zhang, Shu, Panagopoulou, Artemis, Li, Junnan, Martín-Martín, Roberto, Wu, Jiajun, Xiong, Caiming, Xu, Ran, Niebles, Juan Carlos, Savarese, Silvio
Published in:
CVPR2024
Recent advancements in multimodal pre-training have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing frame…
External link:
http://arxiv.org/abs/2305.08275
Author:
Shu, Manli, Xue, Le, Yu, Ning, Martín-Martín, Roberto, Xiong, Caiming, Goldstein, Tom, Niebles, Juan Carlos, Xu, Ran
3D object detection is an essential vision technique for various robotic systems, such as augmented reality and domestic robots. Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. Howe…
External link:
http://arxiv.org/abs/2301.02650
Author:
Xue, Le, Gao, Mingfei, Xing, Chen, Martín-Martín, Roberto, Wu, Jiajun, Xiong, Caiming, Xu, Ran, Niebles, Juan Carlos, Savarese, Silvio
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories. In its 2D counterpart, recent advances have shown that similar problems can be signi…
External link:
http://arxiv.org/abs/2212.05171