Showing 1 - 10 of 34 results for search: '"Yang, Jihan"'
Author:
Tong, Shengbang, Brown, Ellis, Wu, Penghao, Woo, Sanghyun, Middepogu, Manoj, Akula, Sai Charitha, Yang, Jihan, Yang, Shusheng, Iyer, Adithya, Pan, Xichen, Wang, Austin, Fergus, Rob, LeCun, Yann, Xie, Saining
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and …
External link:
http://arxiv.org/abs/2406.16860
Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues for human interaction with embodied agents or robots using natural language. Despite this progress, we find a notable limitation: existing 3D-VL models exhibit sensitivity …
External link:
http://arxiv.org/abs/2403.14760
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created. To develop AI agents that can sense, think, and act as flexibly as humans in real-world settings, it is imperative to bridge …
External link:
http://arxiv.org/abs/2402.03310
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic …
External link:
http://arxiv.org/abs/2308.00353
We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories. Specifically, based on …
External link:
http://arxiv.org/abs/2304.00962
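The RegionPLC entry above hinges on contrastive alignment between 3D region (point) features and language embeddings. The sketch below is only a generic illustration of that idea, not the paper's implementation: a symmetric InfoNCE-style loss between pooled region features and caption embeddings, with the tensor shapes, pooling, and temperature all assumed.

```python
import torch
import torch.nn.functional as F

def region_language_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """InfoNCE-style loss aligning pooled 3D region features with caption embeddings.

    region_feats: (N, D) pooled point features, one row per region
    text_feats:   (N, D) language embeddings for the matching captions
    Shapes, pooling, and the temperature are illustrative assumptions,
    not values taken from the RegionPLC paper.
    """
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # Cosine-similarity logits between every region and every caption.
    logits = region_feats @ text_feats.t() / temperature
    targets = torch.arange(region_feats.size(0), device=region_feats.device)

    # Symmetric cross-entropy: region-to-text and text-to-region.
    loss_r2t = F.cross_entropy(logits, targets)
    loss_t2r = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_r2t + loss_t2r)
```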
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough of 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary …
External link:
http://arxiv.org/abs/2211.16312
Despite substantial progress in 3D object detection, advanced 3D detectors often suffer from heavy computation overheads. To this end, we explore the potential of knowledge distillation (KD) for developing efficient 3D object detectors, focusing on …
External link:
http://arxiv.org/abs/2205.15156
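The knowledge-distillation entry above is about compressing a heavy 3D detector into an efficient student. As a rough sketch of the general technique (not this paper's specific scheme), the loss below combines Hinton-style soft-label KL on classification logits with L2 imitation of matched intermediate features; the temperature and weights are assumptions.

```python
import torch
import torch.nn.functional as F

def detector_kd_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                     temperature=4.0, logit_weight=1.0, feat_weight=1.0):
    """Generic detector distillation loss: soft-label KL on class logits plus
    L2 imitation of teacher features. All hyperparameters here are illustrative.

    student_logits / teacher_logits: (N, C) per-anchor classification scores
    student_feat / teacher_feat:     (N, D) matched intermediate features
    """
    # Soft-label distillation on classification logits (Hinton-style KD).
    t = temperature
    kd_logits = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # Feature imitation: push student features toward the (detached) teacher's.
    kd_feat = F.mse_loss(student_feat, teacher_feat.detach())

    return logit_weight * kd_logits + feat_weight * kd_feat
```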
Deep learning approaches achieve prominent success in 3D semantic segmentation. However, collecting densely annotated real-world 3D datasets is extremely time-consuming and expensive. Training models on synthetic data and generalizing on real-world …
External link:
http://arxiv.org/abs/2204.01599
Large-scale pre-training has been proven to be crucial for various computer vision tasks. However, with the growing amount of pre-training data, the growing number of model architectures, and the prevalence of private or inaccessible data, it is not very efficient or possible to …
External link:
http://arxiv.org/abs/2203.05180
In this paper, we present a self-training method, named ST3D++, with a holistic pseudo label denoising pipeline for unsupervised domain adaptation on 3D object detection. ST3D++ aims at reducing noise in pseudo label generation as well as alleviating …
External link:
http://arxiv.org/abs/2108.06682
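The ST3D++ entry describes self-training with pseudo-label denoising for unsupervised domain adaptation in 3D detection. The snippet below only sketches the generic self-training step it builds on: run the current detector on unlabeled target-domain point clouds and keep confident boxes as pseudo labels. The `detector` interface, its output keys, and the `score_threshold` are hypothetical placeholders, not the paper's actual pipeline.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(detector, target_loader, score_threshold=0.6):
    """One round of pseudo-label generation on unlabeled target-domain scans.

    `detector` is assumed to return, per point cloud, a dict with 'boxes',
    'labels', and 'scores'; the threshold is an illustrative choice and does
    not reproduce ST3D++'s denoising pipeline.
    """
    detector.eval()
    pseudo_labels = []
    for point_clouds in target_loader:
        for pred in detector(point_clouds):
            keep = pred["scores"] >= score_threshold  # drop low-confidence boxes
            pseudo_labels.append({
                "boxes": pred["boxes"][keep],
                "labels": pred["labels"][keep],
            })
    return pseudo_labels
```

In a full loop, rounds of pseudo-label generation alternate with fine-tuning the detector on the filtered labels; the additional denoising that ST3D++ contributes is not attempted in this sketch.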