Showing 1 - 10 of 62
for search: '"Tian, Zhuotao"'
Recent advancements in vision-language models have enhanced performance by increasing the length of visual tokens, making them much longer than text tokens and significantly raising computational costs. However, we observe that the visual tokens gene
External link:
http://arxiv.org/abs/2412.04467
Authors:
Liu, Yijun; Cui, Jiequan; Tian, Zhuotao; Yang, Senqiao; He, Qingdong; Wang, Xiaoling; Su, Jingyong
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores, hindering their application in critical systems. In this paper, we propose a novel approach called Typicalness-Aw
External link:
http://arxiv.org/abs/2411.01981
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training
External link:
http://arxiv.org/abs/2407.08268
Authors:
Tang, Longxiang; Tian, Zhuotao; Li, Kai; He, Chunming; Zhou, Hantao; Zhao, Hengshuang; Li, Xiu; Jia, Jiaya
This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Lan
External link:
http://arxiv.org/abs/2407.05342
Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim t
External link:
http://arxiv.org/abs/2406.18629
Continual learning has gained increasing importance as it facilitates the acquisition and refinement of scalable knowledge and skills in language models. However, existing methods typically encounter strict limitations and challenges in real-world sc
External link:
http://arxiv.org/abs/2404.07470
This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing la
External link:
http://arxiv.org/abs/2404.07155
Authors:
Peng, Bohao; Wu, Xiaoyang; Jiang, Li; Chen, Yukang; Zhao, Hengshuang; Tian, Zhuotao; Jia, Jiaya
The boom in 3D recognition in the 2020s began with the introduction of point cloud transformers. They quickly surpassed sparse CNNs and became state-of-the-art models, especially in 3D semantic segmentation. However, sparse CNNs are still valuab
External link:
http://arxiv.org/abs/2403.14418
Authors:
Wang, Chengyao; Jiang, Li; Wu, Xiaoyang; Tian, Zhuotao; Peng, Bohao; Zhao, Hengshuang; Jia, Jiaya
Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as the pretext task, which assigns matched points in two distinct views as
External link:
http://arxiv.org/abs/2403.09639
While LISA effectively bridges the gap between segmentation and large language models to enable reasoning segmentation, it has certain limitations: it cannot distinguish different instances of the target region, and it is constrained by the pre-defined t
External link:
http://arxiv.org/abs/2312.17240