Showing 1 - 10 of 220
for search: '"Cao, Jiale"'
Author:
Munasinghe, Shehan, Gani, Hanan, Zhu, Wenqi, Cao, Jiale, Xing, Eric, Khan, Fahad Shahbaz, Khan, Salman
Fine-grained alignment between videos and text is challenging due to complex spatial and temporal dynamics in videos. Existing video-based Large Multimodal Models (LMMs) handle basic conversations but struggle with precise pixel-level grounding in videos…
External link:
http://arxiv.org/abs/2411.04923
Recently, the Segment Anything Model (SAM) has demonstrated promising segmentation capabilities in a variety of downstream segmentation tasks. However, in the context of universal medical image segmentation, there exists a notable performance discrepancy…
External link:
http://arxiv.org/abs/2410.04172
Stable diffusion has demonstrated a strong ability to synthesize images from given text descriptions, suggesting that it contains strong semantic cues for grouping objects. Researchers have explored employing stable diffusion for training-free segmentation…
External link:
http://arxiv.org/abs/2409.03209
Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance…
External link:
http://arxiv.org/abs/2407.17120
Author:
Li, Yuhao, Naseer, Muzammal, Cao, Jiale, Zhu, Yu, Sun, Jinqiu, Zhang, Yanning, Khan, Fahad Shahbaz
Most existing multi-object tracking methods learn visual tracking features by maximizing the dissimilarity between different instances and minimizing the similarity within the same instance. While such a feature learning scheme achieves promising performance…
External link:
http://arxiv.org/abs/2406.04844
Due to its cost-effectiveness and widespread availability, monocular 3D object detection, which relies solely on a single camera during inference, holds significant importance across various applications, including autonomous driving and robotics. Ne…
External link:
http://arxiv.org/abs/2404.09431
Text-to-image diffusion models have shown a powerful ability for conditional image synthesis. With large-scale vision-language pre-training, diffusion models are able to generate high-quality images with rich texture and reasonable structure under different…
External link:
http://arxiv.org/abs/2404.07600
Author:
Yu, Zhongrui, Wang, Haoran, Yang, Jinze, Wang, Hanzhang, Xie, Zeke, Cai, Yunfeng, Cao, Jiale, Ji, Zhong, Sun, Mingming
Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. The current mainstream techniques to achieve it are neural rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although…
External link:
http://arxiv.org/abs/2403.20079
Open-vocabulary video instance segmentation strives to segment and track instances belonging to an open set of categories in a video. The vision-language model Contrastive Language-Image Pre-training (CLIP) has shown robust zero-shot classification…
External link:
http://arxiv.org/abs/2403.12455
Open-vocabulary semantic segmentation strives to distinguish pixels into different semantic groups from an open set of categories. Most existing methods explore utilizing pre-trained vision-language models, in which the key is to adopt the image-level…
External link:
http://arxiv.org/abs/2311.15537