Showing 1 - 10 of 439 for search: '"Ye, Qixiang"'
Author:
Yu, Hongtian, Li, Yangu, Wu, Mingrui, Shen, Letian, Liu, Yue, Song, Yunxuan, Ye, Qixiang, Lyu, Xiaorui, Mao, Yajun, Zheng, Yangheng, Liu, Yunfan
In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles.
External link:
http://arxiv.org/abs/2408.10599
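The kinematic reconstruction mentioned in the abstract above reduces, in its simplest form, to recovering a four-momentum from a measured flight direction and energy. Below is a minimal sketch of that textbook step, assuming an anti-neutron that flies from the interaction point to a measured calorimeter hit with known kinetic energy; the function and setup are hypothetical illustrations, not the paper's method.

```python
import numpy as np

M_NBAR = 0.93957  # anti-neutron mass in GeV/c^2 (equal to the PDG neutron mass)

def reconstruct_four_momentum(hit_xyz, kinetic_energy):
    """Toy kinematic reconstruction of an anti-neutron.

    Assumes the particle travels from the origin to the measured hit position
    and its kinetic energy T is known, so E = T + m and |p| = sqrt(E^2 - m^2).
    Returns (E, px, py, pz) in GeV.
    """
    E = kinetic_energy + M_NBAR
    p_mag = np.sqrt(E**2 - M_NBAR**2)
    direction = np.asarray(hit_xyz, dtype=float)
    direction /= np.linalg.norm(direction)
    return np.array([E, *(p_mag * direction)])

p4 = reconstruct_four_momentum([0.3, 0.1, 1.0], kinetic_energy=0.5)
invariant_mass = np.sqrt(p4[0]**2 - np.sum(p4[1:]**2))
print(p4.round(4), invariant_mass.round(4))  # mass check recovers ~0.9396
```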
Depth information provides valuable insights into the 3D structure, especially the outline of objects, which can be utilized to improve semantic segmentation tasks. However, a naive fusion of depth information can disrupt features and compromise accuracy …
External link:
http://arxiv.org/abs/2408.09097
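As a rough illustration of the fusion problem described in the record above, here is a minimal PyTorch sketch contrasting naive channel concatenation with a simple gated residual fusion that lets RGB features decide how much depth to admit; both modules and their names are hypothetical baselines, not the paper's architecture.

```python
import torch
import torch.nn as nn

class NaiveDepthFusion(nn.Module):
    """Early fusion: concatenate depth features with RGB features (the naive baseline)."""
    def __init__(self, rgb_ch=64, depth_ch=16):
        super().__init__()
        self.proj = nn.Conv2d(rgb_ch + depth_ch, rgb_ch, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        # Blind concatenation lets noisy depth overwrite appearance cues.
        return self.proj(torch.cat([rgb_feat, depth_feat], dim=1))

class GatedDepthFusion(nn.Module):
    """Gated fusion: a per-pixel gate controls how much depth enters the RGB stream."""
    def __init__(self, rgb_ch=64, depth_ch=16):
        super().__init__()
        self.depth_proj = nn.Conv2d(depth_ch, rgb_ch, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(rgb_ch * 2, rgb_ch, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, rgb_feat, depth_feat):
        d = self.depth_proj(depth_feat)
        g = self.gate(torch.cat([rgb_feat, d], dim=1))  # gate values in [0, 1]
        return rgb_feat + g * d  # residual, gated injection of depth

rgb, depth = torch.randn(2, 64, 32, 32), torch.randn(2, 16, 32, 32)
print(NaiveDepthFusion()(rgb, depth).shape)  # torch.Size([2, 64, 32, 32])
print(GatedDepthFusion()(rgb, depth).shape)  # torch.Size([2, 64, 32, 32])
```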
Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free …
External link:
http://arxiv.org/abs/2408.08723
Author:
Liao, Mingxiang, Lu, Hannan, Zhang, Xinyu, Wan, Fang, Wang, Tianyu, Zhao, Yuzhong, Zuo, Wangmeng, Ye, Qixiang, Wang, Jingdong
Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet …
External link:
http://arxiv.org/abs/2407.01094
Author:
Ma, Tianren, Xie, Lingxi, Tian, Yunjie, Yang, Boyu, Zhang, Yuan, Doermann, David, Ye, Qixiang
An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing …
External link:
http://arxiv.org/abs/2406.11327
Author:
Qiu, Jihao, Zhang, Yuan, Tang, Xi, Xie, Lingxi, Ma, Tianren, Yan, Pengyu, Doermann, David, Ye, Qixiang, Tian, Yunjie
Videos carry rich visual information, including object descriptions, actions, and interactions, but existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we …
External link:
http://arxiv.org/abs/2406.00258
3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. However, achieving successful reconstruction from RGB images generally requires multiple input views captured under static conditions. To address the …
External link:
http://arxiv.org/abs/2405.19657
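The splatting step behind this line of work projects each 3D Gaussian's covariance into screen space via the linearized perspective Jacobian, $\Sigma' = J W \Sigma W^{\top} J^{\top}$ (the EWA approximation used in the original 3D Gaussian splatting paper). A minimal NumPy sketch, assuming unit focal length and omitting the focal-length scaling a real renderer would include:

```python
import numpy as np

def project_gaussian_cov(Sigma3d, W, t):
    """Project a 3D Gaussian covariance to 2D screen space (EWA approximation).

    Sigma3d: 3x3 world-space covariance; W: 3x3 world-to-camera rotation;
    t: camera-space mean (tx, ty, tz). Returns the 2x2 image-space covariance.
    """
    tx, ty, tz = t
    # Jacobian of the perspective map (x/z, y/z), evaluated at the Gaussian mean.
    J = np.array([
        [1.0 / tz, 0.0, -tx / tz**2],
        [0.0, 1.0 / tz, -ty / tz**2],
    ])
    cam_cov = W @ Sigma3d @ W.T   # rotate the covariance into camera space
    return J @ cam_cov @ J.T      # 2x2 covariance of the splatted 2D Gaussian

# A unit isotropic Gaussian 4 units in front of an axis-aligned camera
# shrinks to a 2D covariance of (1/4)^2 * I = 0.0625 * I on the image plane:
print(project_gaussian_cov(np.eye(3), np.eye(3), np.array([0.0, 0.0, 4.0])))
```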
A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image. In this study, we propose vHeat, a novel vision backbone model that …
External link:
http://arxiv.org/abs/2405.16555
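vHeat's stated idea is to propagate visual semantics the way heat diffuses across the image plane. The underlying physical prior is easy to demonstrate: with Neumann boundaries, the 2D heat equation is solved exactly in the cosine-transform domain, where each frequency component simply decays. Below is a minimal NumPy/SciPy sketch of that operator; it illustrates the prior only and is not the released vHeat implementation, which (as I understand it) makes the diffusivity learnable.

```python
import numpy as np
from scipy.fft import dctn, idctn

def heat_conduct(u, k=1.0, t=0.1):
    """Diffuse a 2D field by solving du/dt = k * laplacian(u) in the DCT domain.

    With Neumann boundaries, cosine modes are eigenfunctions of the Laplacian,
    so each DCT coefficient decays as exp(-k * |omega|^2 * t).
    """
    H, W = u.shape
    wy = np.pi * np.arange(H) / H          # discrete frequencies per axis
    wx = np.pi * np.arange(W) / W
    decay = np.exp(-k * (wy[:, None] ** 2 + wx[None, :] ** 2) * t)
    return idctn(dctn(u, norm="ortho") * decay, norm="ortho")

# A point source spreads into a smooth blob as t grows:
u0 = np.zeros((32, 32)); u0[16, 16] = 1.0
print(heat_conduct(u0, t=1.0).round(3)[14:19, 14:19])
```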
Region-level multi-modality methods can translate referred image regions into human-preferred language descriptions. Unfortunately, most existing methods, which use fixed visual inputs, lack the resolution adaptability needed to find precise …
External link:
http://arxiv.org/abs/2405.16071
The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic segmentation by aligning visual features with class embeddings through a transformer decoder to generate semantic masks. Despite its effectiveness, prevailing methods …
External link:
http://arxiv.org/abs/2403.08426
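The CLIP-style zero-shot segmentation recipe this abstract describes reduces, at its core, to comparing dense patch features against text embeddings of class names. A minimal PyTorch sketch of that similarity step, with all shapes and the temperature chosen for illustration (the decoder the abstract mentions is more involved):

```python
import torch
import torch.nn.functional as F

def zero_shot_masks(patch_feats, class_embeds, temperature=0.07):
    """Dense zero-shot segmentation logits from CLIP-style features.

    patch_feats:  (B, H*W, D) visual features, one per image patch.
    class_embeds: (C, D) text embeddings, one per class-name prompt.
    Returns (B, C, H*W) logits; argmax over C gives the semantic mask.
    """
    v = F.normalize(patch_feats, dim=-1)   # cosine similarity needs unit norm
    t = F.normalize(class_embeds, dim=-1)
    return torch.einsum("bnd,cd->bcn", v, t) / temperature

B, HW, D, C = 2, 196, 512, 5               # e.g. 14x14 patches, 5 class prompts
logits = zero_shot_masks(torch.randn(B, HW, D), torch.randn(C, D))
masks = logits.argmax(dim=1).reshape(B, 14, 14)  # per-patch class labels
print(masks.shape)  # torch.Size([2, 14, 14])
```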