Showing 1 - 10 of 12,255 for search: '"LI, You"'
Zero-shot Composed Image Retrieval (ZSCIR) requires retrieving images that match the query image and the relative captions. Current methods focus on projecting the query image into the text feature space, subsequently combining them with features
External link:
http://arxiv.org/abs/2411.16752
Diffusion models have recently been employed to generate high-quality images, reducing the need for manual data collection and improving model generalization in tasks such as object detection, instance segmentation, and image perception. However, the
External link:
http://arxiv.org/abs/2411.16749
Multimodal Sentiment Analysis (MSA) utilizes multimodal data to infer the users' sentiment. Previous methods focus on equally treating the contribution of each modality or statically using text as the dominant modality to conduct interaction, which n
External link:
http://arxiv.org/abs/2410.04491
Fundus image classification is crucial in computer-aided diagnosis tasks, but label noise significantly impairs the performance of deep neural networks. To address this challenge, we propose a robust framework, Self-Supervised Pre-training with R
External link:
http://arxiv.org/abs/2409.18147
Authors:
Liu, Wei; Zhu, Jiaqi; Zhuo, Guirong; Fu, Wufei; Meng, Zonglin; Lu, Yishi; Hua, Min; Qiao, Feng; Li, You; He, Yi; Xiong, Lu
Intelligent transportation systems (ITS) localization is of significant importance as it provides fundamental position and orientation for autonomous operations like intelligent vehicles. Integrating diverse and complementary sensors such as global n
External link:
http://arxiv.org/abs/2409.12426
Authors:
Ren, Wenze; Wu, Haibin; Lin, Yi-Cheng; Chen, Xuanjun; Chao, Rong; Hung, Kuo-Hsuan; Li, You-Jin; Ting, Wen-Yuan; Wang, Hsin-Min; Tsao, Yu
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and
External link:
http://arxiv.org/abs/2409.10376
LiDAR-Inertial Odometry (LIO) demonstrates outstanding accuracy and stability in general low-speed and smooth motion scenarios. However, in high-speed and intense motion scenarios, such as sharp turns, two primary challenges arise: firstly, due to th
External link:
http://arxiv.org/abs/2408.11426
Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals,
External link:
http://arxiv.org/abs/2407.08457
We introduce the Multi-Instance Generation (MIG) task, which focuses on generating multiple instances within a single image, each accurately placed at predefined positions with attributes such as category, color, and shape, strictly following user sp
External link:
http://arxiv.org/abs/2407.02329
Authors:
Wang, Kuan-Chen; Li, You-Jin; Chen, Wei-Lun; Chen, Yu-Wen; Wang, Yi-Ching; Yeh, Ping-Cheng; Zhang, Chao; Tsao, Yu
Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the use of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introdu
External link:
http://arxiv.org/abs/2406.12699