Showing 1 - 10 of 2,983 for search: '"CAO, Bing"'
The strength of multimodal learning lies in its ability to integrate information from various sources, providing rich and comprehensive insights. However, in real-world scenarios, multi-modal systems often face the challenge of dynamic modality contr
External link:
http://arxiv.org/abs/2501.01240
Recently, many studies have been conducted to enhance the zero-shot generalization ability of vision-language models (e.g., CLIP) by addressing the semantic misalignment between image and text embeddings in downstream tasks. Although many efforts hav
External link:
http://arxiv.org/abs/2412.04925
The dynamic imbalance of the fore-background is a major challenge in video object counting, which is usually caused by the sparsity of foreground objects. This often leads to severe under- and over-prediction problems and has been less studied in exi
External link:
http://arxiv.org/abs/2411.13056
Published in:
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Main Track, Pages 1317-1325, 2024
Infrared and visible image fusion aims to integrate modality strengths for visually enhanced, informative images. Visible imaging in real-world scenarios is susceptible to dynamic environmental brightness fluctuations, leading to texture degradation.
External link:
http://arxiv.org/abs/2411.04697
The inherent challenge of image fusion lies in capturing the correlation of multi-source images and comprehensively integrating effective information from different sources. Most existing techniques fail to perform dynamic image fusion while notably
External link:
http://arxiv.org/abs/2411.02840
Image fusion aims to integrate complementary information from multiple input images acquired through various sources to synthesize a new fused image. Existing methods usually employ distinct constraint designs tailored to specific scenes, forming fix
External link:
http://arxiv.org/abs/2411.01573
Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most ex
External link:
http://arxiv.org/abs/2406.04802
Tiny object detection is one of the key challenges in the field of object detection. The performance of most generic detectors dramatically decreases in tiny object detection tasks. The main challenge lies in extracting effective features of tiny obj
External link:
http://arxiv.org/abs/2405.11276
Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a
External link:
http://arxiv.org/abs/2404.15677
General image fusion aims at integrating important information from multi-source images. However, due to the significant cross-task gap, the respective fusion mechanism varies considerably in practice, resulting in limited performance across subtasks
External link:
http://arxiv.org/abs/2403.12494