Showing 1 - 10 of 1,094 results for search: '"Sebe, Nicu"'
Multi-view clustering aims to study the complementary information across views and discover the underlying structure. To address the relatively high computational cost of existing approaches, anchor-based methods have recently been proposed.
External link:
http://arxiv.org/abs/2409.16904
Diffusion models have significantly advanced generative AI, but they encounter difficulties when generating complex combinations of multiple objects. As the final result heavily depends on the initial seed, accurately ensuring the desired output can
External link:
http://arxiv.org/abs/2409.10597
Author:
D'Incà, Moreno; Peruzzo, Elia; Mancini, Massimiliano; Xu, Xingqian; Shi, Humphrey; Sebe, Nicu
Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant traction and popularity: ensuring their fairness and safety is a pri
External link:
http://arxiv.org/abs/2408.16700
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features withi
External link:
http://arxiv.org/abs/2408.14600
In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy ob
External link:
http://arxiv.org/abs/2408.14585
Author:
Ma, Qi; Li, Yue; Ren, Bin; Sebe, Nicu; Konukoglu, Ender; Gevers, Theo; Van Gool, Luc; Paudel, Danda Pani
3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-sca
External link:
http://arxiv.org/abs/2408.10906
The challenge of Multimodal Deformable Image Registration (MDIR) lies in the conversion and alignment of features between images of different modalities. Generative models (GMs) cannot retain enough of the necessary information from the source modality
External link:
http://arxiv.org/abs/2408.10703
Existing codecs are designed to eliminate intrinsic redundancies to create a compact representation for compression. However, strong external priors from Multimodal Large Language Models (MLLMs) have not been explicitly explored in video compression.
External link:
http://arxiv.org/abs/2408.08093
In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent
External link:
http://arxiv.org/abs/2408.06687
Published in:
ACM Multimedia 2024
Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notab
External link:
http://arxiv.org/abs/2408.00644