Search results

Report

Discriminative Anchor Learning for Efficient Multi-view Clustering

Authors: Qin, Yalan; Pu, Nan; Wu, Hanzhou; Sebe, Nicu

Multi-view clustering aims to study the complementary information across views and discover the underlying structure. To address the relatively high computational cost of existing approaches, anchor-based methods have been presented recently.

External link: http://arxiv.org/abs/2409.16904

Report

Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection

Authors: Betti, Federico; Baraldi, Lorenzo; Cucchiara, Rita; Sebe, Nicu

Diffusion models have significantly advanced generative AI, but they encounter difficulties when generating complex combinations of multiple objects. As the final result heavily depends on the initial seed, accurately ensuring the desired output can

External link: http://arxiv.org/abs/2409.10597

Report

GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

Authors: D'Incà, Moreno; Peruzzo, Elia; Mancini, Massimiliano; Xu, Xingqian; Shi, Humphrey; Sebe, Nicu

Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant attraction and popularity: ensuring their fairness and safety is a pri

External link: http://arxiv.org/abs/2408.16700

Report

PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

Authors: Li, Yidi; Wen, Jiahao; Ren, Bin; Li, Wenhao; Xu, Zhenhuan; Guo, Hao; Liu, Hong; Sebe, Nicu

The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features withi

External link: http://arxiv.org/abs/2408.14600

Report

Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

Authors: Li, Yidi; Li, Yihan; Guo, Yixin; Ren, Bin; Xu, Zhenhuan; Guo, Hao; Liu, Hong; Sebe, Nicu

In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy ob

External link: http://arxiv.org/abs/2408.14585

Report

ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining

Authors: Ma, Qi; Li, Yue; Ren, Bin; Sebe, Nicu; Konukoglu, Ender; Gevers, Theo; Van Gool, Luc; Paudel, Danda Pani

3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-sca

External link: http://arxiv.org/abs/2408.10906

Report

Large Language Models for Multimodal Deformable Image Registration

Authors: Ma, Mingrui; Wang, Weijie; Ning, Jie; He, Jianfeng; Sebe, Nicu; Lepri, Bruno

The challenge of Multimodal Deformable Image Registration (MDIR) lies in the conversion and alignment of features between images of different modalities. Generative models (GMs) cannot retain the necessary information enough from the source modality

External link: http://arxiv.org/abs/2408.10703

Report

When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding

Authors: Zhang, Pingping; Li, Jinlong; Wang, Meng; Sebe, Nicu; Kwong, Sam; Wang, Shiqi

Existing codecs are designed to eliminate intrinsic redundancies to create a compact representation for compression. However, strong external priors from Multimodal Large Language Models (MLLMs) have not been explicitly explored in video compression.

External link: http://arxiv.org/abs/2408.08093

Report

Masked Image Modeling: A Survey

Authors: Hondru, Vlad; Croitoru, Florinel Alin; Minaee, Shervin; Ionescu, Radu Tudor; Sebe, Nicu

In this work, we survey recent studies on masked image modeling (MIM), an approach that emerged as a powerful self-supervised learning technique in computer vision. The MIM task involves masking some information, e.g. pixels, patches, or even latent

External link: http://arxiv.org/abs/2408.06687

Report

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

Authors: Ge, Xuri; Fu, Junchen; Chen, Fuhai; An, Shan; Sebe, Nicu; Jose, Joemon M.

Published in: ACM Multimedia 2024

Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models have a notab

External link: http://arxiv.org/abs/2408.00644
