Search results

Report

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

Authors: Ji, Xiaozhong, Lin, Chuming, Ding, Zhonggan, Tai, Ying, Yang, Jian, Zhu, Junwei, Hu, Xiaobin, Zhang, Jiangning, Luo, Donghao, Wang, Chengjie

Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical appli

External link: http://arxiv.org/abs/2406.18284

Report

AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

Authors: Kong, Lingjie, Wu, Kai, Hu, Xiaobin, Han, Wenhui, Peng, Jinlong, Xu, Chengming, Luo, Donghao, Zhang, Jiangning, Wang, Chengjie, Fu, Yanwei

Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is domina

External link: http://arxiv.org/abs/2406.11643

Report

Open-Vocabulary SAM3D: Understand Any 3D Scene

Authors: Tai, Hanchen, He, Qingdong, Zhang, Jiangning, Qian, Yijie, Zhang, Zhenyu, Hu, Xiaobin, Wang, Yabiao, Liu, Yong

Open-vocabulary 3D scene understanding presents a significant challenge in the field. Recent advancements have sought to transfer knowledge embedded in vision language models from the 2D domain to the 3D domain. However, these approaches often require le

External link: http://arxiv.org/abs/2405.15580

Report

QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

Authors: Li, Hongwei Bran, Navarro, Fernando, Ezhov, Ivan, Bayat, Amirhossein, Das, Dhritiman, Kofler, Florian, Shit, Suprosanna, Waldmannstetter, Diana, Paetzold, Johannes C., Hu, Xiaobin, Wiestler, Benedikt, Zimmer, Lucas, Amiranashvili, Tamaz, Prabhakar, Chinmay, Berger, Christoph, Weidner, Jonas, Alonso-Basant, Michelle, Rashid, Arif, Baid, Ujjwal, Adel, Wesam, Ali, Deniz, Baheti, Bhakti, Bai, Yingbin, Bhatt, Ishaan, Cetindag, Sabri Can, Chen, Wenting, Cheng, Li, Dutand, Prasad, Dular, Lara, Elattar, Mustafa A., Feng, Ming, Gao, Shengbo, Huisman, Henkjan, Hu, Weifeng, Innani, Shubham, Jiat, Wei, Karimi, Davood, Kuijf, Hugo J., Kwak, Jin Tae, Le, Hoang Long, Lia, Xiang, Lin, Huiyan, Liu, Tongliang, Ma, Jun, Ma, Kai, Ma, Ting, Oksuz, Ilkay, Holland, Robbie, Oliveira, Arlindo L., Pal, Jimut Bahan, Pei, Xuan, Qiao, Maoying, Saha, Anindo, Selvan, Raghavendra, Shen, Linlin, Silva, Joao Lourenco, Spiclin, Ziga, Talbar, Sanjay, Wang, Dadong, Wang, Wei, Wang, Xiong, Wang, Yin, Xia, Ruiling, Xu, Kele, Yan, Yanwu, Yergin, Mert, Yu, Shuang, Zeng, Lingxi, Zhang, YingLin, Zhao, Jiachen, Zheng, Yefeng, Zukovec, Martin, Do, Richard, Becker, Anton, Simpson, Amber, Konukoglu, Ender, Jakab, Andras, Bakas, Spyridon, Joskowicz, Leo, Menze, Bjoern

Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentat

External link: http://arxiv.org/abs/2405.18435

Report

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

Authors: He, Liren, Jiang, Zhengkai, Peng, Jinlong, Liu, Liang, Du, Qiangang, Hu, Xiaobin, Zhu, Wenbing, Chi, Mingmin, Wang, Yabiao, Wang, Chengjie

In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it sho

External link: http://arxiv.org/abs/2403.11561

Report

ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

Authors: Huang, Siyuan, Ponomarenko, Iaroslav, Jiang, Zhengkai, Li, Xiaoqi, Hu, Xiaobin, Gao, Peng, Li, Hongsheng, Dong, Hao

The integration of Multimodal Large Language Models (MLLMs) with robotic systems has significantly enhanced the ability of robots to interpret and act upon natural language instructions. Despite these advancements, conventional MLLMs are typically tr

External link: http://arxiv.org/abs/2403.11289

Report

PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

Authors: He, Qingdong, Peng, Jinlong, Jiang, Zhengkai, Hu, Xiaobin, Zhang, Jiangning, Nie, Qiang, Wang, Yabiao, Wang, Chengjie

The recent success of vision foundation models has shown promising performance on 2D perception tasks. However, it is difficult to train a 3D foundation network directly due to limited datasets, and it remains underexplored whether existing foun

External link: http://arxiv.org/abs/2403.06403

Report

DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

Authors: Hu, Xiaobin, Peng, Xu, Luo, Donghao, Ji, Xiaozhong, Peng, Jinlong, Jiang, Zhengkai, Zhang, Jiangning, Jin, Taisong, Wang, Chengjie, Ji, Rongrong

Due to the difficulty and labor-intensive nature of obtaining highly accurate or matting annotations, only a limited number of highly accurate labels are publicly available. To tackle this challenge, we propose DiffuMatting, which inheri

External link: http://arxiv.org/abs/2403.06168

Report

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

Authors: Wang, Sen, Zhang, Jiangning, Cao, Weijian, Hu, Xiaobin, Li, Moran, Ji, Xiaozhong, Tan, Xin, Li, Mengtian, Xie, Zhifeng, Wang, Chengjie, Ma, Lizhuang

The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generatin

External link: http://arxiv.org/abs/2403.02905

Report

Joint-Individual Fusion Structure with Fusion Attention Module for Multi-Modal Skin Cancer Classification

Authors: Tang, Peng, Yan, Xintong, Nan, Yang, Hu, Xiaobin, Menze, Bjoern H., Krammer, Sebastian, Lasser, Tobias

Most convolutional neural network (CNN) based methods for skin cancer classification obtain their results using only dermatological images. Although good classification results have been shown, more accurate results can be achieved by considering the

External link: http://arxiv.org/abs/2312.04189