Search results

Report

ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations

Authors: Zhang, Xulong; Qu, Xiaoyang; Shi, Haoxiang; Xiao, Chunguang; Wang, Jianzong

This paper proposes a novel 3D speech-to-animation (STA) generation framework designed to address the shortcomings of existing models in producing diverse and emotionally resonant animations. Current STA models often generate animations that lack emo

External link: http://arxiv.org/abs/2411.13089

Report

Semi-Supervised Self-Learning Enhanced Music Emotion Recognition

Authors: Sun, Yifu; Zhang, Xulong; Zhou, Monan; Li, Wei

Music emotion recognition (MER) aims to identify the emotions conveyed in a given musical piece. But currently in the field of MER, the available public datasets have limited sample sizes. Recently, segment-based methods for emotion-related tasks hav

External link: http://arxiv.org/abs/2410.21897

Report

Lotus: learning-based online thermal and latency variation management for two-stage detectors on edge devices

Authors: Gong, Yifan; Wu, Yushu; Zhan, Zheng; Zhao, Pu; Liu, Liangkai; Wu, Chao; Tang, Xulong; Wang, Yanzhi

Two-stage object detectors exhibit high accuracy and precise localization, especially for identifying small objects that are favorable for various edge applications. However, the high computation costs associated with two-stage detection methods caus

External link: http://arxiv.org/abs/2410.10847

Report

IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding

Authors: Li, Pengcheng; Zhang, Xulong; Xiao, Jing; Wang, Jianzong

The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional methods develop algorithms based on expert experience to embed watermarks into the time-domain or transform-domain of

External link: http://arxiv.org/abs/2409.19627

Report

Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration

Authors: Wang, Tianyu; Li, Sheng; Li, Bingyao; Dai, Yue; Li, Ao; Yuan, Geng; Ding, Yufei; Zhang, Youtao; Tang, Xulong

Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated mode

External link: http://arxiv.org/abs/2407.13126

Report

SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization

Authors: Chen, Yiyang; Dong, Siyan; Wang, Xulong; Cai, Lulu; Zheng, Youyi; Yang, Yanchao

3D surface reconstruction from images is essential for numerous applications. Recently, Neural Radiance Fields (NeRFs) have emerged as a promising framework for 3D modeling. However, NeRFs require accurate camera poses as input, and existing methods

External link: http://arxiv.org/abs/2407.12667

Report

InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction

Authors: Wang, Xulong; Dong, Siyan; Zheng, Youyi; Yang, Yanchao

3D surface reconstruction from multi-view images is essential for scene understanding and interaction. However, complex indoor scenes pose challenges such as ambiguity due to limited observations. Recent implicit surface representations, such as Neur

External link: http://arxiv.org/abs/2407.12661

Report

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Peng, Yuanyuan; Lin, Aidi; Wang, Meng; Lin, Tian; Zou, Ke; Cheng, Yinglin; Shi, Tingkun; Liao, Xulong; Feng, Lixia; Liang, Zhen; Chen, Xinjian; Fu, Huazhu; Chen, Haoyu

Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditio

External link: http://arxiv.org/abs/2406.16942

Report

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Authors: Shi, Haoxiang; Zhang, Xulong; Cheng, Ning; Zhang, Yong; Yu, Jun; Xiao, Jing; Wang, Jianzong

The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differe

External link: http://arxiv.org/abs/2405.17900

Report

RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Shi, Haoxiang; Wang, Jianzong; Zhang, Xulong; Cheng, Ning; Yu, Jun; Xiao, Jing

Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting inten

External link: http://arxiv.org/abs/2405.17028
