Showing 1 - 10 of 4 158 for search: '"An, Xulong"'
This paper proposes a novel 3D speech-to-animation (STA) generation framework designed to address the shortcomings of existing models in producing diverse and emotionally resonant animations. Current STA models often generate animations that lack emo
External link:
http://arxiv.org/abs/2411.13089
Music emotion recognition (MER) aims to identify the emotions conveyed in a given musical piece. But currently in the field of MER, the available public datasets have limited sample sizes. Recently, segment-based methods for emotion-related tasks hav
External link:
http://arxiv.org/abs/2410.21897
Authors:
Gong, Yifan, Wu, Yushu, Zhan, Zheng, Zhao, Pu, Liu, Liangkai, Wu, Chao, Tang, Xulong, Wang, Yanzhi
Two-stage object detectors exhibit high accuracy and precise localization, especially for identifying small objects that are favorable for various edge applications. However, the high computation costs associated with two-stage detection methods caus
External link:
http://arxiv.org/abs/2410.10847
The audio watermarking technique embeds messages into audio and accurately extracts messages from the watermarked audio. Traditional methods develop algorithms based on expert experience to embed watermarks into the time-domain or transform-domain of
External link:
http://arxiv.org/abs/2409.19627
Authors:
Wang, Tianyu, Li, Sheng, Li, Bingyao, Dai, Yue, Li, Ao, Yuan, Geng, Ding, Yufei, Zhang, Youtao, Tang, Xulong
Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (through model retraining) and use the updated mode
External link:
http://arxiv.org/abs/2407.13126
3D surface reconstruction from images is essential for numerous applications. Recently, Neural Radiance Fields (NeRFs) have emerged as a promising framework for 3D modeling. However, NeRFs require accurate camera poses as input, and existing methods
External link:
http://arxiv.org/abs/2407.12667
3D surface reconstruction from multi-view images is essential for scene understanding and interaction. However, complex indoor scenes pose challenges such as ambiguity due to limited observations. Recent implicit surface representations, such as Neur
External link:
http://arxiv.org/abs/2407.12661
Authors:
Peng, Yuanyuan, Lin, Aidi, Wang, Meng, Lin, Tian, Zou, Ke, Cheng, Yinglin, Shi, Tingkun, Liao, Xulong, Feng, Lixia, Liang, Zhen, Chen, Xinjian, Fu, Huazhu, Chen, Haoyu
Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditio
External link:
http://arxiv.org/abs/2406.16942
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differe
External link:
http://arxiv.org/abs/2405.17900
Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting inten
External link:
http://arxiv.org/abs/2405.17028