Showing 1 - 10 of 14,842 for search: '"Jia-Jia An"'
Author:
Dai, Dongyang, Wu, Zhiyong, Kang, Shiyin, Wu, Xixin, Jia, Jia, Su, Dan, Yu, Dong, Meng, Helen
Published in:
Proc. Interspeech 2019, pp. 2090-2094
Grapheme-to-phoneme (G2P) conversion serves as an essential component in Chinese Mandarin text-to-speech (TTS) system, where polyphone disambiguation is the core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of
External link:
http://arxiv.org/abs/2501.01102
Published in:
Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) 2019, pp. 7405-7409
Identifying the emotional state from speech is essential for the natural interaction of the machine with the speaker. However, extracting effective features for emotion recognition is difficult, as emotions are ambiguous. We propose a novel approach
External link:
http://arxiv.org/abs/2501.01103
Quantum physics can be extended into the complex domain by considering non-Hermitian Hamiltonians that are $\mathcal{PT}$-symmetric. These exhibit exceptional points (EPs) where the eigenspectrum changes from purely real to purely imaginary values an
External link:
http://arxiv.org/abs/2412.08123
Capturing and maintaining geometric interactions among different body parts is crucial for successful motion retargeting in skinned characters. Existing approaches often overlook body geometries or add a geometry correction stage after skeletal motio
External link:
http://arxiv.org/abs/2410.20986
Recent advancements in diffusion models trained on large-scale data have enabled the generation of indistinguishable human-level images, yet they often produce harmful content misaligned with human values, e.g., social bias, and offensive content. De
External link:
http://arxiv.org/abs/2410.12700
Author:
Chen, Houlun, Wang, Xin, Chen, Hong, Zhang, Zeyang, Feng, Wei, Huang, Bin, Jia, Jia, Zhu, Wenwu
Existing Video Corpus Moment Retrieval (VCMR) is limited to coarse-grained understanding, which hinders precise video moment localization when given fine-grained queries. In this paper, we propose a more challenging fine-grained VCMR benchmark requir
External link:
http://arxiv.org/abs/2410.08593
Synthesizing camera movements from music and dance is highly challenging due to the contradicting requirements and complexities of dance cinematography. Unlike human movements, which are always continuous, dance camera movements involve both continuo
External link:
http://arxiv.org/abs/2409.14925
Author:
Zhou, Yixuan, Qin, Xiaoyu, Jin, Zeyu, Zhou, Shuoyi, Lei, Shun, Zhou, Songtao, Wu, Zhiyong, Jia, Jia
Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generatio
External link:
http://arxiv.org/abs/2408.15676
Author:
Jin, Zeyu, Jia, Jia, Wang, Qixin, Li, Kehan, Zhou, Shuoyi, Zhou, Songtao, Qin, Xiaoyu, Wu, Zhiyong
Speech-language multi-modal learning presents a significant challenge due to the fine nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate i
External link:
http://arxiv.org/abs/2408.13608
Author:
Huang, Shuo, Sun, Shikun, Wang, Zixuan, Qin, Xiaoyu, Xiong, Yanmin, Zhang, Yuan, Wan, Pengfei, Zhang, Di, Jia, Jia
Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view
External link:
http://arxiv.org/abs/2407.13976