Search results

Report

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Authors: Dai, Dongyang; Wu, Zhiyong; Kang, Shiyin; Wu, Xixin; Jia, Jia; Su, Dan; Yu, Dong; Meng, Helen

Published in: Proc. Interspeech 2019, pp. 2090-2094

Grapheme-to-phoneme (G2P) conversion serves as an essential component in Chinese Mandarin text-to-speech (TTS) system, where polyphone disambiguation is the core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of

External link: http://arxiv.org/abs/2501.01102

Report

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition

Authors: Dai, Dongyang; Wu, Zhiyong; Li, Runnan; Wu, Xixin; Jia, Jia; Meng, Helen

Published in: Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP) 2019, pp. 7405-7409

Identifying the emotional state from speech is essential for the natural interaction of the machine with the speaker. However, extracting effective features for emotion recognition is difficult, as emotions are ambiguous. We propose a novel approach

External link: http://arxiv.org/abs/2501.01103

Report

The robustness of entanglement in non-Hermitian cavity optomechanical system even away from exceptional points

Authors: Wang, Jia-Jia; He, Yu-Hong; Liao, Chang-Geng; Chen, Rong-Xin; Dunningham, Jacob A.

Quantum physics can be extended into the complex domain by considering non-Hermitian Hamiltonians that are $\mathcal{PT}$-symmetric. These exhibit exceptional points (EPs) where the eigenspectrum changes from purely real to purely imaginary values an

External link: http://arxiv.org/abs/2412.08123

Report

Skinned Motion Retargeting with Dense Geometric Interaction Perception

Authors: Ye, Zijie; Liu, Jia-Wei; Jia, Jia; Sun, Shikun; Shou, Mike Zheng

Capturing and maintaining geometric interactions among different body parts is crucial for successful motion retargeting in skinned characters. Existing approaches often overlook body geometries or add a geometry correction stage after skeletal motio

External link: http://arxiv.org/abs/2410.20986

Report

Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization

Authors: Wang, Xingqi; Yi, Xiaoyuan; Xie, Xing; Jia, Jia

Recent advancements in diffusion models trained on large-scale data have enabled the generation of indistinguishable human-level images, yet they often produce harmful content misaligned with human values, e.g., social bias, and offensive content. De

External link: http://arxiv.org/abs/2410.12700

Report

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding

Authors: Chen, Houlun; Wang, Xin; Chen, Hong; Zhang, Zeyang; Feng, Wei; Huang, Bin; Jia, Jia; Zhu, Wenwu

Existing Video Corpus Moment Retrieval (VCMR) is limited to coarse-grained understanding, which hinders precise video moment localization when given fine-grained queries. In this paper, we propose a more challenging fine-grained VCMR benchmark requir

External link: http://arxiv.org/abs/2410.08593

Report

DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis

Authors: Wang, Zixuan; Li, Jiayi; Qin, Xiaoyu; Sun, Shikun; Zhou, Songtao; Jia, Jia; Luo, Jiebo

Synthesizing camera movements from music and dance is highly challenging due to the contradicting requirements and complexities of dance cinematography. Unlike human movements, which are always continuous, dance camera movements involve both continuo

External link: http://arxiv.org/abs/2409.14925

Report

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Authors: Zhou, Yixuan; Qin, Xiaoyu; Jin, Zeyu; Zhou, Shuoyi; Lei, Shun; Zhou, Songtao; Wu, Zhiyong; Jia, Jia

Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generatio

External link: http://arxiv.org/abs/2408.15676

Report

SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description

Authors: Jin, Zeyu; Jia, Jia; Wang, Qixin; Li, Kehan; Zhou, Shuoyi; Zhou, Songtao; Qin, Xiaoyu; Wu, Zhiyong

Speech-language multi-modal learning presents a significant challenge due to the fine nuanced information inherent in speech styles. Therefore, a large-scale dataset providing elaborate comprehension of speech style is urgently needed to facilitate i

External link: http://arxiv.org/abs/2408.13608

Report

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Authors: Huang, Shuo; Sun, Shikun; Wang, Zixuan; Qin, Xiaoyu; Xiong, Yanmin; Zhang, Yuan; Wan, Pengfei; Zhang, Di; Jia, Jia

Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view

External link: http://arxiv.org/abs/2407.13976
