Showing 1 - 10 of 64 for search: '"Hwang, Inchul"'
Authors:
Karapiperis, Sotirios, Ellinas, Nikolaos, Vioni, Alexandra, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Raptis, Spyros
Most of the prevalent approaches in speech prosody modeling rely on learning global style representations in a continuous latent space which encode and transfer the attributes of reference speech. However, recent work on neural codecs which are based
External link:
http://arxiv.org/abs/2409.08664
Authors:
Mitsios, Michael, Vamvoukakis, Georgios, Maniati, Georgia, Ellinas, Nikolaos, Dimitriou, Georgios, Markopoulos, Konstantinos, Kakoulidis, Panos, Vioni, Alexandra, Christidou, Myrsini, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Vardaxoglou, Georgios, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros
Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and
External link:
http://arxiv.org/abs/2404.01805
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
Authors:
Kakoulidis, Panos, Ellinas, Nikolaos, Vamvoukakis, Georgios, Christidou, Myrsini, Vioni, Alexandra, Maniati, Georgia, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Tsiakoulis, Pirros, Chalamandaris, Aimilios
In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource pipeline that does not utilize any singing data end-to-end, since it
External link:
http://arxiv.org/abs/2402.01520
Authors:
Klapsas, Konstantinos, Nikitaras, Karolos, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros
A large part of the expressive speech synthesis literature focuses on learning prosodic representations of the speech signal which are then modeled by a prior distribution during inference. In this paper, we compare different prior architectures at t
External link:
http://arxiv.org/abs/2211.01327
Authors:
Nikitaras, Karolos, Klapsas, Konstantinos, Ellinas, Nikolaos, Maniati, Georgia, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros
This paper proposes an Expressive Speech Synthesis model that utilizes token-level latent prosodic variables in order to capture and control utterance-level attributes, such as character acting voice and speaking style. Current works aim to explicitl
External link:
http://arxiv.org/abs/2211.00523
Authors:
Markopoulos, Konstantinos, Maniati, Georgia, Vamvoukakis, Georgios, Ellinas, Nikolaos, Vardaxoglou, Georgios, Kakoulidis, Panos, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros
The gender of any voice user interface is a key element of its perceived identity. Recently, there has been increasing interest in interfaces where the gender is ambiguous rather than clearly identifying as female or male. This work addresses the tas
External link:
http://arxiv.org/abs/2211.00375
Authors:
Vioni, Alexandra, Maniati, Georgia, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros
Current state-of-the-art methods for automatic synthetic speech evaluation are based on MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet that use spectral features as input, and SSL-MOS that relies on a pretrained sel
External link:
http://arxiv.org/abs/2211.00342
Authors:
Ellinas, Nikolaos, Vamvoukakis, Georgios, Markopoulos, Konstantinos, Maniati, Georgia, Kakoulidis, Panos, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros
This paper presents a method for end-to-end cross-lingual text-to-speech (TTS) which aims to preserve the target language's pronunciation regardless of the original speaker's language. The model used is based on a non-attentive Tacotron architecture,
External link:
http://arxiv.org/abs/2210.17264
Authors:
Han, Hyojung, Indurthi, Sathish, Zaidi, Mohd Abbas, Lakumarapu, Nikhil Kumar, Lee, Beomseok, Kim, Sangha, Kim, Chanwoo, Hwang, Inchul
Recently, simultaneous translation has gathered a lot of attention since it enables compelling applications such as subtitle translation for a live event or real-time video-call translation. Some of these translation applications allow editing of par
External link:
http://arxiv.org/abs/2012.14681
Authors:
Cuayáhuitl, Heriberto, Lee, Donghyeon, Ryu, Seonghan, Cho, Yongjin, Choi, Sungja, Indurthi, Satish, Yu, Seunghak, Choi, Hyungtak, Hwang, Inchul, Kim, Jihie
Trainable chatbots that exhibit fluent and human-like conversations remain a big challenge in artificial intelligence. Deep Reinforcement Learning (DRL) is promising for addressing this challenge, but its successful application remains an open questi
External link:
http://arxiv.org/abs/1908.10422