Search results

Report

Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling

Authors: Karapiperis, Sotirios, Ellinas, Nikolaos, Vioni, Alexandra, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Raptis, Spyros

Most of the prevalent approaches in speech prosody modeling rely on learning global style representations in a continuous latent space which encode and transfer the attributes of reference speech. However, recent work on neural codecs which are based

External link: http://arxiv.org/abs/2409.08664

Report

Attention-based Iterative Decomposition for Tensor Product Representation

Authors: Park, Taewon, Choi, Inchul, Lee, Minho

In recent research, Tensor Product Representation (TPR) is applied for the systematic generalization task of deep neural networks by learning the compositional structure of data. However, such prior works show limited performance in discovering and r

External link: http://arxiv.org/abs/2406.01012

Report

Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

Authors: Mitsios, Michael, Vamvoukakis, Georgios, Maniati, Georgia, Ellinas, Nikolaos, Dimitriou, Georgios, Markopoulos, Konstantinos, Kakoulidis, Panos, Vioni, Alexandra, Christidou, Myrsini, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Vardaxoglou, Georgios, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros

Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and

External link: http://arxiv.org/abs/2404.01805

Report

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Authors: Kakoulidis, Panos, Ellinas, Nikolaos, Vamvoukakis, Georgios, Christidou, Myrsini, Vioni, Alexandra, Maniati, Georgia, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Tsiakoulis, Pirros, Chalamandaris, Aimilios

In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource pipeline that does not utilize any singing data end-to-end, since it

External link: http://arxiv.org/abs/2402.01520

Report

Self-Calibrating, Fully Differentiable NLOS Inverse Rendering

Authors: Choi, Kiseok, Kim, Inchul, Choi, Dongyoung, Marco, Julio, Gutierrez, Diego, Kim, Min H.

Published in: Proceedings of ACM SIGGRAPH Asia 2023 (December 2023)

Existing time-resolved non-line-of-sight (NLOS) imaging methods reconstruct hidden scenes by inverting the optical paths of indirect illumination measured at visible relay surfaces. These methods are prone to reconstruction artifacts due to inversion

External link: http://arxiv.org/abs/2309.12047

Academic article

Discovery of Potent Degraders of the Dengue Virus Envelope Protein

Authors: Zhengnian Li, Han‐Yuan Liu, Zhixiang He, Antara Chakravarty, Ryan P. Golden, Zixuan Jiang, Inchul You, Hong Yue, Katherine A. Donovan, Guangyan Du, Jianwei Che, Jason Tse, Isaac Che, Wenchao Lu, Eric S. Fischer, Tinghu Zhang, Nathanael S. Gray, Priscilla L. Yang

Published in: Advanced Science, Vol 11, Iss 40, Pp n/a-n/a (2024)

Targeted protein degradation has been widely adopted as a new approach to eliminate both established and previously recalcitrant therapeutic targets. Here, it is reported that the development of small molecule degraders of the envelope (E) p

External link: https://doaj.org/article/d886eb34b9c74a7eabd320cf5081d214

Report

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Authors: Klapsas, Konstantinos, Nikitaras, Karolos, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros

A large part of the expressive speech synthesis literature focuses on learning prosodic representations of the speech signal which are then modeled by a prior distribution during inference. In this paper, we compare different prior architectures at t

External link: http://arxiv.org/abs/2211.01327

Report

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Authors: Nikitaras, Karolos, Klapsas, Konstantinos, Ellinas, Nikolaos, Maniati, Georgia, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros

This paper proposes an Expressive Speech Synthesis model that utilizes token-level latent prosodic variables in order to capture and control utterance-level attributes, such as character acting voice and speaking style. Current works aim to explicitl

External link: http://arxiv.org/abs/2211.00523

Report

Generating Multilingual Gender-Ambiguous Text-to-Speech Voices

Authors: Markopoulos, Konstantinos, Maniati, Georgia, Vamvoukakis, Georgios, Ellinas, Nikolaos, Vardaxoglou, Georgios, Kakoulidis, Panos, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros

The gender of any voice user interface is a key element of its perceived identity. Recently, there has been increasing interest in interfaces where the gender is ambiguous rather than clearly identifying as female or male. This work addresses the tas

External link: http://arxiv.org/abs/2211.00375

Report

Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features

Authors: Vioni, Alexandra, Maniati, Georgia, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros

Current state-of-the-art methods for automatic synthetic speech evaluation are based on MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet that use spectral features as input, and SSL-MOS that relies on a pretrained sel

External link: http://arxiv.org/abs/2211.00342
