Showing 1 - 10 of 2,177 for search: '"Tits, A."'
In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme reco
External link:
http://arxiv.org/abs/2405.02124
Author:
Tits, Noé
In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, with a design to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA)
External link:
http://arxiv.org/abs/2310.11541
Author:
Tits, Noé, Broisson, Zoé
In this paper, we present a solution for providing personalized and instant feedback to English learners through a mobile application, called Flowchase, that is connected to a speech technology able to segment and analyze speech segmental and supra-s
External link:
http://arxiv.org/abs/2307.02051
Published in:
Acoustics, Vol 6, Iss 3, Pp 772-781 (2024)
In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme reco
External link:
https://doaj.org/article/94093317087d40afaf46ad5e9219c579
Published in:
In Applied Geochemistry November 2024 175
Author:
Delvigne, Victor, Tits, Noé, La Fisca, Luca, Hubens, Nathan, Maiorca, Antoine, Wannous, Hazem, Dutoit, Thierry, Vandeborre, Jean-Philippe
Visual attention estimation is an active field of research at the crossroads of different disciplines: computer vision, artificial intelligence and medicine. One of the most common approaches to estimate a saliency map representing attention is based
External link:
http://arxiv.org/abs/2201.03902
Published in:
In Applied Geochemistry October 2024 173
In this paper, we study the controllability of an expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker, containing a great variability in styles and
External link:
http://arxiv.org/abs/2103.04097
This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic
External link:
http://arxiv.org/abs/2010.02057
ICE-Talk is an open-source web-based GUI that allows the use of a TTS system with controllable parameters via a text field and a clickable 2D plot. It enables the study of latent spaces for controllable TTS. Moreover, it is implemented as a module tha
External link:
http://arxiv.org/abs/2008.11045