Showing 1 - 5 of 5 for search: '"Gudmalwar, Ashishkumar P."'
Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in
External link:
http://arxiv.org/abs/2406.08802
Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source languag
External link:
http://arxiv.org/abs/2406.08076
Authors:
Mhaskar, Shivam Ratnakant; Shah, Nirmesh J.; Zaki, Mohammadi; Gudmalwar, Ashishkumar P.; Wasnik, Pankaj; Shah, Rajiv Ratn
Traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely, Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to r
External link:
http://arxiv.org/abs/2403.15469
Academic article
Published in:
SN Computer Science; January 2023, Vol. 4 Issue: 1