Showing 1 - 10 of 144 for search: '"Nguyen, Tu Anh"'
Author:
Nguyen, Tu Anh, Muller, Benjamin, Yu, Bokai, Costa-jussa, Marta R., Elbayad, Maha, Popuri, Sravya, Ropers, Christophe, Duquenne, Paul-Ambroise, Algayres, Robin, Mavlyutov, Ruslan, Gat, Itai, Williamson, Mary, Synnaeve, Gabriel, Pino, Juan, Sagot, Benoit, Dupoux, Emmanuel
We introduce Spirit LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a 7B pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. …
External link:
http://arxiv.org/abs/2402.05755
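The abstract above describes extending a pretrained text LM to the speech modality by continuing training on mixed text and speech-unit tokens. Below is a minimal sketch of how such an interleaved training sequence might be assembled; the modality markers and unit IDs are illustrative assumptions, not Spirit LM's actual vocabulary.

# Illustrative only: build one interleaved text/speech-unit sequence for a single LM.
# The "[TEXT]"/"[SPEECH]" markers and the unit inventory are assumptions of this sketch.
text_tokens = ["The", "_cat", "_sat"]       # subword tokens from a text tokenizer
speech_units = [412, 7, 7, 98, 512, 31]     # discrete units from a speech quantizer

def interleave(text_tokens, speech_units):
    """Concatenate both modalities with markers so one model sees a single token stream."""
    return (["[TEXT]"] + text_tokens
            + ["[SPEECH]"] + [f"<unit_{u}>" for u in speech_units])

print(interleave(text_tokens, speech_units))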
Author:
Algayres, Robin, Adi, Yossi, Nguyen, Tu Anh, Copet, Jade, Synnaeve, Gabriel, Sagot, Benoit, Dupoux, Emmanuel
In NLP, text language models based on words or subwords are known to outperform their character-based counterparts. Yet, in the speech community, the standard inputs of spoken LMs are 20ms or 40ms-long discrete units (shorter than a phoneme). …
External link:
http://arxiv.org/abs/2310.05224
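The snippet notes that spoken LMs typically consume 20ms or 40ms discrete units, far shorter than words or subwords. Below is a quick back-of-the-envelope comparison of sequence lengths for one minute of speech; the speaking rate of roughly 150 words per minute is an illustrative assumption.

# Rough sequence-length comparison for one minute of speech (illustrative numbers).
seconds = 60
units_20ms = seconds * 1000 // 20   # 3000 discrete units at a 20 ms hop
units_40ms = seconds * 1000 // 40   # 1500 discrete units at a 40 ms hop
words = 150                         # assumed speaking rate of ~150 words per minute
print(units_20ms, units_40ms, words)  # the unit sequence is 10-20x longer than the word sequence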
Author:
Hsu, Po-chun, Elkahky, Ali, Hsu, Wei-Ning, Adi, Yossi, Nguyen, Tu Anh, Copet, Jade, Dupoux, Emmanuel, Lee, Hung-yi, Mohamed, Abdelrahman
Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes …
External link:
http://arxiv.org/abs/2309.17020
Author:
Nguyen, Tu Anh Thi
Monarch butterflies are famous for their annual long-distance migration. Decreasing temperatures and reduced daylight induce the migratory state in the autumn generation of monarch butterflies. Not only are they in a reproductive diapause, they also …
Author:
Nguyen, Tu Anh, Hsu, Wei-Ning, D'Avirro, Antony, Shi, Bowen, Gat, Itai, Fazel-Zarani, Maryam, Remez, Tal, Copet, Jade, Synnaeve, Gabriel, Hassid, Michael, Kreuk, Felix, Adi, Yossi, Dupoux, Emmanuel
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are …
External link:
http://arxiv.org/abs/2308.05725
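The abstract concerns resynthesizing expressive speech from low-bitrate, self-supervised discrete units rather than from text. Below is a schematic sketch of such a unit-based resynthesis chain; the encoder, quantizer, and vocoder are toy placeholders standing in for learned models, not the paper's components.

# Conceptual chain: waveform -> frame features -> discrete units -> waveform.
# Every function here is a toy stand-in for a learned model.
import numpy as np

def encode(waveform):                  # stand-in for a self-supervised speech encoder
    return waveform.reshape(-1, 320).mean(axis=1, keepdims=True)

def quantize(features, codebook):      # nearest-centroid assignment to a small codebook
    return np.argmin(np.abs(features - codebook[None, :, 0]), axis=1)

def vocode(units, codebook):           # stand-in for a unit-to-waveform vocoder
    return np.repeat(codebook[units, 0], 320)

rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)       # one second of fake 16 kHz audio
codebook = rng.standard_normal((100, 1))
units = quantize(encode(wav), codebook)
print(units[:10], vocode(units, codebook).shape)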
Author:
Hassid, Michael, Remez, Tal, Nguyen, Tu Anh, Gat, Itai, Conneau, Alexis, Kreuk, Felix, Copet, Jade, Defossez, Alexandre, Synnaeve, Gabriel, Dupoux, Emmanuel, Schwartz, Roy, Adi, Yossi
Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model. We show using both …
External link:
http://arxiv.org/abs/2305.13009
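TWIST is described as warm-starting a speech LM from a pretrained text LM. Below is a minimal sketch of one way to set that up with the Hugging Face transformers library; the "gpt2" checkpoint and the 500-unit vocabulary are illustrative assumptions, not the paper's setup.

# Sketch: initialize a unit-based speech LM from a pretrained text LM (warm start).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # pretrained text LM (assumed checkpoint)
num_speech_units = 500                                # assumed discrete-unit vocabulary size
model.resize_token_embeddings(num_speech_units)       # resize embeddings/LM head to the unit vocabulary
# The transformer body keeps its text-pretrained weights; training then continues
# on sequences of discrete speech units instead of text tokens.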
Author:
Nguyen, Tu Anh, de Seyssel, Maureen, Algayres, Robin, Roze, Patricia, Dunbar, Ewan, Dupoux, Emmanuel
Word or word-fragment based Language Models (LMs) are typically preferred over character-based ones in many downstream applications. This may not be surprising as words seem more linguistically relevant units than characters. Words provide at least …
External link:
http://arxiv.org/abs/2210.02956
Author:
Gat, Itai, Kreuk, Felix, Nguyen, Tu Anh, Lee, Ann, Copet, Jade, Synnaeve, Gabriel, Dupoux, Emmanuel, Adi, Yossi
Generative Spoken Language Modeling research focuses on optimizing speech Language Models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing …
External link:
http://arxiv.org/abs/2209.15483
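The abstract describes speech LMs operating over discrete units obtained by quantizing internal representations of a speech encoder. Below is a minimal sketch of that quantization step using k-means; the random matrix stands in for real frame-level features from a self-supervised model.

# Sketch: map continuous frame representations to discrete unit ids with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 768))       # 500 frames x 768-dim representations (fake)
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(features)
units = kmeans.predict(features)                 # one discrete unit id per frame
print(units[:20])                                # the "pseudo-text" a speech LM is trained on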
Author:
Nguyen, Tu Anh, Kharitonov, Eugene, Copet, Jade, Adi, Yossi, Hsu, Wei-Ning, Elkahky, Ali, Tomasello, Paden, Algayres, Robin, Sagot, Benoit, Mohamed, Abdelrahman, Dupoux, Emmanuel
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained …
External link:
http://arxiv.org/abs/2203.16502
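dGSLM is summarized as a dual-tower transformer with cross-attention over the two dialogue channels. Below is a minimal PyTorch sketch of one cross-attention step between two channel towers; the shapes and the single shared attention layer are illustrative assumptions, not the paper's architecture.

# Sketch: two per-channel token streams attending to each other (cross-attention).
import torch
import torch.nn as nn

d_model, seq_len, batch = 64, 10, 2
channel_a = torch.randn(batch, seq_len, d_model)  # hidden states of speaker A's tower
channel_b = torch.randn(batch, seq_len, d_model)  # hidden states of speaker B's tower

cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)
# Each tower queries the other, so every channel is conditioned on its interlocutor.
a_attends_b, _ = cross_attn(query=channel_a, key=channel_b, value=channel_b)
b_attends_a, _ = cross_attn(query=channel_b, key=channel_a, value=channel_a)
print(a_attends_b.shape, b_attends_a.shape)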
Recent work in spoken language modeling shows the possibility of learning a language unsupervisedly from raw audio without any text labels. The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and …
External link:
http://arxiv.org/abs/2203.05936
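This last abstract summarizes the textless recipe: map raw audio to discrete units (a pseudo-text) and then train a language model on those units. Once the audio is discretized, the modeling step is ordinary sequence modeling over unit ids, as in the toy bigram example below (illustrative only, not the paper's model).

# Toy bigram "LM" over a fake pseudo-text of discrete unit ids.
from collections import Counter, defaultdict

units = [5, 12, 12, 7, 5, 12, 7, 7, 5]            # pseudo-text: discrete unit ids
bigrams = defaultdict(Counter)
for prev, nxt in zip(units, units[1:]):
    bigrams[prev][nxt] += 1

def most_likely_next(unit):
    """Return the most frequent continuation observed after `unit`."""
    return bigrams[unit].most_common(1)[0][0]

print(most_likely_next(5))   # -> 12 in this toy stream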