Showing 1 - 10 of 62 for search: '"Kharitonov, Eugene"'
Author:
Futeral, Matthieu, Agostinelli, Andrea, Tagliasacchi, Marco, Zeghidour, Neil, Kharitonov, Eugene
Generative spoken language models produce speech in a wide range of voices, prosody, and recording conditions, seemingly approaching the diversity of natural speech. However, the extent to which generated speech is acoustically diverse remains unclear…
External link:
http://arxiv.org/abs/2404.10419
Author:
Rubenstein, Paul K., Asawaroengchai, Chulayuth, Nguyen, Duc Dung, Bapna, Ankur, Borsos, Zalán, Quitry, Félix de Chaumont, Chen, Peter, Badawy, Dalia El, Han, Wei, Kharitonov, Eugene, Muckenhirn, Hannah, Padfield, Dirk, Qin, James, Rozenberg, Danny, Sainath, Tara, Schalkwyk, Johan, Sharifi, Matt, Ramanovich, Michelle Tadmor, Tagliasacchi, Marco, Tudor, Alexandru, Velimirović, Mihajlo, Vincent, Damien, Yu, Jiahui, Wang, Yongqiang, Zayats, Vicky, Zeghidour, Neil, Zhang, Yu, Zhang, Zhishuai, Zilka, Lukas, Frank, Christian
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture…
External link:
http://arxiv.org/abs/2306.12925
This study investigates the long-term effects of temperature variations on economic growth using a data-driven approach. Leveraging machine learning techniques, we analyze global land surface temperature data from Berkeley Earth and economic indicators…
External link:
http://arxiv.org/abs/2308.06265
Author:
Borsos, Zalán, Sharifi, Matt, Vincent, Damien, Kharitonov, Eugene, Zeghidour, Neil, Tagliasacchi, Marco
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec…
External link:
http://arxiv.org/abs/2305.09636
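The SoundStorm abstract mentions confidence-based parallel decoding. As a rough illustration only (not SoundStorm's actual implementation), a MaskGIT-style decoder starts from a fully masked sequence and, at each iteration, commits only the most confident predictions while re-masking the rest; the `predict_fn` interface, the cosine schedule, and the `toy_model` below are assumptions made for this sketch:

```python
import numpy as np

def parallel_decode(predict_fn, seq_len, steps=8, mask_id=-1):
    """MaskGIT-style confidence-based parallel decoding (toy sketch).

    predict_fn(tokens) returns a (seq_len, vocab) array of probabilities;
    masked positions in `tokens` hold `mask_id`.
    """
    tokens = np.full(seq_len, mask_id, dtype=int)
    for step in range(steps):
        masked = tokens == mask_id
        if not masked.any():
            break
        probs = predict_fn(tokens)
        pred = probs.argmax(axis=1)   # most likely token per position
        conf = probs.max(axis=1)      # its probability = "confidence"
        # Cosine schedule: the fraction left masked shrinks every step.
        keep_frac = np.cos(np.pi / 2 * (step + 1) / steps)
        n_keep_masked = 0 if step + 1 == steps else int(keep_frac * masked.sum())
        # Among currently masked positions, commit the most confident ones.
        masked_idx = np.flatnonzero(masked)
        order = masked_idx[np.argsort(-conf[masked_idx])]
        commit = order[: len(order) - n_keep_masked]
        tokens[commit] = pred[commit]
    return tokens

# Hypothetical stand-in for a trained model: always prefers token 3.
def toy_model(tokens):
    probs = np.full((len(tokens), 5), 0.1)
    probs[:, 3] = 0.6
    return probs

out = parallel_decode(toy_model, seq_len=16)
```

Because every position can be predicted in parallel within a step, the number of model calls is the (small) step count rather than the sequence length, which is the efficiency argument the abstract makes against autoregressive decoding.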
Author:
Kharitonov, Eugene, Vincent, Damien, Borsos, Zalán, Marinier, Raphaël, Girgin, Sertan, Pietquin, Olivier, Sharifi, Matt, Tagliasacchi, Marco, Zeghidour, Neil
We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens…
External link:
http://arxiv.org/abs/2302.03540
Author:
Borsos, Zalán, Marinier, Raphaël, Vincent, Damien, Kharitonov, Eugene, Pietquin, Olivier, Sharifi, Matt, Roblek, Dominik, Teboul, Olivier, Grangier, David, Tagliasacchi, Marco, Zeghidour, Neil
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show…
External link:
http://arxiv.org/abs/2209.03143
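The AudioLM abstract frames audio generation as language modeling over discrete tokens. A minimal toy sketch of that framing follows; the sampler interface and the `toy_lm` stand-in are hypothetical, not AudioLM's actual tokenizers or models:

```python
import numpy as np

def sample_autoregressive(next_token_probs, prompt, n_new, seed=0):
    """Toy next-token sampling over a discrete token vocabulary.

    next_token_probs(tokens) returns a 1-D probability vector over the
    vocabulary, conditioned on all tokens generated so far.
    """
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    for _ in range(n_new):
        p = next_token_probs(tokens)
        tokens.append(int(rng.choice(len(p), p=p)))
    return tokens

# Hypothetical "model": deterministically continues with (last + 1) mod 4.
def toy_lm(tokens):
    p = np.zeros(4)
    p[(tokens[-1] + 1) % 4] = 1.0
    return p

continuation = sample_autoregressive(toy_lm, prompt=[0], n_new=5)
```

In the actual framework the tokens come from learned audio tokenizers rather than a hand-written rule, but the generation loop is the same next-token language-modeling recipe sketched here.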
Author:
Nguyen, Tu Anh, Kharitonov, Eugene, Copet, Jade, Adi, Yossi, Hsu, Wei-Ning, Elkahky, Ali, Tomasello, Paden, Algayres, Robin, Sagot, Benoit, Mohamed, Abdelrahman, Dupoux, Emmanuel
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained…
External link:
http://arxiv.org/abs/2203.16502
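The dGSLM abstract mentions a dual-tower architecture with cross-attention, where one tower (one dialogue channel) attends to the other's states. A single-head NumPy sketch of the cross-attention operation itself (shapes and the random inputs below are illustrative assumptions, not the model's configuration):

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head cross-attention: queries from one tower attend to
    keys/values produced by the other tower."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (Tq, Tk) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over key positions
    return weights @ values                          # (Tq, d_v) mixed states

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))   # states of channel A's tower
k = rng.normal(size=(5, 8))   # states of channel B's tower
v = rng.normal(size=(5, 8))
out = cross_attention(q, k, v)
```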
Author:
Kharitonov, Eugene, Copet, Jade, Lakhotia, Kushal, Nguyen, Tu Anh, Tomasello, Paden, Lee, Ann, Elkahky, Ali, Hsu, Wei-Ning, Mohamed, Abdelrahman, Dupoux, Emmanuel, Adi, Yossi
Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed to facilitate…
External link:
http://arxiv.org/abs/2202.07359
Author:
Kreuk, Felix, Polyak, Adam, Copet, Jade, Kharitonov, Eugene, Nguyen, Tu-Anh, Rivière, Morgane, Hsu, Wei-Ning, Mohamed, Abdelrahman, Dupoux, Emmanuel, Adi, Yossi
Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language translation task.
External link:
http://arxiv.org/abs/2111.07402
Training data memorization in NLP can be both beneficial (e.g., closed-book QA) and undesirable (personal data extraction). In any case, successful model training requires a non-trivial amount of memorization to store word spellings, various linguistic…
External link:
http://arxiv.org/abs/2110.02782