Showing 1 - 10 of 62 for search: '"Kharitonov, Eugene"'
Author:
Futeral, Matthieu, Agostinelli, Andrea, Tagliasacchi, Marco, Zeghidour, Neil, Kharitonov, Eugene
Generative spoken language models produce speech in a wide range of voices, prosody, and recording conditions, seemingly approaching the diversity of natural speech. However, the extent to which generated speech is acoustically diverse remains unclear…
External link:
http://arxiv.org/abs/2404.10419
Author:
Rubenstein, Paul K., Asawaroengchai, Chulayuth, Nguyen, Duc Dung, Bapna, Ankur, Borsos, Zalán, Quitry, Félix de Chaumont, Chen, Peter, Badawy, Dalia El, Han, Wei, Kharitonov, Eugene, Muckenhirn, Hannah, Padfield, Dirk, Qin, James, Rozenberg, Danny, Sainath, Tara, Schalkwyk, Johan, Sharifi, Matt, Ramanovich, Michelle Tadmor, Tagliasacchi, Marco, Tudor, Alexandru, Velimirović, Mihajlo, Vincent, Damien, Yu, Jiahui, Wang, Yongqiang, Zayats, Vicky, Zeghidour, Neil, Zhang, Yu, Zhang, Zhishuai, Zilka, Lukas, Frank, Christian
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture…
External link:
http://arxiv.org/abs/2306.12925
This study investigates the long-term effects of temperature variations on economic growth using a data-driven approach. Leveraging machine learning techniques, we analyze global land surface temperature data from Berkeley Earth and economic indicators…
External link:
http://arxiv.org/abs/2308.06265
Author:
Borsos, Zalán, Sharifi, Matt, Vincent, Damien, Kharitonov, Eugene, Zeghidour, Neil, Tagliasacchi, Marco
We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec…
External link:
http://arxiv.org/abs/2305.09636
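The SoundStorm abstract mentions confidence-based parallel decoding. As a rough illustration only (not SoundStorm's actual implementation), a MaskGIT-style decoder starts from a fully masked sequence and, at each iteration, commits only the most confident predictions while re-masking the rest; the `predict_fn` interface, the cosine schedule, and the `toy_model` below are assumptions made for this sketch:

```python
import numpy as np

def parallel_decode(predict_fn, seq_len, steps=8, mask_id=-1):
    """MaskGIT-style confidence-based parallel decoding (toy sketch).

    predict_fn(tokens) returns a (seq_len, vocab) array of probabilities;
    masked positions in `tokens` hold `mask_id`.
    """
    tokens = np.full(seq_len, mask_id, dtype=int)
    for step in range(steps):
        masked = tokens == mask_id
        if not masked.any():
            break
        probs = predict_fn(tokens)
        pred = probs.argmax(axis=1)   # most likely token per position
        conf = probs.max(axis=1)      # its probability = "confidence"
        # Cosine schedule: the fraction left masked shrinks every step.
        keep_frac = np.cos(np.pi / 2 * (step + 1) / steps)
        n_keep_masked = 0 if step + 1 == steps else int(keep_frac * masked.sum())
        # Among currently masked positions, commit the most confident ones.
        masked_idx = np.flatnonzero(masked)
        order = masked_idx[np.argsort(-conf[masked_idx])]
        commit = order[: len(order) - n_keep_masked]
        tokens[commit] = pred[commit]
    return tokens

# Hypothetical stand-in for a trained model: always prefers token 3.
def toy_model(tokens):
    probs = np.full((len(tokens), 5), 0.1)
    probs[:, 3] = 0.6
    return probs

out = parallel_decode(toy_model, seq_len=16)
```

Because every position can be predicted in parallel within a step, the number of model calls is the (small) step count rather than the sequence length, which is the efficiency argument the abstract makes against autoregressive decoding.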
Author:
Kharitonov, Eugene, Vincent, Damien, Borsos, Zalán, Marinier, Raphaël, Girgin, Sertan, Pietquin, Olivier, Sharifi, Matt, Tagliasacchi, Marco, Zeghidour, Neil
We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens…
External link:
http://arxiv.org/abs/2302.03540
Author:
Borsos, Zalán, Marinier, Raphaël, Vincent, Damien, Kharitonov, Eugene, Pietquin, Olivier, Sharifi, Matt, Roblek, Dominik, Teboul, Olivier, Grangier, David, Tagliasacchi, Marco, Zeghidour, Neil
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show…
External link:
http://arxiv.org/abs/2209.03143
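The AudioLM abstract frames audio generation as language modeling over discrete tokens. A minimal toy sketch of that framing follows; the sampler interface and the `toy_lm` stand-in are hypothetical, not AudioLM's actual tokenizers or models:

```python
import numpy as np

def sample_autoregressive(next_token_probs, prompt, n_new, seed=0):
    """Toy next-token sampling over a discrete token vocabulary.

    next_token_probs(tokens) returns a 1-D probability vector over the
    vocabulary, conditioned on all tokens generated so far.
    """
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    for _ in range(n_new):
        p = next_token_probs(tokens)
        tokens.append(int(rng.choice(len(p), p=p)))
    return tokens

# Hypothetical "model": deterministically continues with (last + 1) mod 4.
def toy_lm(tokens):
    p = np.zeros(4)
    p[(tokens[-1] + 1) % 4] = 1.0
    return p

continuation = sample_autoregressive(toy_lm, prompt=[0], n_new=5)
```

In the actual framework the tokens come from learned audio tokenizers rather than a hand-written rule, but the generation loop is the same next-token language-modeling recipe sketched here.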
Author:
Nguyen, Tu Anh, Kharitonov, Eugene, Copet, Jade, Adi, Yossi, Hsu, Wei-Ning, Elkahky, Ali, Tomasello, Paden, Algayres, Robin, Sagot, Benoit, Mohamed, Abdelrahman, Dupoux, Emmanuel
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained…
External link:
http://arxiv.org/abs/2203.16502
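The dGSLM abstract mentions a dual-tower architecture with cross-attention, where one tower (one dialogue channel) attends to the other's states. A single-head NumPy sketch of the cross-attention operation itself (shapes and the random inputs below are illustrative assumptions, not the model's configuration):

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head cross-attention: queries from one tower attend to
    keys/values produced by the other tower."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (Tq, Tk) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over key positions
    return weights @ values                          # (Tq, d_v) mixed states

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))   # states of channel A's tower
k = rng.normal(size=(5, 8))   # states of channel B's tower
v = rng.normal(size=(5, 8))
out = cross_attention(q, k, v)
```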
Author:
Kharitonov, Eugene, Copet, Jade, Lakhotia, Kushal, Nguyen, Tu Anh, Tomasello, Paden, Lee, Ann, Elkahky, Ali, Hsu, Wei-Ning, Mohamed, Abdelrahman, Dupoux, Emmanuel, Adi, Yossi
Textless spoken language processing research aims to extend the applicability of the standard NLP toolset to spoken language and to languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed to facilitate…
External link:
http://arxiv.org/abs/2202.07359
Author:
Kreuk, Felix, Polyak, Adam, Copet, Jade, Kharitonov, Eugene, Nguyen, Tu-Anh, Rivière, Morgane, Hsu, Wei-Ning, Mohamed, Abdelrahman, Dupoux, Emmanuel, Adi, Yossi
Speech emotion conversion is the task of modifying the perceived emotion of a speech utterance while preserving the lexical content and speaker identity. In this study, we cast the problem of emotion conversion as a spoken language translation task.
External link:
http://arxiv.org/abs/2111.07402
Training data memorization in NLP can be both beneficial (e.g., closed-book QA) and undesirable (personal data extraction). In any case, successful model training requires a non-trivial amount of memorization to store word spellings, various linguistic…
External link:
http://arxiv.org/abs/2110.02782