Search results - "Morrison, Max"

Report

Fine-Grained and Interpretable Neural Speech Editing

Authors: Morrison, Max; Churchwell, Cameron; Pruyne, Nathan; Pardo, Bryan

Fine-grained editing of speech attributes, such as prosody (i.e., the pitch, loudness, and phoneme durations), pronunciation, speaker identity, and formants, is useful for fine-tuning and fixing imperfections in human an

External link: http://arxiv.org/abs/2407.05471

Report

High-Fidelity Neural Phonetic Posteriorgrams

Authors: Churchwell, Cameron; Morrison, Max; Pardo, Bryan

A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker i

External link: http://arxiv.org/abs/2402.17735

Report

Crowdsourced and Automatic Speech Prominence Estimation

Authors: Morrison, Max; Pawar, Pranav; Pruyne, Nathan; Cole, Jennifer; Pardo, Bryan

The prominence of a spoken word is the degree to which an average native listener perceives the word as salient or emphasized relative to its context. Speech prominence estimation is the process of assigning a numeric value to the prominence of each

External link: http://arxiv.org/abs/2310.08464

Report

Cross-domain Neural Pitch and Periodicity Estimation

Authors: Morrison, Max; Hsieh, Caedon; Pruyne, Nathan; Pardo, Bryan

Pitch is a foundational aspect of our perception of audio signals. Pitch contours are commonly used to analyze speech and music signals and as input features for many audio tasks, including music transcription, singing voice synthesis, and prosody ed

External link: http://arxiv.org/abs/2301.12258

Report

Music Separation Enhancement with Generative Modeling

Authors: Schaffer, Noah; Cogan, Boaz; Manilow, Ethan; Morrison, Max; Seetharaman, Prem; Pardo, Bryan

Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the M

External link: http://arxiv.org/abs/2208.12387

Report

Reproducible Subjective Evaluation

Authors: Morrison, Max; Tang, Brian; Tan, Gefei; Pardo, Bryan

Human perceptual studies are the gold standard for the evaluation of many research tasks in machine learning, linguistics, and psychology. However, these studies require significant time and cost to perform. As a result, many researchers use objectiv

External link: http://arxiv.org/abs/2203.04444

Report

Chunked Autoregressive GAN for Conditional Waveform Synthesis

Authors: Morrison, Max; Kumar, Rithesh; Kumar, Kundan; Seetharaman, Prem; Courville, Aaron; Bengio, Yoshua

Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or pa

External link: http://arxiv.org/abs/2110.10139

Report

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

Authors: Morrison, Max; Jin, Zeyu; Bryan, Nicholas J.; Caceres, Juan-Pablo; Pardo, Bryan

Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis. Thus far, methods for pitch-shifting and time-

External link: http://arxiv.org/abs/2110.02360

Report

Context-Aware Prosody Correction for Text-Based Speech Editing

Authors: Morrison, Max; Rencker, Lucas; Jin, Zeyu; Bryan, Nicholas J.; Caceres, Juan-Pablo; Pardo, Bryan

Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript. A major drawback of current systems, however, is that edited recordings often soun

External link: http://arxiv.org/abs/2102.08328

Report

Controllable Neural Prosody Synthesis

Authors: Morrison, Max; Jin, Zeyu; Salamon, Justin; Bryan, Nicholas J.; Mysore, Gautham J.

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators. However, these systems lack intuitive user controls over prosody, making them unable to rectify prosody er

External link: http://arxiv.org/abs/2008.03388
