Search results

What happens to diffusion model likelihood when your model is conditional?

Diffusion Models (DMs) iteratively denoise random samples to produce high-quality data. The iterative sampling process is derived from Stochastic Differential Equations (SDEs), allowing a speed-quality trade-off chosen at inference. Another advantage

External link: http://arxiv.org/abs/2409.06364

Foundation Models for Music: A Survey

In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models

External link: http://arxiv.org/abs/2408.14340

Self-Train Before You Transcribe

Authors: Flynn, Robert; Ragni, Anton

When there is a mismatch between the training and test domains, current speech recognition systems show significant performance degradation. Self-training methods, such as noisy student teacher training, can help address this and enable the adaptatio

External link: http://arxiv.org/abs/2406.12937

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

Authors: Leung, Wing-Zin; Cross, Mattias; Ragni, Anton; Goetze, Stefan

Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environmen

External link: http://arxiv.org/abs/2406.08568

General Effect Modelling (GEM) -- Part 2. Multivariate GEM applied to gene expression data of type 2 diabetes detects information that is lost by univariate validation

Authors: Mosleth, Ellen Færgestad; Dankel, Simon Erling Nitter; Mellgren, Gunnar; Olmos, Francisco Martin Barajas; Orozco, Lorena Sofia; Lysenko, Artem; Ofstad, Ragni; Begum, Most Champa; Martens, Harald; Liland, Kristian Hovde

General Effect Modelling (GEM) is an umbrella over different methods that utilise effects in the analyses of data with multiple design variables and multivariate responses. To demonstrate the methodology, we here use GEM in gene expression data where

External link: http://arxiv.org/abs/2404.03029

Analytic method for quadratic polarons in nonparabolic bands

Authors: Klimin, Serghei N.; Tempere, Jacques; Houtput, Matthew; Ragni, Stefano; Hahn, Thomas; Franchini, Cesare; Mishchenko, Andrey S.

Published in: Phys. Rev. B 110, 075107 (2024)

Including the effect of lattice anharmonicity on electron-phonon interactions has recently garnered attention due to its role as a necessary and significant component in explaining various phenomena, including superconductivity, optical response, and

External link: http://arxiv.org/abs/2403.18019

Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models

Authors: Mogridge, Rhiannon; Close, George; Sutherland, Robert; Hain, Thomas; Barker, Jon; Goetze, Stefan; Ragni, Anton

Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found

External link: http://arxiv.org/abs/2401.13611

How Much Context Does My Attention-Based ASR System Need?

Authors: Flynn, Robert; Ragni, Anton

For the task of speech recognition, the use of more than 30 seconds of acoustic context during training is uncommon and under-investigated in literature. In this work, we conduct an empirical study on the effect of scaling the sequence length used to

External link: http://arxiv.org/abs/2310.15672

Energy-Based Models For Speech Synthesis

Authors: Sun, Wanli; Tu, Zehai; Ragni, Anton

Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs which makes inferenc

External link: http://arxiv.org/abs/2310.12765

Low-Frequency Intensity Modulation of High-Frequency Rotor Noise

Authors: Baars, Woutijn J.; Ragni, Daniele

Acoustic spectra of rotor noise yield frequency-distributions of energy within pressure time series. However, they are unable to reveal phase-relations between different frequency components, while the latter play a role in low-frequency intensity mo

External link: http://arxiv.org/abs/2310.01056
