Search results

Report

Authors: Wang, Ziteng; Bongiovanni, Domenico; Wang, Xiangdong; Hu, Zhichan; Jukić, Dario; Song, Daohong; Xu, Jingjun; Morandotti, Roberto; Chen, Zhigang; Buljan, Hrvoje

The discovery of topological phases of matter and topological boundary states had tremendous impact on condensed matter physics and photonics, where topological phases are defined via energy bands, giving rise to topological band theory. However, top

External link: http://arxiv.org/abs/2411.11121

Report

Disentangling Latent Shifts of In-Context Learning Through Self-Training

Authors: Jukić, Josip; Šnajder, Jan

In-context learning (ICL) has become essential in natural language processing, particularly with autoregressive large language models capable of learning from demonstrations provided within the prompt. However, ICL faces challenges with stability and

External link: http://arxiv.org/abs/2410.01508

Report

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Authors: Ku, Pin-Jui; Liu, Alexander H.; Korostik, Roman; Huang, Sung-Feng; Fu, Szu-Wei; Jukić, Ante

This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal

External link: http://arxiv.org/abs/2409.16117

Report

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference

Authors: Casanova, Edresson; Langman, Ryan; Neekhara, Paarth; Hussain, Shehzeen; Li, Jason; Ghosh, Subhankar; Jukić, Ante; Lee, Sang-gil

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data. However, audio codecs often operate at hig

External link: http://arxiv.org/abs/2409.12117

Report

Schrödinger Bridge for Generative Speech Enhancement

Authors: Jukić, Ante; Korostik, Roman; Balam, Jagadeesh; Ginsburg, Boris

This paper proposes a generative speech enhancement model based on Schrödinger bridge (SB). The proposed model is employing a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distr

External link: http://arxiv.org/abs/2407.16074

Report

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

Authors: Dhawan, Kunal; Koluguri, Nithin Rao; Jukić, Ante; Langman, Ryan; Balam, Jagadeesh; Ginsburg, Boris

Published in: Proceedings of Interspeech 2024

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-te

External link: http://arxiv.org/abs/2407.03495

Report

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

Authors: Langman, Ryan; Jukić, Ante; Dhawan, Kunal; Koluguri, Nithin Rao; Ginsburg, Boris

Historically, most speech models in machine-learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis

External link: http://arxiv.org/abs/2406.05298

Report

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend

Authors: Jukić, Ante; Balam, Jagadeesh; Ginsburg, Boris

Published in: WASPAA 2023

This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator applicable to di

External link: http://arxiv.org/abs/2406.04552

Report

Autonomous Driving with a Deep Dual-Model Solution for Steering and Braking Control

Authors: Jukić, Ana Petra; Šelek, Ana; Seder, Marija; Žarko, Ivana Podnar

The technology of autonomous driving is currently attracting a great deal of interest in both research and industry. In this paper, we present a deep learning dual-model solution that uses two deep neural networks for combined braking and steering in

External link: http://arxiv.org/abs/2405.06473

Report

From Robustness to Improved Generalization and Calibration in Pre-trained Language Models

Authors: Jukić, Josip; Šnajder, Jan

Enhancing generalization and uncertainty quantification in pre-trained language models (PLMs) is crucial for their effectiveness and reliability. Building on machine learning research that established the importance of robustness for improving genera

External link: http://arxiv.org/abs/2404.00758
