Showing 1 - 10 of 12,867 results for search: '"Jukić A"'
Author:
Wang, Ziteng, Bongiovanni, Domenico, Wang, Xiangdong, Hu, Zhichan, Jukić, Dario, Song, Daohong, Xu, Jingjun, Morandotti, Roberto, Chen, Zhigang, Buljan, Hrvoje
The discovery of topological phases of matter and topological boundary states had tremendous impact on condensed matter physics and photonics, where topological phases are defined via energy bands, giving rise to topological band theory. However, top
External link:
http://arxiv.org/abs/2411.11121
Author:
Jukić, Josip, Šnajder, Jan
In-context learning (ICL) has become essential in natural language processing, particularly with autoregressive large language models capable of learning from demonstrations provided within the prompt. However, ICL faces challenges with stability and
External link:
http://arxiv.org/abs/2410.01508
This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal
External link:
http://arxiv.org/abs/2409.16117
Author:
Casanova, Edresson, Langman, Ryan, Neekhara, Paarth, Hussain, Shehzeen, Li, Jason, Ghosh, Subhankar, Jukić, Ante, Lee, Sang-gil
Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modeling techniques to audio data. However, audio codecs often operate at hig
External link:
http://arxiv.org/abs/2409.12117
This paper proposes a generative speech enhancement model based on the Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distr
External link:
http://arxiv.org/abs/2407.16074
Author:
Dhawan, Kunal, Koluguri, Nithin Rao, Jukić, Ante, Langman, Ryan, Balam, Jagadeesh, Ginsburg, Boris
Published in:
Proceedings of Interspeech 2024
Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-te
External link:
http://arxiv.org/abs/2407.03495
Historically, most speech models in machine learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternative speech representation for speech synthesis
External link:
http://arxiv.org/abs/2406.05298
Published in:
WASPAA 2023
This paper proposes a flexible multichannel speech enhancement system with the main goal of improving robustness of automatic speech recognition (ASR) in noisy conditions. The proposed system combines a flexible neural mask estimator applicable to di
External link:
http://arxiv.org/abs/2406.04552
The technology of autonomous driving is currently attracting a great deal of interest in both research and industry. In this paper, we present a deep learning dual-model solution that uses two deep neural networks for combined braking and steering in
External link:
http://arxiv.org/abs/2405.06473
Author:
Jukić, Josip, Šnajder, Jan
Enhancing generalization and uncertainty quantification in pre-trained language models (PLMs) is crucial for their effectiveness and reliability. Building on machine learning research that established the importance of robustness for improving genera
External link:
http://arxiv.org/abs/2404.00758