Showing 1 - 10 of 43 for search: '"Hautamaki, Ville"'
Author:
Singh, Vishwanath Pratap, Malato, Federico, Hautamaki, Ville, Sahidullah, Md., Kinnunen, Tomi
Published in:
Interspeech 2024
While automatic speech recognition (ASR) greatly benefits from data augmentation, the augmentation recipes themselves tend to be heuristic. In this paper, we address one of the heuristic approaches associated with balancing the right amount of augmente
External link:
http://arxiv.org/abs/2406.09999
Author:
Malato, Federico, Hautamaki, Ville
Imitation learning enables autonomous agents to learn from human examples, without the need for a reward signal. Still, if the provided dataset does not encapsulate the task correctly, or when the task is too complex to be modeled, such agents fail t
External link:
http://arxiv.org/abs/2406.04913
Behavioral cloning uses a dataset of demonstrations to learn a policy. To overcome computationally expensive training procedures and address the policy adaptation problem, we propose to use latent spaces of pre-trained foundation models to index a de
External link:
http://arxiv.org/abs/2401.16398
Behavioural cloning uses a dataset of demonstrations to learn a behavioural policy. To overcome various learning and policy adaptation problems, we propose to use latent space to index a demonstration dataset, instantly access similar relevant experi
External link:
http://arxiv.org/abs/2306.09082
Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise. However, excessive suppression may lead to speech distortion and speaker information loss, which degrades the performance of speake
External link:
http://arxiv.org/abs/2110.00940
VoxCeleb datasets are widely used in speaker recognition studies. Our work serves two purposes. First, we provide speaker age labels and (an alternative) annotation of speaker gender. Second, we demonstrate the use of this metadata by constructing ag
External link:
http://arxiv.org/abs/2109.13510
In recent years, transformer models have achieved great success in natural language processing (NLP) tasks. Most of the current state-of-the-art NLP results are achieved by using monolingual transformer models, where the model is pre-trained using a
External link:
http://arxiv.org/abs/2006.07698
Mapping states to actions in deep reinforcement learning is mainly based on visual information. The commonly used approach for dealing with visual information is to extract pixels from images and use them as state representation for reinforcement lea
External link:
http://arxiv.org/abs/1905.04192
Author:
Lee, Kong Aik, Hautamaki, Ville, Kinnunen, Tomi, Yamamoto, Hitoshi, Okabe, Koji, Vestman, Ville, Huang, Jing, Ding, Guohong, Sun, Hanwu, Larcher, Anthony, Das, Rohan Kumar, Li, Haizhou, Rouvier, Mickael, Bousquet, Pierre-Michel, Rao, Wei, Wang, Qing, Zhang, Chunlei, Bahmaninezhad, Fahimeh, Delgado, Hector, Patino, Jose, Wang, Qiongqiong, Guo, Ling, Koshinaka, Takafumi, Zhang, Jiacen, Shinoda, Koichi, Trong, Trung Ngo, Sahidullah, Md, Lu, Fan, Tang, Yun, Tu, Ming, Teh, Kah Kuan, Tran, Huy Dat, George, Kuruvachan K., Kukanov, Ivan, Desnous, Florent, Yang, Jichen, Yilmaz, Emre, Xu, Longting, Bonastre, Jean-Francois, Xu, Chenglin, Lim, Zhi Hao, Chng, Eng Siong, Ranjan, Shivesh, Hansen, John H. L., Todisco, Massimiliano, Evans, Nicholas
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also
External link:
http://arxiv.org/abs/1904.07386