Showing 1 - 4 of 4 for search: '"Kim, Eungbeom"'
Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in fr
External link:
http://arxiv.org/abs/2406.07909
Automatic speech recognition systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM utilizes the averaged performance on the data samples regardless of a group such as healthy or dysarthric speakers, ASR
External link:
http://arxiv.org/abs/2305.13108
Author:
Kim, Eungbeom; Kim, Jinhee; Oh, Yoori; Kim, Kyungsu; Park, Minju; Sim, Jaeheon; Lee, Jinwoo; Lee, Kyogu
In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance. We explore various augmentation methods at not only train-time but also test-time and find out
External link:
http://arxiv.org/abs/2210.17143
Text-to-speech and voice conversion studies are constantly improving to the extent where they can produce synthetic speech almost indistinguishable from bona fide human speech. In this regard, the importance of countermeasures (CM) against synthetic
External link:
http://arxiv.org/abs/2204.02639