Showing 1 - 10 of 164 for search: '"Keshet, Joseph"'
Large transformer-based models have significant potential for speech transcription and translation. Their self-attention mechanisms and parallel processing enable them to capture complex patterns and dependencies in audio sequences. However, this pot
External link:
http://arxiv.org/abs/2409.15869
Integrating named entity recognition (NER) with automatic speech recognition (ASR) can significantly enhance transcription accuracy and informativeness. In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and
External link:
http://arxiv.org/abs/2409.08107
Authors:
Turetzky, Arnon; Tal, Or; Segal-Feldman, Yael; Dissen, Yehoshua; Zeldes, Ella; Roth, Amit; Cohen, Eyal; Shrem, Yosi; Chernyak, Bronya R.; Seleznova, Olga; Keshet, Joseph; Adi, Yossi
We present HebDB, a weakly supervised dataset for spoken language processing in the Hebrew language. HebDB offers roughly 2500 hours of natural and spontaneous speech recordings in the Hebrew language, consisting of a large variety of speakers and to
External link:
http://arxiv.org/abs/2407.07566
Published in:
Interspeech 2024
Forced alignment (FA) plays a key role in speech research through the automatic time alignment of speech signals with corresponding text transcriptions. Despite the move towards end-to-end architectures for speech technology, FA is still dominantly a
External link:
http://arxiv.org/abs/2406.19363
In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study i
External link:
http://arxiv.org/abs/2406.18928
Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and specialized jargon. In this pa
External link:
http://arxiv.org/abs/2406.02649
Authors:
Eitan, Daniel; Pirchi, Menachem; Glazer, Neta; Meital, Shai; Ayach, Gil; Krendel, Gidon; Shamsian, Aviv; Navon, Aviv; Hetz, Gil; Keshet, Joseph
General purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently utilized in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging
External link:
http://arxiv.org/abs/2310.19708
Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a
External link:
http://arxiv.org/abs/2310.01381
Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword
External link:
http://arxiv.org/abs/2309.08561
Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing out-of-distribut
External link:
http://arxiv.org/abs/2207.05418