Search results - "Keshet, Joseph"

Report

Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR

Authors: Segal-Feldman, Yael; Shamsian, Aviv; Navon, Aviv; Hetz, Gill; Keshet, Joseph

Large transformer-based models have significant potential for speech transcription and translation. Their self-attention mechanisms and parallel processing enable them to capture complex patterns and dependencies in audio sequences. However, this pot

External link: http://arxiv.org/abs/2409.15869

Report

WhisperNER: Unified Open Named Entity and Speech Recognition

Authors: Ayache, Gil; Pirchi, Menachem; Navon, Aviv; Shamsian, Aviv; Hetz, Gill; Keshet, Joseph

Integrating named entity recognition (NER) with automatic speech recognition (ASR) can significantly enhance transcription accuracy and informativeness. In this paper, we introduce WhisperNER, a novel model that allows joint speech transcription and

External link: http://arxiv.org/abs/2409.08107

Report

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing

Authors: Turetzky, Arnon; Tal, Or; Segal-Feldman, Yael; Dissen, Yehoshua; Zeldes, Ella; Roth, Amit; Cohen, Eyal; Shrem, Yosi; Chernyak, Bronya R.; Seleznova, Olga; Keshet, Joseph; Adi, Yossi

We present HebDB, a weakly supervised dataset for spoken language processing in the Hebrew language. HebDB offers roughly 2500 hours of natural and spontaneous speech recordings in the Hebrew language, consisting of a large variety of speakers and to

External link: http://arxiv.org/abs/2407.07566

Report

Tradition or Innovation: A Comparison of Modern ASR Methods for Forced Alignment

Authors: Rousso, Rotem; Cohen, Eyal; Keshet, Joseph; Chodroff, Eleanor

Published in: Interspeech 2024

Forced alignment (FA) plays a key role in speech research through the automatic time alignment of speech signals with corresponding text transcriptions. Despite the move towards end-to-end architectures for speech technology, FA is still dominantly a

External link: http://arxiv.org/abs/2406.19363

Report

Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network

Authors: Dissen, Yehoshua; Yonash, Shiry; Cohen, Israel; Keshet, Joseph

In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study i

External link: http://arxiv.org/abs/2406.18928

Report

Keyword-Guided Adaptation of Automatic Speech Recognition

Authors: Shamsian, Aviv; Navon, Aviv; Glazer, Neta; Hetz, Gill; Keshet, Joseph

Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and specialized jargon. In this pa

External link: http://arxiv.org/abs/2406.02649

Report

Combining Language Models For Specialized Domains: A Colorful Approach

Authors: Eitan, Daniel; Pirchi, Menachem; Glazer, Neta; Meital, Shai; Ayach, Gil; Krendel, Gidon; Shamsian, Aviv; Navon, Aviv; Hetz, Gil; Keshet, Joseph

General purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently utilized in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging

External link: http://arxiv.org/abs/2310.19708

Report

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

Authors: Benita, Roi; Elad, Michael; Keshet, Joseph

Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a

External link: http://arxiv.org/abs/2310.01381

Report

Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

Authors: Navon, Aviv; Shamsian, Aviv; Glazer, Neta; Hetz, Gill; Keshet, Joseph

Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword

External link: http://arxiv.org/abs/2309.08561

Report

A Baseline for Detecting Out-of-Distribution Examples in Image Captioning

Authors: Shalev, Gabi; Shalev, Gal-Lev; Keshet, Joseph

Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing out-of-distribut

External link: http://arxiv.org/abs/2207.05418
