Search results - "Travadi, Ruchir"

Report

Optimizing Byte-level Representation for End-to-end ASR

Authors: Hsiao, Roger; Deng, Liuhui; McDermott, Erik; Travadi, Ruchir; Zhuang, Xiaodan

We propose a novel approach to optimizing a byte-level representation for end-to-end automatic speech recognition (ASR). Byte-level representation is often used by large scale multilingual ASR systems when the character set of the supported languages

External link: http://arxiv.org/abs/2406.09676

Report

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Authors: Lei, Zhihong; Pusateri, Ernest; Han, Shiyi; Liu, Leo; Xu, Mingbin; Ng, Tim; Travadi, Ruchir; Zhang, Youyuan; Hannemann, Mirko; Siu, Man-Hung; Huang, Zhen

Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our person

External link: http://arxiv.org/abs/2310.09988

Report

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Authors: Swietojanski, Pawel; Braun, Stefan; Can, Dogan; da Silva, Thiago Fraga; Ghoshal, Arnab; Hori, Takaaki; Hsiao, Roger; Mason, Henry; McDermott, Erik; Silovsky, Honza; Travadi, Ruchir; Zhuang, Xiaodan

Published in: International Conference on Acoustics, Speech, and Signal Processing, 2023

This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, wher

External link: http://arxiv.org/abs/2211.01438

Report

Online Automatic Speech Recognition with Listen, Attend and Spell Model

Authors: Hsiao, Roger; Can, Dogan; Ng, Tim; Travadi, Ruchir; Ghoshal, Arnab

The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that

External link: http://arxiv.org/abs/2008.05514

Report

Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

Authors: Dhawan, Kunal; Vaz, Colin; Travadi, Ruchir; Narayanan, Shrikanth

We propose an algorithm to extract noise-robust acoustic features from noisy speech. We use Total Variability Modeling in combination with Non-negative Matrix Factorization (NMF) to learn a total variability subspace and adapt NMF dictionaries for ea

External link: http://arxiv.org/abs/1907.06859

Report

Multimodal Representation Learning using Deep Multiset Canonical Correlation

Authors: Somandepalli, Krishna; Kumar, Naveen; Travadi, Ruchir; Narayanan, Shrikanth

We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an extension to representation learning using CCA when the underlying signal is observed across multiple (more than two) modalities. We use a deep learning framework to learn non-linear

External link: http://arxiv.org/abs/1904.01775
