Search results - "Moreno, Ignacio Lopez"

Report

Personalizing Keyword Spotting with Speaker Information

Authors: Labrador, Beltrán; Zhu, Pai; Zhao, Guanlong; Scarpati, Angelo Scorza; Wang, Quan; Lozano-Diez, Alicia; Park, Alex; Moreno, Ignacio López

Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Lin…

External link: http://arxiv.org/abs/2311.03419

Report

Locale Encoding For Scalable Multilingual Keyword Spotting Models

Authors: Zhu, Pai; Park, Hyun Jin; Park, Alex; Scarpati, Angelo Scorza; Moreno, Ignacio Lopez

A Multilingual Keyword Spotting (KWS) system detects spoken keywords over multiple locales. Conventional monolingual KWS approaches do not scale well to multilingual scenarios because of high development/maintenance costs and lack of resource sharing. To…

External link: http://arxiv.org/abs/2302.12961

Report

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

Authors: Zhao, Guanlong; Wang, Quan; Lu, Han; Huang, Yiling; Moreno, Ignacio Lopez

In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the spars…

External link: http://arxiv.org/abs/2211.06482

Report

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

Authors: Labrador, Beltrán; Zhao, Guanlong; Moreno, Ignacio López; Scarpati, Angelo Scorza; Fowl, Liam; Wang, Quan

In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token and training…

External link: http://arxiv.org/abs/2211.06478
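
The abstract above describes a data-preparation step in which the keyword in each transcript is replaced by a special token. Below is a minimal Python sketch of that preprocessing, assuming whole-word, case-insensitive matching. Several details are hypothetical illustrations and do not come from the paper: the token string "<kw>", the helper names mask_keyword and contains_keyword, and the detection rule (the keyword fires when the decoder emits the token).

```python
import re
from typing import List

# Hypothetical special token; the abstract does not say which token the authors used.
KEYWORD_TOKEN = "<kw>"


def mask_keyword(transcript: str, keyword: str, token: str = KEYWORD_TOKEN) -> str:
    """Replace each whole-word, case-insensitive occurrence of `keyword` with `token`."""
    # re.escape guards any punctuation in the keyword; \b keeps matches on word boundaries.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags=re.IGNORECASE)
    return pattern.sub(token, transcript)


def contains_keyword(decoded_tokens: List[str], token: str = KEYWORD_TOKEN) -> bool:
    """Assumed detection rule: the keyword fires when the decoder emits the special token."""
    return token in decoded_tokens
```

For example, mask_keyword("Hey Assistant, set a timer", "hey assistant") returns "<kw>, set a timer". Matching on word boundaries avoids masking text where the keyword appears only as part of a longer word.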

Report

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

Authors: Wang, Quan; Huang, Yiling; Lu, Han; Zhao, Guanlong; Moreno, Ignacio Lopez

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems. In this paper, we demonstrate that a multi-st…

External link: http://arxiv.org/abs/2210.13690

Report

Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

Authors: Hard, Andrew; Partridge, Kurt; Chen, Neng; Augenstein, Sean; Shah, Aishanee; Park, Hyun Jin; Park, Alex; Ng, Sara; Nguyen, Jessica; Moreno, Ignacio Lopez; Mathews, Rajiv; Beaufays, Françoise

We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training cache…

External link: http://arxiv.org/abs/2204.06322
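
The abstract above does not say how client updates are combined. For reference only, here is a sketch of the standard federated averaging rule (FedAvg), in which each client's weights count in proportion to its number of local training examples. The function name federated_average and the dict-of-lists weight format are hypothetical. The sketch does not cover the distillation, filtering, or joint federated-centralized steps named in the title.

```python
from typing import Dict, List, Tuple

# Hypothetical weight format: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_updates: List[Tuple[Weights, int]]) -> Weights:
    """FedAvg: average client weights, each weighted by its local example count."""
    total = sum(num_examples for _, num_examples in client_updates)
    if total == 0:
        raise ValueError("clients reported no training examples")
    first, _ = client_updates[0]
    averaged: Weights = {name: [0.0] * len(values) for name, values in first.items()}
    for weights, num_examples in client_updates:
        share = num_examples / total
        for name, values in weights.items():
            for i, value in enumerate(values):
                averaged[name][i] += share * value
    return averaged
```

Take two clients holding 1 and 3 examples, with weights [1.0, 2.0] and [3.0, 4.0]. Their weighted average is [2.5, 3.5].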

Report

Parameter-Free Attentive Scoring for Speaker Verification

Authors: Pelecanos, Jason; Wang, Quan; Huang, Yiling; Moreno, Ignacio Lopez

This paper presents a novel study of parameter-free attentive scoring for speaker verification. Parameter-free scoring provides the flexibility of comparing speaker representations without the need for an accompanying parametric scoring model. Inspire…

External link: http://arxiv.org/abs/2203.05642
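
The attentive scoring method itself is cut off in the abstract above. As a point of comparison only, the most common parameter-free score for speaker verification is the cosine similarity between an enrollment embedding and a test embedding, sketched below. The names cosine_score and verify and the 0.5 default threshold are hypothetical placeholders. This is not the paper's attentive scoring.

```python
import math
from typing import Sequence


def cosine_score(enroll: Sequence[float], test: Sequence[float]) -> float:
    """Parameter-free score: cosine similarity between two speaker embeddings."""
    if len(enroll) != len(test):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def verify(enroll: Sequence[float], test: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept the trial if the score clears a (hypothetical) tuned threshold."""
    return cosine_score(enroll, test) >= threshold
```

In practice the threshold would be tuned on held-out trials. The score itself needs no trained parameters, which is the property the abstract calls parameter-free.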

Report

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

Authors: Wang, Quan; Yu, Yang; Pelecanos, Jason; Huang, Yiling; Moreno, Ignacio Lopez

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference…

External link: http://arxiv.org/abs/2202.12163
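
The abstract above says attentive temporal pooling carries information through long-form audio in a recurrent form. One way to compute softmax-attentive pooling recurrently is to keep a running maximum score together with exp-weighted running sums. Each new frame then updates the pooled vector in constant memory. This formulation is an assumption here, not taken from the paper. The class StreamingAttentivePool and its single scoring vector query are hypothetical; a trained model would learn its own scoring function.

```python
import math
from typing import List, Sequence


class StreamingAttentivePool:
    """Softmax-attentive pooling over frames, updated one frame at a time.

    Keeps a running max score for numerical stability plus the
    exp-weighted sum of frames (numerator) and of weights (denominator),
    so memory stays constant no matter how long the audio runs.
    """

    def __init__(self, query: Sequence[float]):
        self.query = list(query)  # hypothetical scoring vector
        self.max_score = -math.inf
        self.numer: List[float] = [0.0] * len(self.query)
        self.denom = 0.0

    def update(self, frame: Sequence[float]) -> List[float]:
        """Fold in one frame and return the pooled vector over all frames so far."""
        score = sum(q * x for q, x in zip(self.query, frame))
        new_max = max(self.max_score, score)
        # Rescale previous accumulators to the new reference max (nothing to rescale at first).
        scale = math.exp(self.max_score - new_max) if self.denom > 0 else 0.0
        weight = math.exp(score - new_max)
        self.numer = [scale * n + weight * x for n, x in zip(self.numer, frame)]
        self.denom = scale * self.denom + weight
        self.max_score = new_max
        return [n / self.denom for n in self.numer]
```

After every frame, update returns the same vector that a full softmax over all frames seen so far would produce, without storing past frames.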

Report

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

Authors: Xia, Wei; Lu, Han; Wang, Quan; Tripathi, Anshuman; Huang, Yiling; Moreno, Ignacio Lopez; Sak, Hasim

In this paper, we present a novel speaker diarization system for streaming on-device applications. In this system, we use a transformer transducer to detect the speaker turns, represent each speaker turn by a speaker embedding, then cluster these emb…

External link: http://arxiv.org/abs/2109.11641
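
The abstract above outlines a three-step pipeline: detect speaker turns, embed each turn, then cluster the turn embeddings. The sketch below covers only the last step. As a stand-in, it uses simple greedy online clustering by cosine similarity to running centroids. The visible abstract does not give the clustering algorithm the system actually uses. The function name cluster_turns and the 0.7 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def cluster_turns(turn_embeddings: List[Sequence[float]], threshold: float = 0.7) -> List[int]:
    """Assign a speaker label to each detected turn, in arrival order."""
    centroids: List[List[float]] = []  # running sum of embeddings per speaker
    labels: List[int] = []
    for emb in turn_embeddings:
        best, best_sim = -1, -math.inf
        for k, centroid in enumerate(centroids):
            sim = _cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best_sim >= threshold:
            # Join the closest existing speaker and update its centroid.
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
        else:
            # No speaker is similar enough: open a new one.
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
    return labels
```

Consider the turn embeddings [1, 0], [0.9, 0.1], [0, 1], [0.1, 0.95], [1, 0.05]. For these, cluster_turns returns [0, 0, 1, 1, 0]. Because whole turns are clustered rather than fixed-length windows, each detected turn receives a single speaker label.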

Report

Noisy student-teacher training for robust keyword spotting

Authors: Park, Hyun-Jin; Zhu, Pai; Moreno, Ignacio Lopez; Subrahmanya, Niranjan

We propose self-training with a noisy student-teacher approach for streaming keyword spotting that can utilize large-scale unlabeled data and aggressive data augmentation. The proposed method applies aggressive data augmentation (spectral augmentation…

External link: http://arxiv.org/abs/2106.01604
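
The abstract above names two ingredients: a teacher that labels unlabeled audio, and aggressive augmentation such as spectral augmentation. Below is a minimal sketch built on two assumptions. First, augmentation is SpecAugment-style masking: one random time band and one random frequency band are zeroed. Second, it follows the generic noisy-student convention in which the teacher labels the clean input while the student trains on an augmented copy. The names spec_augment and pseudo_label_pairs, the mask sizes, and the teacher callable are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Spectrogram = List[List[float]]  # indexed [time][frequency]; assumed non-empty


def spec_augment(spec: Spectrogram, max_time_mask: int, max_freq_mask: int,
                 rng: random.Random) -> Spectrogram:
    """Return a copy of `spec` with one random time band and one random frequency band zeroed."""
    out = [row[:] for row in spec]
    num_frames, num_bins = len(out), len(out[0])
    # Time mask: zero t consecutive frames (t may be 0).
    t = rng.randint(0, min(max_time_mask, num_frames))
    t0 = rng.randint(0, num_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * num_bins
    # Frequency mask: zero f consecutive bins in every frame (f may be 0).
    f = rng.randint(0, min(max_freq_mask, num_bins))
    f0 = rng.randint(0, num_bins - f)
    for row in out:
        for j in range(f0, f0 + f):
            row[j] = 0.0
    return out


def pseudo_label_pairs(unlabeled: List[Spectrogram],
                       teacher: Callable[[Spectrogram], float],
                       rng: random.Random,
                       max_time_mask: int = 2,
                       max_freq_mask: int = 2) -> List[Tuple[Spectrogram, float]]:
    """Noisy-student data: teacher labels the clean input, student sees an augmented copy."""
    return [(spec_augment(x, max_time_mask, max_freq_mask, rng), teacher(x)) for x in unlabeled]
```

Passing a seeded random.Random instance makes the masks reproducible, which helps when debugging the data pipeline.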
