Showing results 1 - 10 of 21 for the search '"Pelecanos, Jason"'
Authors:
Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan
We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised…
External link:
http://arxiv.org/abs/2309.08023
This paper presents a novel study of parameter-free attentive scoring for speaker verification. Parameter-free scoring provides the flexibility of comparing speaker representations without the need for an accompanying parametric scoring model. Inspired…
External link:
http://arxiv.org/abs/2203.05642
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference…
External link:
http://arxiv.org/abs/2202.12163
Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, did not handle utterance-specific uncertainty…
External link:
http://arxiv.org/abs/2104.01989
In this paper, we describe SpeakerStew, a hybrid system to perform speaker verification on 46 languages. Two core ideas were explored in this system: (1) pooling training data of different languages together for multilingual generalization and reduc…
External link:
http://arxiv.org/abs/2104.02125
In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data. Correspondingly, we investigate the use of a multi-speaker TTS system to synthesize speech…
External link:
http://arxiv.org/abs/2011.11818
Authors:
Wang, Quan, Moreno, Ignacio Lopez, Saglam, Mert, Wilson, Kevin, Chiao, Alan, Liu, Renjie, He, Yanzhang, Li, Wei, Pelecanos, Jason, Nika, Marily, Gruenstein, Alexander
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges…
External link:
http://arxiv.org/abs/2009.04323
We present the recent advances along with an error analysis of the IBM speaker recognition system for conversational speech. Some of the key advancements that contribute to our system include: a nearest-neighbor discriminant analysis (NDA) approach (…
External link:
http://arxiv.org/abs/1605.01635
In this paper we describe the recent advancements made in the IBM i-vector speaker recognition system for conversational speech. In particular, we identify key techniques that contribute to significant improvements in performance of our system, and q…
External link:
http://arxiv.org/abs/1602.07291
Published in:
2016 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP); 2016, pp. 5040-5044 (5 pages)