Showing 1 - 10 of 170 for search: '"Higuchi Takuya"'
Author:
Aldeneh, Zakaria, Higuchi, Takuya, Jung, Jee-weon, Chen, Li-Wei, Shum, Stephen, Abdelaziz, Ahmed Hussen, Watanabe, Shinji, Likhomanenko, Tatiana, Theobald, Barry-John
Iterative self-training, or iterative pseudo-labeling (IPL)--using an improved model from the current iteration to provide pseudo-labels for the next iteration--has proven to be a powerful approach to enhance the quality of speaker representations. R
External link:
http://arxiv.org/abs/2409.10791
Author:
Chen, Li-Wei, Higuchi, Takuya, Bai, He, Abdelaziz, Ahmed Hussen, Rudnicky, Alexander, Watanabe, Shinji, Likhomanenko, Tatiana, Theobald, Barry-John, Aldeneh, Zakaria
Speech foundation models, such as HuBERT and its variants, are pre-trained on large amounts of unlabeled speech for various downstream tasks. These models use a masked prediction objective, where the model learns to predict information about masked i
External link:
http://arxiv.org/abs/2409.10788
Author:
Aldeneh, Zakaria, Thilak, Vimal, Higuchi, Takuya, Theobald, Barry-John, Likhomanenko, Tatiana
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL). Traditionally, assessing the performance of these encoders is resource-intensive and require
External link:
http://arxiv.org/abs/2409.10787
Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short
External link:
http://arxiv.org/abs/2404.04439
Author:
Aldeneh, Zakaria, Higuchi, Takuya, Jung, Jee-weon, Seto, Skyler, Likhomanenko, Tatiana, Shum, Stephen, Abdelaziz, Ahmed Hussen, Watanabe, Shinji, Theobald, Barry-John
Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-bank features as inputs, and thus, training them on top of self-supervised fe
External link:
http://arxiv.org/abs/2402.00340
Author:
Jung, Jee-weon, Zhang, Wangyou, Shi, Jiatong, Aldeneh, Zakaria, Higuchi, Takuya, Theobald, Barry-John, Abdelaziz, Ahmed Hussen, Watanabe, Shinji
This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We pr
External link:
http://arxiv.org/abs/2401.17230
Noise robustness is a key aspect of successful speech applications. Speech enhancement (SE) has been investigated to improve automatic speech recognition accuracy; however, its effectiveness for keyword spotting (KWS) is still under-investigated. In
External link:
http://arxiv.org/abs/2309.16060
Voice triggering (VT) enables users to activate their devices by just speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and produces multiple enhanced and/or separated signals. Since conv
External link:
http://arxiv.org/abs/2309.16036
Author:
Nayak, Prateeth, Higuchi, Takuya, Gupta, Anmol, Ranjan, Shivesh, Shum, Stephen, Sigtia, Siddharth, Marchi, Erik, Lakshminarasimhan, Varun, Cho, Minsik, Adya, Saurabh, Dhir, Chandra, Tewfik, Ahmed
Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger dete
External link:
http://arxiv.org/abs/2204.02455
Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase. Although a phoneme classifier can be used for KWS, exploiting a large amount of transcribed data for autom
External link:
http://arxiv.org/abs/2107.07634