Search results - "Keskin, Gokce"

Report

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

Authors: Keskin, Gokce; Wu, Minhua; King, Brian; Mallidi, Harish; Gao, Yang; Droppo, Jasha; Rastrow, Ariya; Maas, Roland

Automatic speech recognition (ASR) models are typically designed to operate on a single input data type, e.g. a single or multi-channel audio streamed from a device. This design decision assumes the primary input data source does not change and if an

External link: http://arxiv.org/abs/2106.02750

Report

Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

Authors: Pulugundla, Bhargav; Gao, Yang; King, Brian; Keskin, Gokce; Mallidi, Harish; Wu, Minhua; Droppo, Jasha; Maas, Roland

Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable at capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolution n

External link: http://arxiv.org/abs/2105.05920

Report

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

Authors: Hu, Hu; Yang, Xuesong; Raeesy, Zeynab; Guo, Jinxi; Keskin, Gokce; Arsikere, Harish; Rastrow, Ariya; Stolcke, Andreas; Maas, Roland

Accents mismatching is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a

External link: http://arxiv.org/abs/2012.07353

Report

Semi-supervised voice conversion with amortized variational inference

Authors: Stephenson, Cory; Keskin, Gokce; Thomas, Anil; Elibol, Oguz H.

Published in: Proc. Interspeech 2019 (2019): 729-733

In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker. The proposed method makes use of both parallel and non-parallel utterances from

External link: http://arxiv.org/abs/1910.00067

Report

Improving Branch Prediction By Modeling Global History with Convolutional Neural Networks

Authors: Tarsa, Stephen J; Lin, Chit-Kwan; Keskin, Gokce; Chinya, Gautham; Wang, Hong

CPU branch prediction has hit a wall--existing techniques achieve near-perfect accuracy on 99% of static branches, and yet the mispredictions that remain hide major performance gains. In a companion report, we show that a primary source of mispredict

External link: http://arxiv.org/abs/1906.09889

Report

Measuring the Effectiveness of Voice Conversion on Speaker Identification and Automatic Speech Recognition Systems

Authors: Keskin, Gokce; Lee, Tyler; Stephenson, Cory; Elibol, Oguz H.

This paper evaluates the effectiveness of a Cycle-GAN based voice converter (VC) on four speaker identification (SID) systems and an automated speech recognition (ASR) system for various purposes. Audio samples converted by the VC model are classifie

External link: http://arxiv.org/abs/1905.12531

Report

Semi-supervised and Population Based Training for Voice Commands Recognition

Authors: Elibol, Oguz H.; Keskin, Gokce; Thomas, Anil

Published in: ICASSP 2019

We present a rapid design methodology that combines automated hyper-parameter tuning with semi-supervised training to build highly accurate and robust models for voice commands classification. Proposed approach allows quick evaluation of network arch

External link: http://arxiv.org/abs/1905.04230

Report

Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion

Authors: Ocal, Orhan; Elibol, Oguz H.; Keskin, Gokce; Stephenson, Cory; Thomas, Anil; Ramchandran, Kannan

We present a method for converting the voices between a set of speakers. Our method is based on training multiple autoencoder paths, where there is a single speaker-independent encoder and multiple speaker-dependent decoders. The autoencoders are tra

External link: http://arxiv.org/abs/1905.03864

Report

Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support

Authors: Keskin, Gokce; Lee, Tyler; Stephenson, Cory; Elibol, Oguz H.

We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set. This property is enabled through speaker embeddings generated by a neural network that is jointly trained with the C

External link: http://arxiv.org/abs/1905.02525
