Výsledky vyhledávání

Report

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Autor: Kang, Wonjune, Jia, Junteng, Wu, Chunyang, Zhou, Wei, Lakomkin, Egor, Gaur, Yashesh, Sari, Leda, Kim, Suyoun, Li, Ke, Mahadeokar, Jay, Kalinli, Ozlem

As speech becomes an increasingly common modality for interacting with large language models (LLMs), it is becoming desirable to develop systems where LLMs can take into account users' emotions or speaking styles when providing their responses. In th

Externí odkaz: http://arxiv.org/abs/2410.01162

Zobrazit plný text záznamu

Report

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Autor: Xie, Jiamin, Li, Ke, Guo, Jinxi, Tjandra, Andros, Shangguan, Yuan, Sari, Leda, Wu, Chunyang, Jia, Junteng, Mahadeokar, Jay, Kalinli, Ozlem

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language.

Externí odkaz: http://arxiv.org/abs/2309.13018

Zobrazit plný text záznamu

Report

Augmenting text for spoken language understanding with Large Language Models

Autor: Sharma, Roshan, Kim, Suyoun, Lazar, Daniel, Le, Trang, Shrivastava, Akshat, Ahn, Kwanghoon, Kansal, Piyush, Sari, Leda, Kalinli, Ozlem, Seltzer, Michael

Spoken semantic parsing (SSP) involves generating machine-comprehensible parses from input speech. Training robust models for existing application domains represented in training data or extending to new domains requires corresponding triplets of spe

Externí odkaz: http://arxiv.org/abs/2309.09390

Zobrazit plný text záznamu

Report

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Autor: Le, Matthew, Vyas, Apoorv, Shi, Bowen, Karrer, Brian, Sari, Leda, Moritz, Rashel, Williamson, Mary, Manohar, Vimal, Adi, Yossi, Mahadeokar, Jay, Hsu, Wei-Ning

Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high fidelity outputs, but are also generalists which can solve tasks not explicitly taught. In contrast, speech generativ

Externí odkaz: http://arxiv.org/abs/2306.15687

Zobrazit plný text záznamu

Report

Towards Selection of Text-to-speech Data to Augment ASR Training

Autor: Liu, Shuo, Sarı, Leda, Wu, Chunyang, Keren, Gil, Shangguan, Yuan, Mahadeokar, Jay, Kalinli, Ozlem

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model. We trained a neural network, which can be

Externí odkaz: http://arxiv.org/abs/2306.00998

Zobrazit plný text záznamu

Report

Self-Supervised Representations for Singing Voice Conversion

Autor: Jayashankar, Tejas, Wu, Jilong, Sari, Leda, Kant, David, Manohar, Vimal, He, Qing

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped further the

Externí odkaz: http://arxiv.org/abs/2303.12197

Zobrazit plný text záznamu

Report

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

Autor: Klumpp, Philipp, Chitkara, Pooja, Sarı, Leda, Serai, Prashant, Wu, Jilong, Veliche, Irina-Elena, Huang, Rongqing, He, Qing

The awareness for biased ASR datasets or models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion mo

Externí odkaz: http://arxiv.org/abs/2303.00802

Zobrazit plný text záznamu

Report

Biased Self-supervised learning for ASR

Autor: Kreyssig, Florian L., Shi, Yangyang, Guo, Jinxi, Sari, Leda, Mohamed, Abdelrahman, Woodland, Philip C.

Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slig

Externí odkaz: http://arxiv.org/abs/2211.02536

Zobrazit plný text záznamu

Report

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Autor: Liu, Chunxi, Picheny, Michael, Sarı, Leda, Chitkara, Pooja, Xiao, Alex, Zhang, Xiaohui, Chou, Mark, Alvarado, Andres, Hazirbas, Caner, Saraf, Yatharth

It is well known that many machine learning systems demonstrate bias towards specific groups of individuals. This problem has been studied extensively in the Facial Recognition area, but much less so in Automatic Speech Recognition (ASR). This paper

Externí odkaz: http://arxiv.org/abs/2111.09983

Zobrazit plný text záznamu

Report

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Autor: Grauman, Kristen, Westbury, Andrew, Byrne, Eugene, Chavis, Zachary, Furnari, Antonino, Girdhar, Rohit, Hamburger, Jackson, Jiang, Hao, Liu, Miao, Liu, Xingyu, Martin, Miguel, Nagarajan, Tushar, Radosavovic, Ilija, Ramakrishnan, Santhosh Kumar, Ryan, Fiona, Sharma, Jayant, Wray, Michael, Xu, Mengmeng, Xu, Eric Zhongcong, Zhao, Chen, Bansal, Siddhant, Batra, Dhruv, Cartillier, Vincent, Crane, Sean, Do, Tien, Doulaty, Morrie, Erapalli, Akshay, Feichtenhofer, Christoph, Fragomeni, Adriano, Fu, Qichen, Gebreselasie, Abrham, Gonzalez, Cristina, Hillis, James, Huang, Xuhua, Huang, Yifei, Jia, Wenqi, Khoo, Weslie, Kolar, Jachym, Kottur, Satwik, Kumar, Anurag, Landini, Federico, Li, Chao, Li, Yanghao, Li, Zhenqiang, Mangalam, Karttikeya, Modhugu, Raghava, Munro, Jonathan, Murrell, Tullie, Nishiyasu, Takumi, Price, Will, Puentes, Paola Ruiz, Ramazanova, Merey, Sari, Leda, Somasundaram, Kiran, Southerland, Audrey, Sugano, Yusuke, Tao, Ruijie, Vo, Minh, Wang, Yuchen, Wu, Xindi, Yagi, Takuma, Zhao, Ziwei, Zhu, Yunyi, Arbelaez, Pablo, Crandall, David, Damen, Dima, Farinella, Giovanni Maria, Fuegen, Christian, Ghanem, Bernard, Ithapu, Vamsi Krishna, Jawahar, C. V., Joo, Hanbyul, Kitani, Kris, Li, Haizhou, Newcombe, Richard, Oliva, Aude, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Shi, Jianbo, Shou, Mike Zheng, Torralba, Antonio, Torresani, Lorenzo, Yan, Mingfei, Malik, Jitendra

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers f

Externí odkaz: http://arxiv.org/abs/2110.07058

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání