Search results

Report

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Authors: Geng, Mengzhe, Xie, Xurong, Deng, Jiajun, Jin, Zengrui, Li, Guinan, Wang, Tianzi, Hu, Shujie, Li, Zhaoqing, Meng, Helen, Liu, Xunying

The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this en

External link: http://arxiv.org/abs/2407.06310

Report

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

Authors: Hu, Shujie, Xie, Xurong, Geng, Mengzhe, Jin, Zengrui, Deng, Jiajun, Li, Guinan, Wang, Yi, Cui, Mingyu, Wang, Tianzi, Meng, Helen, Liu, Xunying

Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcit

External link: http://arxiv.org/abs/2407.13782

Report

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Authors: Li, Guinan, Deng, Jiajun, Chen, Youjun, Geng, Mengzhe, Hu, Shujie, Li, Zhe, Jin, Zengrui, Wang, Tianzi, Xie, Xurong, Meng, Helen, Liu, Xunying

This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and ti

External link: http://arxiv.org/abs/2406.10152

Report

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

Authors: Wang, Tianzi, Xie, Xurong, Li, Zhaoqing, Hu, Shoukang, Jin, Zengrui, Deng, Jiajun, Cui, Mingyu, Hu, Shujie, Geng, Mengzhe, Li, Guinan, Meng, Helen, Liu, Xunying

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output l

External link: http://arxiv.org/abs/2406.10034

Report

SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation

Authors: Cai, Changpeng, Guo, Guinan, Li, Jiao, Su, Junhao, Shen, Fei, He, Chenghao, Xiao, Jing, Chen, Yuanxu, Dai, Lei, Zhu, Feiyu

Most earlier researches on talking face generation have focused on the synchronization of lip motion and speech content. However, head pose and facial emotions are equally important characteristics of natural faces. While audio-driven talking face ge

External link: http://arxiv.org/abs/2405.07257

Report

A community palm model

Palm oil production has been identified as one of the major drivers of deforestation for tropical countries. To meet supply chain objectives, commodity producers and other stakeholders need timely information of land cover dynamics in their supply sh

External link: http://arxiv.org/abs/2405.09530

Report

BrainMass: Advancing Brain Network Analysis for Diagnosis with Large-scale Self-Supervised Learning

Authors: Yang, Yanwu, Ye, Chenfei, Su, Guinan, Zhang, Ziyao, Chang, Zhikai, Chen, Hairui, Chan, Piu, Yu, Yue, Ma, Ting

Foundation models pretrained on large-scale datasets via self-supervised learning demonstrate exceptional versatility across various tasks. Due to the heterogeneity and hard-to-collect medical data, this approach is especially beneficial for medical

External link: http://arxiv.org/abs/2403.01433

Report

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Authors: Wang, Huimeng, Jin, Zengrui, Geng, Mengzhe, Hu, Shujie, Li, Guinan, Wang, Tianzi, Xu, Haoning, Liu, Xunying

Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained

External link: http://arxiv.org/abs/2401.00662

Report

Towards Automatic Data Augmentation for Disordered Speech Recognition

Authors: Jin, Zengrui, Xie, Xurong, Wang, Tianzi, Geng, Mengzhe, Deng, Jiajun, Li, Guinan, Hu, Shujie, Liu, Xunying

Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and en

External link: http://arxiv.org/abs/2312.08641

Report

Prompt Your Mind: Refine Personalized Text Prompts within Your Mind

Authors: Su, Guinan, Yang, Yanwu, Guo, Jie

Large language models (LLMs) have demonstrated remarkable potential in natural language understanding and generation, making them valuable tools for enhancing conversational interactions. However, LLMs encounter challenges such as lacking multi-step

External link: http://arxiv.org/abs/2311.05114
