Search results - "Zhao, Guoqing"

Report

Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

Authors: Chen, Zhigang, Zhou, Benjia, Li, Jun, Wan, Jun, Lei, Zhen, Jiang, Ning, Lu, Quan, Zhao, Guoqing

Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is a labor-intensive task, which limits the further development of SLT. Although some approaches wor

External link: http://arxiv.org/abs/2403.12556

Report

SELM: Speech Enhancement Using Discrete Tokens and Language Models

Authors: Wang, Ziqian, Zhu, Xinfa, Zhang, Zihan, Lv, YuanJun, Jiang, Ning, Zhao, Guoqing, Xie, Lei

Language models (LMs) have shown superior performances in various speech generation tasks recently, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, h

External link: http://arxiv.org/abs/2312.09747

Report

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning

Authors: Zhu, Xinfa, Li, Yuke, Lei, Yi, Jiang, Ning, Zhao, Guoqing, Xie, Lei

This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across spe

External link: http://arxiv.org/abs/2310.17101

Report

Multi-objective Progressive Clustering for Semi-supervised Domain Adaptation in Speaker Verification

Authors: Li, Ze, Lin, Yuke, Jiang, Ning, Qin, Xiaoyi, Zhao, Guoqing, Wu, Haiying, Li, Ming

Utilizing the pseudo-labeling algorithm with large-scale unlabeled data becomes crucial for semi-supervised domain adaptation in speaker verification tasks. In this paper, we propose a novel pseudo-labeling method named Multi-objective Progressive Cl

External link: http://arxiv.org/abs/2310.04760

Report

Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification

Authors: Lin, Yuke, Qin, Xiaoyi, Jiang, Ning, Zhao, Guoqing, Li, Ming

It is widely acknowledged that discriminative representation for speaker verification can be extracted from verbal speech. However, how much speaker information that non-verbal vocalization carries is still a puzzle. This paper explores speaker verif

External link: http://arxiv.org/abs/2309.14109

Report

The DKU-MSXF Speaker Verification System for the VoxCeleb Speaker Recognition Challenge 2023

Authors: Li, Ze, Lin, Yuke, Qin, Xiaoyi, Jiang, Ning, Zhao, Guoqing, Li, Ming

This paper is the system description of the DKU-MSXF System for the track1, track2 and track3 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). For Track 1, we utilize a network structure based on ResNet for training. By constructing a

External link: http://arxiv.org/abs/2308.08766

Report

The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023

Authors: Cheng, Ming, Wang, Weiqing, Qin, Xiaoyi, Lin, Yuke, Jiang, Ning, Zhao, Guoqing, Li, Ming

This paper describes the DKU-MSXF submission to track 4 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our system pipeline contains voice activity detection, clustering-based diarization, overlapped speech detection, and target-speak

External link: http://arxiv.org/abs/2308.07595

Report

VoxBlink: A Large Scale Speaker Verification Dataset on Camera

Authors: Lin, Yuke, Qin, Xiaoyi, Zhao, Guoqing, Cheng, Ming, Jiang, Ning, Wu, Haiyang, Li, Ming

In this paper, we introduce a large-scale and high-quality audio-visual speaker verification dataset, named VoxBlink. We propose an innovative and robust automatic audio-visual data mining pipeline to curate this dataset, which contains 1.45M utteran

External link: http://arxiv.org/abs/2308.07056

Report

The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task

Authors: Song, Kun, Lei, Yi, Chen, Peikun, Cao, Yiqing, Wei, Kun, Zhang, Yongmao, Xie, Lei, Jiang, Ning, Zhao, Guoqing

This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. The system is built in a cascaded manner consisting of automatic speec

External link: http://arxiv.org/abs/2307.04630

Report

TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding

Authors: Liu, Zichen, Liu, Xuyuan, Wen, Yanlong, Zhao, Guoqing, Xia, Fen, Yuan, Xiaojie

ICD coding is designed to assign the disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to improve the effectiveness and efficiency of manual coding, many methods have

External link: http://arxiv.org/abs/2305.18576
