Search results - "Cheng, Luyao"

Report

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Authors: Cheng, Luyao, Wang, Hui, Zheng, Siqi, Chen, Yafeng, Huang, Rongjie, Zhang, Qinglin, Chen, Qian, Li, Xihao

Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarizat

External link: http://arxiv.org/abs/2408.12102

Report

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Authors: Chen, Yafeng, Zheng, Siqi, Wang, Hui, Cheng, Luyao, Chen, Qian, Zhang, Shiliang, Wang, Wen

Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persistent challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Networ

External link: http://arxiv.org/abs/2406.11169

Report

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Authors: Chen, Yafeng, Zheng, Siqi, Wang, Hui, Cheng, Luyao, Chen, Qian, Zhang, Shiliang, Li, Junjie

Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteri

External link: http://arxiv.org/abs/2406.02167

Report

3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization

Authors: Chen, Yafeng, Zheng, Siqi, Wang, Hui, Cheng, Luyao, Zhu, Tinglong, Huang, Rongjie, Deng, Chong, Chen, Qian, Zhang, Shiliang, Wang, Wen, Li, Xihao

We introduce 3D-Speaker-Toolkit, an open-source toolkit for multimodal speaker verification and diarization, designed to meet the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined st

External link: http://arxiv.org/abs/2403.19971

Report

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Authors: Cheng, Luyao, Zheng, Siqi, Zhang, Qinglin, Wang, Hui, Chen, Yafeng, Chen, Qian, Zhang, Shiliang

Speaker diarization has gained considerable attention within the speech processing research community. Mainstream speaker diarization approaches rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of se

External link: http://arxiv.org/abs/2309.10456

Report

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Authors: Chen, Yafeng, Zheng, Siqi, Wang, Hui, Cheng, Luyao, Chen, Qian, Zhang, Shiliang, Wang, Wen

External link: http://arxiv.org/abs/2308.02774

Report

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Authors: Zheng, Siqi, Cheng, Luyao, Chen, Yafeng, Wang, Hui, Chen, Qian

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated inf

External link: http://arxiv.org/abs/2306.15354

Report

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Authors: Cheng, Luyao, Zheng, Siqi, Zhang, Qinglin, Wang, Hui, Chen, Yafeng, Chen, Qian

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in performan

External link: http://arxiv.org/abs/2305.12927

Report

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

Authors: Chen, Yafeng, Zheng, Siqi, Wang, Hui, Cheng, Luyao, Chen, Qian, Qi, Jiajun

Effective fusion of multi-scale features is crucial for improving speaker verification performance. Most existing methods aggregate multi-scale features in a layer-wise manner via simple operations, such as summation or concatenation. This pape

External link: http://arxiv.org/abs/2305.12838

Report

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

Authors: Wang, Hui, Zheng, Siqi, Chen, Yafeng, Cheng, Luyao, Chen, Qian

Time delay neural network (TDNN) has been proven to be efficient for speaker verification. One of its successful variants, ECAPA-TDNN, achieved state-of-the-art performance at the cost of much higher computational complexity and slower inference spee

External link: http://arxiv.org/abs/2303.00332
