Showing 1 - 10 of 4,870
for search: '"Speaker diarization"'
In this paper, we propose a quality-aware end-to-end audio-visual neural speaker diarization framework, which comprises three key techniques. First, our audio-visual model takes both audio and visual features as inputs, utilizing a series of binary c
External link:
http://arxiv.org/abs/2410.22350
Authors:
Plaquet, Alexis, Tawara, Naohiro, Delcroix, Marc, Horiguchi, Shota, Ando, Atsushi, Araki, Shoko
Mamba is a newly proposed architecture which behaves like a recurrent neural network (RNN) with attention-like capabilities. These properties are promising for speaker diarization, as attention-based models have unsuitable memory requirements for lon
External link:
http://arxiv.org/abs/2410.06459
Although fully end-to-end speaker diarization systems have made significant progress in recent years, modular systems often achieve superior results in real-world scenarios due to their greater adaptability and robustness. Historically, modular speak
External link:
http://arxiv.org/abs/2409.16803
Authors:
Plaquet, Alexis, Bredin, Hervé
Published in:
Interspeech 2024, Sep 2024, Kos, Greece. pp.3764-3768
End-to-end neural diarization models have usually relied on a multilabel-classification formulation of the speaker diarization problem. Recently, we proposed a powerset multiclass formulation that has beaten the state-of-the-art on multiple datasets.
External link:
http://arxiv.org/abs/2409.15885
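To illustrate the two formulations named in this abstract: a multilabel model predicts an independent speech/non-speech label for each speaker in each frame, while a powerset multiclass model predicts a single class per frame, where each class is one subset of simultaneously active speakers. The Python sketch below is a minimal illustration, not the authors' implementation; the function names and the parameters `num_speakers` and `max_simultaneous` (the assumed cap on overlapping speakers) are hypothetical.

```python
from itertools import combinations


def build_powerset(num_speakers: int, max_simultaneous: int) -> list[tuple[int, ...]]:
    """Enumerate every speaker subset of size 0..max_simultaneous.

    Class 0 is the empty subset, i.e. non-speech.
    """
    classes: list[tuple[int, ...]] = []
    for size in range(max_simultaneous + 1):
        # combinations() yields sorted tuples, e.g. (0, 1), (0, 2), (1, 2)
        classes.extend(combinations(range(num_speakers), size))
    return classes


def multilabel_to_powerset(frames: list[list[int]], classes: list[tuple[int, ...]]) -> list[int]:
    """Map each multilabel frame (one 0/1 value per speaker) to a powerset class index."""
    index = {subset: i for i, subset in enumerate(classes)}
    labels = []
    for frame in frames:
        active = tuple(k for k, v in enumerate(frame) if v)
        # Raises KeyError if more than max_simultaneous speakers are active
        labels.append(index[active])
    return labels


def powerset_to_multilabel(labels: list[int], classes: list[tuple[int, ...]], num_speakers: int) -> list[list[int]]:
    """Invert the mapping: expand each class index back to per-speaker 0/1 labels."""
    return [[1 if k in classes[c] else 0 for k in range(num_speakers)] for c in labels]


classes = build_powerset(num_speakers=3, max_simultaneous=2)
frames = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
labels = multilabel_to_powerset(frames, classes)
print(len(classes), labels)  # 7 [0, 1, 4, 6]
```

With 3 speakers and at most 2 overlapping, the sketch yields 7 classes: one non-speech class, three single-speaker classes, and three overlapping pairs, so a standard softmax over 7 outputs replaces 3 independent sigmoids.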
Automating child speech analysis is crucial for applications such as neurocognitive assessments. Speaker diarization, which identifies "who spoke when", is an essential component of the automated analysis. However, publicly available child-adult sp
External link:
http://arxiv.org/abs/2409.08881
Authors:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challe
External link:
http://arxiv.org/abs/2409.06656
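The permutation problem mentioned in this abstract arises because the order of a model's speaker outputs is arbitrary, so a frame-level loss must be computed against some assignment of outputs to reference speakers. A common remedy in end-to-end diarization training is permutation-invariant training (PIT), which takes the minimum loss over all assignments. The plain-Python sketch below shows PIT with binary cross-entropy; it is not Sortformer's training objective, and the function names are hypothetical.

```python
import math
from itertools import permutations


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability p and one 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce(pred: list[list[float]], target: list[list[int]]) -> tuple[float, tuple[int, ...]]:
    """Permutation-invariant BCE.

    pred[t][k] is the predicted activity of output k at frame t;
    target[t][s] is the reference activity of speaker s at frame t.
    Returns (lowest mean loss, best permutation), where perm[k] is the
    reference speaker assigned to output k.
    """
    num_frames, num_speakers = len(pred), len(pred[0])
    best_loss, best_perm = math.inf, tuple(range(num_speakers))
    # Try every output-to-speaker assignment: num_speakers! candidates
    for perm in permutations(range(num_speakers)):
        loss = sum(
            bce(pred[t][k], target[t][perm[k]])
            for t in range(num_frames)
            for k in range(num_speakers)
        ) / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Outputs are correct up to a swap of the two speakers
pred = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
target = [[0, 1], [0, 1], [1, 0]]
loss, perm = pit_bce(pred, target)
print(perm)  # (1, 0)
```

Because PIT evaluates the loss once per permutation, its cost grows factorially with the number of speakers.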
Nowadays, the large amount of audio-visual content available has fostered the need to develop new robust automatic speaker diarization systems to analyse and characterise it. This kind of system helps to reduce the cost of doing this process manually
External link:
http://arxiv.org/abs/2409.05659
Authors:
Cheng, Luyao, Wang, Hui, Zheng, Siqi, Chen, Yafeng, Huang, Rongjie, Zhang, Qinglin, Chen, Qian, Li, Xihao
Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarizat
External link:
http://arxiv.org/abs/2408.12102
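Segmenting an audio stream into partitions based on speaker identity, as described in this abstract, is commonly expressed as time-stamped speaker segments. Assuming a model that outputs frame-level binary speaker activity at a fixed frame step (the 0/1 matrix layout, the `frame_duration` parameter, and the function name are assumptions for illustration), a minimal Python sketch of converting activity into segments is:

```python
def activity_to_segments(activity: list[list[int]], frame_duration: float) -> list[tuple[int, float, float]]:
    """Convert frame-level activity into (speaker, start_sec, end_sec) segments.

    activity[t][k] is 1 if speaker k is active in frame t, else 0.
    Overlapping speech yields overlapping segments for different speakers.
    """
    segments = []
    num_speakers = len(activity[0]) if activity else 0
    for k in range(num_speakers):
        start = None  # frame index where the current run of activity began
        for t, frame in enumerate(activity):
            if frame[k] and start is None:
                start = t
            elif not frame[k] and start is not None:
                segments.append((k, start * frame_duration, t * frame_duration))
                start = None
        if start is not None:  # close a run that lasts until the final frame
            segments.append((k, start * frame_duration, len(activity) * frame_duration))
    # Order by start time, then by speaker index
    return sorted(segments, key=lambda s: (s[1], s[0]))


activity = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]]
print(activity_to_segments(activity, frame_duration=0.5))
# [(0, 0.0, 1.0), (1, 0.5, 1.5), (0, 2.0, 2.5)]
```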
Speaker diarization answers the question "who spoke when" for an audio file. In some diarization scenarios, low latency is required for transcription. Speaker diarization with low latency is referred to as online speaker diarization. The DIART pipeli
External link:
http://arxiv.org/abs/2408.02341
Authors:
Tao, Ruijie, Shi, Zhan, Jiang, Yidi, Truong, Duc-Tuan, Chng, Eng-Siong, Alioto, Massimo, Li, Haizhou
The human brain has the capability to associate an unknown person's voice and face by leveraging their general relationship, referred to as "cross-modal speaker verification". This task poses significant challenges due to the complex relationship
External link:
http://arxiv.org/abs/2407.17902