Showing 1 - 10 of 33 for search: '"Silnova, Anna"'
Author:
Pálka, Petr, Landini, Federico, Klement, Dominik, Diez, Mireia, Silnova, Anna, Delcroix, Marc, Burget, Lukáš
In spite of the popularity of end-to-end diarization systems nowadays, modular systems comprised of voice activity detection (VAD), speaker embedding extraction plus clustering, and overlapped speech detection (OSD) plus handling still attain competitive…
External link:
http://arxiv.org/abs/2411.02165
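To make the modular pipeline above more concrete, here is a minimal sketch of the clustering stage only, assuming segment embeddings have already been extracted on VAD-detected speech; the function name cluster_speech_segments, the 0.7 cosine-distance threshold, and the toy data are illustrative assumptions, not the paper's system, and OSD handling is omitted.

    # Illustrative only: cluster pre-computed segment embeddings into speakers
    # with average-linkage AHC over cosine distance (one common modular approach).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def cluster_speech_segments(embeddings, threshold=0.7):
        """Return a speaker label for each speech segment (rows of `embeddings`)."""
        emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        tree = linkage(pdist(emb, metric="cosine"), method="average")
        return fcluster(tree, t=threshold, criterion="distance")

    # Toy usage: six segments drawn around two random "speaker" directions.
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(2, 16))
    segments = np.vstack([c + 0.05 * rng.normal(size=(3, 16)) for c in centers])
    print(cluster_speech_segments(segments))  # two clusters, e.g. [1 1 1 2 2 2]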
End-to-end neural diarization has evolved considerably over the past few years, but data scarcity is still a major obstacle for further improvements. Self-supervised learning methods such as WavLM have shown promising performance on several downstream…
External link:
http://arxiv.org/abs/2409.09408
Author:
Rohdin, Johan, Zhang, Lin, Plchot, Oldřich, Staněk, Vojtěch, Mihola, David, Peng, Junyi, Stafylakis, Themos, Beveraki, Dmitriy, Silnova, Anna, Brukner, Jan, Burget, Lukáš
This paper describes the BUT submitted systems for the ASVspoof 5 challenge, along with analyses. For the conventional deepfake detection task, we use ResNet18 and self-supervised models for the closed and open conditions, respectively. In addition, …
External link:
http://arxiv.org/abs/2408.11152
Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements…
External link:
http://arxiv.org/abs/2406.12622
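As one concrete example of the margin-based losses this snippet refers to (the snippet does not say which variant the linked paper studies), the widely used additive angular margin (AAM) softmax modifies the softmax/cross-entropy loss to

    L_{\mathrm{AAM}} = -\log \frac{\exp\big(s \cos(\theta_y + m)\big)}{\exp\big(s \cos(\theta_y + m)\big) + \sum_{j \neq y} \exp\big(s \cos\theta_j\big)},

where \theta_j is the angle between the embedding and the classifier weight of speaker j, y is the true speaker, s is a scale factor, and m is the additive angular margin.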
Author:
Zhang, Lin, Stafylakis, Themos, Landini, Federico, Diez, Mireia, Silnova, Anna, Burget, Lukáš
In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors…
External link:
http://arxiv.org/abs/2402.19325
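For context, the standard deep variational information bottleneck objective (stated here in its generic form; the paper's exact formulation applied to EEND-EDA may differ) trains a stochastic representation z of the input x to predict the target y while compressing x:

    \min \; \mathbb{E}\big[-\log q(y \mid z)\big] + \beta\, \mathrm{KL}\big(p(z \mid x) \,\|\, r(z)\big),

where r(z) is a prior over the bottleneck variable and \beta trades off task performance against the amount of information retained about the input.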
Author:
Klement, Dominik, Diez, Mireia, Landini, Federico, Burget, Lukáš, Silnova, Anna, Delcroix, Marc, Tawara, Naohiro
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA)…
External link:
http://arxiv.org/abs/2310.02732
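For readers unfamiliar with the PLDA component mentioned here, a simplified generative form (the exact VBx parameterization is given in the linked paper) models an x-vector x_t produced by speaker s as

    x_t = \mu + V y_s + \epsilon_t, \qquad y_s \sim \mathcal{N}(0, I), \quad \epsilon_t \sim \mathcal{N}(0, \Sigma),

so that x-vectors of the same speaker share the latent variable y_s; in VBx, the HMM states correspond to speakers and the transition probabilities model speaker turns.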
Author:
Delcroix, Marc, Tawara, Naohiro, Diez, Mireia, Landini, Federico, Silnova, Anna, Ogawa, Atsunori, Nakatani, Tomohiro, Burget, Lukas, Araki, Shoko
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio…
External link:
http://arxiv.org/abs/2305.13580
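A rough sketch of the chunk-stitching idea behind EEND-VC, not the authors' implementation: each chunk yields an embedding per locally numbered speaker, and clustering these embeddings across chunks maps local indices to global speaker identities. The function stitch_chunks and its arguments are hypothetical.

    # Illustrative only: map chunk-local speaker indices to global speaker IDs
    # by clustering the per-chunk speaker embeddings across all chunks.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    def stitch_chunks(chunk_embeddings, n_speakers):
        """chunk_embeddings: list of (n_local_speakers_i, dim) arrays, one per chunk.
        Returns one array per chunk giving the global ID of each local speaker."""
        flat = np.vstack(chunk_embeddings)
        flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
        tree = linkage(pdist(flat, metric="cosine"), method="average")
        labels = fcluster(tree, t=n_speakers, criterion="maxclust")
        splits = np.cumsum([len(e) for e in chunk_embeddings])[:-1]
        return np.split(labels, splits)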
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses Von Mises-Fisher distributions…
External link:
http://arxiv.org/abs/2210.15441
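The Von Mises-Fisher density that PSDA builds on, stated here only for context (the snippet does not show the full model), places a distribution over unit-norm embeddings x:

    f(x \mid \mu, \kappa) = C_d(\kappa) \exp\big(\kappa\, \mu^{\mathsf T} x\big), \qquad \|x\| = \|\mu\| = 1,

where \mu is the mean direction, \kappa \ge 0 the concentration, and C_d(\kappa) a normalizing constant depending on the embedding dimension d.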
Author:
Stafylakis, Themos, Mošner, Ladislav, Plchot, Oldřich, Rohdin, Johan, Silnova, Anna, Burget, Lukáš, Černocký, Jan "Honza"
In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we are using the full VoxCeleb recordings and the name of the celebrities appearing on each video without knowledge of the time…
External link:
http://arxiv.org/abs/2203.15436
Author:
Brümmer, Niko, Swart, Albert, Mošner, Ladislav, Silnova, Anna, Plchot, Oldřich, Stafylakis, Themos, Burget, Lukáš
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring or PLDA. Both have advantages and disadvantages, depending on the context. Cosine scoring…
External link:
http://arxiv.org/abs/2203.14893
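For reference, the two backends contrasted in this abstract score a trial with enrollment embedding a and test embedding b as follows: cosine scoring uses the inner product of the length-normalized embeddings, while PLDA computes a log-likelihood ratio between the same-speaker and different-speaker hypotheses:

    s_{\cos}(a, b) = \frac{a^{\mathsf T} b}{\|a\|\,\|b\|}, \qquad s_{\mathrm{PLDA}}(a, b) = \log \frac{p(a, b \mid \text{same speaker})}{p(a)\, p(b)}.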