Showing 1 - 10
of 6,358
for search: '"Umbach A"'
We propose an approach for simultaneous diarization and separation of meeting data. It consists of a complex Angular Central Gaussian Mixture Model (cACGMM) for speech source separation, and a von-Mises-Fisher Mixture Model (VMFMM) for diarization in
External link:
http://arxiv.org/abs/2410.21455
Speech signals encompass various information across multiple levels, including content, speaker, and style. Disentangling this information, although challenging, is important for applications such as voice conversion. The contrastive predictive
External link:
http://arxiv.org/abs/2409.03520
The room impulse response (RIR) encodes, among others, information about the distance of an acoustic source from the sensors. Deep neural networks (DNNs) have been shown to be able to extract that information for acoustic distance estimation. Since t
External link:
http://arxiv.org/abs/2408.14213
Authors:
Wei, Miranda, Consolvo, Sunny, Kelley, Patrick Gage, Kohno, Tadayoshi, Matthews, Tara, Meiklejohn, Sarah, Roesner, Franziska, Shelby, Renee, Thomas, Kurt, Umbach, Rebecca
Published in:
Proceedings of the 33rd USENIX Security Symposium (USENIX Security 2024)
Image-based sexual abuse (IBSA), like other forms of technology-facilitated abuse, is a growing threat to people's digital safety. Attacks include unwanted solicitations for sexually explicit images, extorting people under threat of leaking their ima
External link:
http://arxiv.org/abs/2406.12161
Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these systems have
External link:
http://arxiv.org/abs/2406.03155
Deepfake technologies have become ubiquitous, "democratizing" the ability to manipulate photos and videos. One popular use of deepfake technology is the creation of sexually explicit content, which can then be posted and shared widely on the internet
External link:
http://arxiv.org/abs/2402.01721
Authors:
Cord-Landwehr, Tobias, Boeddeker, Christoph, Zorilă, Cătălin, Doddipatla, Rama, Haeb-Umbach, Reinhold
We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech. To this end, a geodesic distance loss is used th
External link:
http://arxiv.org/abs/2401.03963
Authors:
Rautenberg, Frederik, Kuhlmann, Michael, Wiechmann, Jana, Seebauer, Fritz, Wagner, Petra, Haeb-Umbach, Reinhold
Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly nam
External link:
http://arxiv.org/abs/2310.12599
Authors:
von Neumann, Thilo, Boeddeker, Christoph, Cord-Landwehr, Tobias, Delcroix, Marc, Haeb-Umbach, Reinhold
We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet separation archite
External link:
http://arxiv.org/abs/2309.16482
Authors:
Vieting, Peter, Berger, Simon, von Neumann, Thilo, Boeddeker, Christoph, Schlüter, Ralf, Haeb-Umbach, Reinhold
Many real-life applications of automatic speech recognition (ASR) require processing of overlapped speech. A common method involves first separating the speech into overlap-free streams and then performing ASR on the resulting signals. Recently, the i
External link:
http://arxiv.org/abs/2309.08454