Search results

Report

Simultaneous Diarization and Separation of Meetings through the Integration of Statistical Mixture Models

Authors: Cord-Landwehr, Tobias; Boeddeker, Christoph; Haeb-Umbach, Reinhold

We propose an approach for simultaneous diarization and separation of meeting data. It consists of a complex Angular Central Gaussian Mixture Model (cACGMM) for speech source separation, and a von-Mises-Fisher Mixture Model (VMFMM) for diarization in

External link: http://arxiv.org/abs/2410.21455

Report

Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder

Authors: Xie, Yuying; Kuhlmann, Michael; Rautenberg, Frederik; Tan, Zheng-Hua; Haeb-Umbach, Reinhold

Speech signals encompass various information across multiple levels including content, speaker, and style. Disentanglement of these information, although challenging, is important for applications such as voice conversion. The contrastive predictive

External link: http://arxiv.org/abs/2409.03520

Report

Diminishing Domain Mismatch for DNN-Based Acoustic Distance Estimation via Stochastic Room Reverberation Models

Authors: Gburrek, Tobias; Meise, Adrian; Schmalenstroeer, Joerg; Haeb-Umbach, Reinhold

The room impulse response (RIR) encodes, among others, information about the distance of an acoustic source from the sensors. Deep neural networks (DNNs) have been shown to be able to extract that information for acoustic distance estimation. Since t

External link: http://arxiv.org/abs/2408.14213

Report

Understanding Help-Seeking and Help-Giving on Social Media for Image-Based Sexual Abuse

Authors: Wei, Miranda; Consolvo, Sunny; Kelley, Patrick Gage; Kohno, Tadayoshi; Matthews, Tara; Meiklejohn, Sarah; Roesner, Franziska; Shelby, Renee; Thomas, Kurt; Umbach, Rebecca

Published in: Proceedings of the 33rd USENIX Security Symposium (USENIX Security 2024)

Image-based sexual abuse (IBSA), like other forms of technology-facilitated abuse, is a growing threat to people's digital safety. Attacks include unwanted solicitations for sexually explicit images, extorting people under threat of leaking their ima

External link: http://arxiv.org/abs/2406.12161

Report

Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

Authors: Boeddeker, Christoph; Cord-Landwehr, Tobias; Haeb-Umbach, Reinhold

Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these systems have

External link: http://arxiv.org/abs/2406.03155

Report

Non-Consensual Synthetic Intimate Imagery: Prevalence, Attitudes, and Knowledge in 10 Countries

Authors: Umbach, Rebecca; Henry, Nicola; Beard, Gemma; Berryessa, Colleen

Deepfake technologies have become ubiquitous, "democratizing" the ability to manipulate photos and videos. One popular use of deepfake technology is the creation of sexually explicit content, which can then be posted and shared widely on the internet

External link: http://arxiv.org/abs/2402.01721

Report

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

Authors: Cord-Landwehr, Tobias; Boeddeker, Christoph; Zorilă, Cătălin; Doddipatla, Rama; Haeb-Umbach, Reinhold

We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech. To this end, a geodesic distance loss is used th

External link: http://arxiv.org/abs/2401.03963

Report

On Feature Importance and Interpretability of Speaker Representations

Authors: Rautenberg, Frederik; Kuhlmann, Michael; Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra; Haeb-Umbach, Reinhold

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly nam

External link: http://arxiv.org/abs/2310.12599

Report

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

Authors: von Neumann, Thilo; Boeddeker, Christoph; Cord-Landwehr, Tobias; Delcroix, Marc; Haeb-Umbach, Reinhold

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet separation archite

External link: http://arxiv.org/abs/2309.16482

Report

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Authors: Vieting, Peter; Berger, Simon; von Neumann, Thilo; Boeddeker, Christoph; Schlüter, Ralf; Haeb-Umbach, Reinhold

Many real-life applications of automatic speech recognition (ASR) require processing of overlapped speech. A common method involves first separating the speech into overlap-free streams and then performing ASR on the resulting signals. Recently, the i

External link: http://arxiv.org/abs/2309.08454
