Search results - "PRASANNA, S."

Report

Biometrics in Extended Reality: A Review

Author: Agarwal, Ayush, Ramachandra, Raghavendra, Venkatesh, Sushma, Prasanna, S. R. Mahadeva

In the domain of Extended Reality (XR), particularly Virtual Reality (VR), extensive research has been devoted to harnessing this transformative technology in various real-world applications. However, a critical challenge that must be addressed befor

External link: http://arxiv.org/abs/2411.10489

Report

Analyzing long-term rhythm variations in Mising and Assamese using frequency domain correlates

Author: Gogoi, Parismita, Sarmah, Priyankoo, Prasanna, S. R. M.

The current work explores long-term speech rhythm variations to classify Mising and Assamese, two low-resourced languages from Assam, Northeast India. We study the temporal information of speech rhythm embedded in low-frequency (LF) spectrograms deri

External link: http://arxiv.org/abs/2410.20095

Report

Exploring rhythm formant analysis for Indic language classification

Author: Gogoi, Parismita, Kalita, Sishir, Sarmah, Priyankoo, Prasanna, S. R. Mahadeva

This paper reports a preliminary study on quantitative frequency domain rhythm cues for classifying five Indian languages: Bengali, Kannada, Malayalam, Marathi, and Tamil. We employ rhythm formant (R-formants) analysis, a technique introduced by Gibb

External link: http://arxiv.org/abs/2410.05724

Report

Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection

Author: Phukan, Orchid Chetia, Girish, Akhtar, Mohd Mujtaba, Behera, Swarup Ranjan, Choudhury, Nitin, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R. Mahadeva

The adaptation of foundation models has significantly advanced environmental audio deepfake detection (EADD), a rapidly growing area of research. These models are typically fine-tuned or utilized in their frozen states for downstream tasks. However,

External link: http://arxiv.org/abs/2409.15767

Report

Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection

Author: Phukan, Orchid Chetia, Behera, Swarup Ranjan, Singh, Shubham, Singh, Muskaan, Rajan, Vandana, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R. Mahadeva

In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs-ex

External link: http://arxiv.org/abs/2409.14312

Report

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition

Author: Phukan, Orchid Chetia, Akhtar, Mohd Mujtaba, Girish, Behera, Swarup Ranjan, Kalita, Sishir, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R. Mahadeva

In this study, we investigate multimodal foundation models (MFMs) for emotion recognition from non-verbal sounds. We hypothesize that MFMs, with their joint pre-training across multiple modalities, will be more effective in non-verbal sounds emotion

External link: http://arxiv.org/abs/2409.14221

Report

Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models

Author: Phukan, Orchid Chetia, Jain, Sarthak, Behera, Swarup Ranjan, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R. Mahadeva

In this study, for the first time, we extensively investigate whether music foundation models (MFMs) or speech foundation models (SFMs) work better for singing voice deepfake detection (SVDD), which has recently attracted attention in the research co

External link: http://arxiv.org/abs/2409.14131

Report

The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments

Author: Kalluri, Shareef Babu, Singh, Prachi, Chowdhuri, Pratik Roy, Kulkarni, Apoorva, Baghel, Shikha, Hegde, Pradyoth, Sontakke, Swapnil, T, Deepak K, Prasanna, S. R. Mahadeva, Vijayasenan, Deepu, Ganapathy, Sriram

The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multi

External link: http://arxiv.org/abs/2406.09494

Book

Speech and Computer : 24th International Conference, SPECOM 2022, Gurugram, India, November 14-16, 2022, Proceedings. [electronic resource]

Author: Prasanna, S. R. Mahadeva

External link: KNAV e-book collection (Registered users: full text online 5 minutes, further access on request.)

Report

Implicit Self-supervised Language Representation for Spoken Language Diarization

Author: Mishra, Jagabandhu, Prasanna, S. R. Mahadeva

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024

In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zer

External link: http://arxiv.org/abs/2308.10470
