Showing 1 - 10 of 5,485 results for search: '"PRASANNA, S."'
In the domain of Extended Reality (XR), particularly Virtual Reality (VR), extensive research has been devoted to harnessing this transformative technology in various real-world applications. However, a critical challenge that must be addressed befor
External link:
http://arxiv.org/abs/2411.10489
The current work explores long-term speech rhythm variations to classify Mising and Assamese, two low-resourced languages from Assam, Northeast India. We study the temporal information of speech rhythm embedded in low-frequency (LF) spectrograms deri
External link:
http://arxiv.org/abs/2410.20095
This paper reports a preliminary study on quantitative frequency domain rhythm cues for classifying five Indian languages: Bengali, Kannada, Malayalam, Marathi, and Tamil. We employ rhythm formant (R-formants) analysis, a technique introduced by Gibb
External link:
http://arxiv.org/abs/2410.05724
Author:
Phukan, Orchid Chetia, Girish, Akhtar, Mohd Mujtaba, Behera, Swarup Ranjan, Choudhury, Nitin, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R Mahadeva
The adaptation of foundation models has significantly advanced environmental audio deepfake detection (EADD), a rapidly growing area of research. These models are typically fine-tuned or utilized in their frozen states for downstream tasks. However,
External link:
http://arxiv.org/abs/2409.15767
Author:
Phukan, Orchid Chetia, Behera, Swarup Ranjan, Singh, Shubham, Singh, Muskaan, Rajan, Vandana, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R. Mahadeva
In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs-ex
External link:
http://arxiv.org/abs/2409.14312
Author:
Phukan, Orchid Chetia, Akhtar, Mohd Mujtaba, Girish, Behera, Swarup Ranjan, Kalita, Sishir, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R Mahadeva
In this study, we investigate multimodal foundation models (MFMs) for emotion recognition from non-verbal sounds. We hypothesize that MFMs, with their joint pre-training across multiple modalities, will be more effective in non-verbal sounds emotion
External link:
http://arxiv.org/abs/2409.14221
Author:
Phukan, Orchid Chetia, Jain, Sarthak, Behera, Swarup Ranjan, Buduru, Arun Balaji, Sharma, Rajesh, Prasanna, S. R Mahadeva
In this study, for the first time, we extensively investigate whether music foundation models (MFMs) or speech foundation models (SFMs) work better for singing voice deepfake detection (SVDD), which has recently attracted attention in the research co
External link:
http://arxiv.org/abs/2409.14131
Author:
Kalluri, Shareef Babu, Singh, Prachi, Chowdhuri, Pratik Roy, Kulkarni, Apoorva, Baghel, Shikha, Hegde, Pradyoth, Sontakke, Swapnil, T, Deepak K, Prasanna, S. R. Mahadeva, Vijayasenan, Deepu, Ganapathy, Sriram
The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multi
External link:
http://arxiv.org/abs/2406.09494
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024
In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal with low/zer
External link:
http://arxiv.org/abs/2308.10470