Showing 1 - 10 of 188 for search: '"Shah, Rajiv Ratn"'
Information in speech can be divided into two categories: what is being said (content) and how it is expressed (other). Current state-of-the-art (SOTA) techniques model speech at fixed segments, usually 10-25 ms, using a single embedding. Given the o
External link:
http://arxiv.org/abs/2410.11086
Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from the start node to answer a query. Path-based and logic-based methods are state-of-the-art for multi-hop question answering. The former is used in link pre
External link:
http://arxiv.org/abs/2408.11526
Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works
External link:
http://arxiv.org/abs/2408.10604
Speech modeling methods learn one embedding for a fixed segment of speech, typically between 10 and 25 ms. The information present in speech can be divided into two categories: "what is being said" (content) and "how it is expressed" (other) and these
External link:
http://arxiv.org/abs/2408.10557
Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities
Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to su
External link:
http://arxiv.org/abs/2407.06125
Author:
Kundu, Debnath; Mehta, Atharva; Kumar, Rajesh; Lal, Naman; Anand, Avinash; Singh, Apoorv; Shah, Rajiv Ratn
The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize adva
External link:
http://arxiv.org/abs/2406.15335
Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in
External link:
http://arxiv.org/abs/2406.08802
Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source languag
External link:
http://arxiv.org/abs/2406.08076
In recent years, self-supervised pre-training methods have gained significant traction in learning high-level information from raw speech. Among these methods, HuBERT has demonstrated SOTA performance in automatic speech recognition (ASR). However, H
External link:
http://arxiv.org/abs/2406.05661
Author:
Singh, Somesh; S I, Harini; Singla, Yaman K; Baths, Veeky; Shah, Rajiv Ratn; Chen, Changyou; Krishnamurthy, Balaji
Communication is defined as "Who says what to whom with what effect". A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about
External link:
http://arxiv.org/abs/2405.00942