Search results - "Shah, Rajiv Ratn"

Report

JOOCI: a Framework for Learning Comprehensive Speech Representations

Authors: Yadav, Hemant; Shah, Rajiv Ratn; Sitaram, Sunayana

Information in speech can be divided into two categories: what is being said (content) and how it is expressed (other). Current state-of-the-art (SOTA) techniques model speech at fixed segments, usually 10-25 ms, using a single embedding. Given the o

External link: http://arxiv.org/abs/2410.11086

Report

RConE: Rough Cone Embedding for Multi-Hop Logical Query Answering on Multi-Modal Knowledge Graphs

Authors: Kharbanda, Mayank; Shah, Rajiv Ratn; Mutharaju, Raghava

Multi-hop query answering over a Knowledge Graph (KG) involves traversing one or more hops from the start node to answer a query. Path-based and logic-based methods are state-of-the-art for multi-hop question answering. The former is used in link pre

External link: http://arxiv.org/abs/2408.11526

Report

Multilingual Non-Factoid Question Answering with Silver Answers

Authors: Mishra, Ritwik; Vennam, Sreeram; Shah, Rajiv Ratn; Kumaraguru, Ponnurangam

Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works

External link: http://arxiv.org/abs/2408.10604

Report

Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation

Authors: Yadav, Hemant; Sitaram, Sunayana; Shah, Rajiv Ratn

Speech modeling methods learn one embedding for a fixed segment of speech, typically between 10 and 25 ms. The information present in speech can be divided into two categories: "what is being said" (content) and "how it is expressed" (other) and these

External link: http://arxiv.org/abs/2408.10557

Report

Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities

Authors: Anand, Avinash; Tank, Chayan; Pol, Sarthak; Katoch, Vinayak; Mehta, Shaina; Shah, Rajiv Ratn

Depression has proven to be a significant public health issue, profoundly affecting the psychological well-being of individuals. If it remains undiagnosed, depression can lead to severe health issues, which can manifest physically and even lead to su

External link: http://arxiv.org/abs/2407.06125

Report

Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Authors: Kundu, Debnath; Mehta, Atharva; Kumar, Rajesh; Lal, Naman; Anand, Avinash; Singh, Apoorv; Shah, Rajiv Ratn

The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize adva

External link: http://arxiv.org/abs/2406.15335

Report

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Authors: Sahipjohn, Neha; Gudmalwar, Ashishkumar; Shah, Nirmesh; Wasnik, Pankaj; Shah, Rajiv Ratn

Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in

External link: http://arxiv.org/abs/2406.08802

Report

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

Authors: Gudmalwar, Ashishkumar; Shah, Nirmesh; Akarsh, Sai; Wasnik, Pankaj; Shah, Rajiv Ratn

Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source languag

External link: http://arxiv.org/abs/2406.08076

Report

MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

Authors: Yadav, Hemant; Sitaram, Sunayana; Shah, Rajiv Ratn

In recent years, self-supervised pre-training methods have gained significant traction in learning high-level information from raw speech. Among these methods, HuBERT has demonstrated SOTA performance in automatic speech recognition (ASR). However, H

External link: http://arxiv.org/abs/2406.05661

Report

Teaching Human Behavior Improves Content Understanding Abilities Of LLMs

Authors: Singh, Somesh; S I, Harini; Singla, Yaman K; Baths, Veeky; Shah, Rajiv Ratn; Chen, Changyou; Krishnamurthy, Balaji

Communication is defined as "Who says what to whom with what effect". A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about

External link: http://arxiv.org/abs/2405.00942
