Showing 1 - 10 of 34 for search: '"TYAGI, UTKARSH"'
Author:
Sakshi, S, Tyagi, Utkarsh, Kumar, Sonal, Seth, Ashish, Selvakumar, Ramaneswaran, Nieto, Oriol, Duraiswami, Ramani, Ghosh, Sreyan, Manocha, Dinesh
The ability to comprehend audio--which includes speech, non-speech sounds, and music--is crucial for AI agents to interact effectively with the world. We present MMAU, a novel benchmark designed to evaluate multimodal audio understanding models on tasks…
External link:
http://arxiv.org/abs/2410.19168
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Author:
Ghosh, Sreyan, Kumar, Sonal, Seth, Ashish, Evuru, Chandra Kiran Reddy, Tyagi, Utkarsh, Sakshi, S, Nieto, Oriol, Duraiswami, Ramani, Manocha, Dinesh
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities…
External link:
http://arxiv.org/abs/2406.11768
Author:
Ghosh, Sreyan, Kumar, Sonal, Seth, Ashish, Chiniya, Purva, Tyagi, Utkarsh, Duraiswami, Ramani, Manocha, Dinesh
Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues…
External link:
http://arxiv.org/abs/2406.04432
Author:
Ghosh, Sreyan, Tyagi, Utkarsh, Kumar, Sonal, Evuru, C. K., Ramaneswaran, S, Sakshi, S, Manocha, Dinesh
We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document…
External link:
http://arxiv.org/abs/2406.04286
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran Reddy, Kumar, Sonal, Tyagi, Utkarsh, Nieto, Oriol, Jin, Zeyu, Manocha, Dinesh
Large Vision-Language Models (LVLMs) often produce responses that misalign with factual information, a phenomenon known as hallucinations. While hallucinations are well-studied, the exact causes behind them remain underexplored. In this paper, we first…
External link:
http://arxiv.org/abs/2405.15683
Open-vocabulary vision-language models (VLMs) like CLIP, trained using contrastive loss, have emerged as a promising new paradigm for text-to-image retrieval. However, do VLMs understand compound nouns (CNs) (e.g., lab coat) as well as they understand…
External link:
http://arxiv.org/abs/2404.00419
Author:
Evuru, Chandra Kiran Reddy, Ghosh, Sreyan, Kumar, Sonal, S, Ramaneswaran, Tyagi, Utkarsh, Manocha, Dinesh
We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf instruction-following Large…
External link:
http://arxiv.org/abs/2404.00415
Author:
Ghosh, Sreyan, Evuru, Chandra Kiran, Kumar, Sonal, Ramaneswaran, S, Sakshi, S, Tyagi, Utkarsh, Manocha, Dinesh
We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its…
External link:
http://arxiv.org/abs/2310.15799
Author:
Ghosh, Sreyan, Seth, Ashish, Kumar, Sonal, Tyagi, Utkarsh, Evuru, Chandra Kiran, Ramaneswaran, S., Sakshi, S., Nieto, Oriol, Duraiswami, Ramani, Manocha, Dinesh
A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in…
External link:
http://arxiv.org/abs/2310.08753
Author:
Chowdhury, Sanjoy, Ghosh, Sreyan, Dasgupta, Subhrajyoti, Ratnarajah, Anton, Tyagi, Utkarsh, Manocha, Dinesh
We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, our approach incorporates the complementary…
External link:
http://arxiv.org/abs/2308.12370