Showing 1 - 10 of 134,562 for the search: '"Dinesh, A."'
Author:
Jena, Sushovan, Pulkit, Arya, Singh, Kajal, Banerjee, Anoushka, Joshi, Sharad, Ganesh, Ananth, Singh, Dinesh, Bhavsar, Arnav
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection…
External link:
http://arxiv.org/abs/2407.02968
Author:
Selvaraj, Dinesh Cyril, Vitale, Christian, Panayiotou, Tania, Kolios, Panayiotis, Chiasserini, Carla Fabiana, Ellinas, Georgios
In pursuit of autonomous vehicles, achieving human-like driving behavior is vital. This study introduces adaptive autopilot (AA), a unique framework utilizing constrained-deep reinforcement learning (C-DRL). AA aims to safely emulate human driving to…
External link:
http://arxiv.org/abs/2407.02546
Author:
Chowdhury, Sanjoy, Nag, Sayan, Dasgupta, Subhrajyoti, Chen, Jun, Elhoseiny, Mohamed, Gao, Ruohan, Manocha, Dinesh
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks…
External link:
http://arxiv.org/abs/2407.01851
Author:
Mukherjee, Anirban, Bitra, Venkat Suprabath, Bondugula, Vignesh, Tallapureddy, Tarun Reddy, Jayagopi, Dinesh Babu
Designing and manipulating virtual human heads is essential across various applications, including AR, VR, gaming, human-computer interaction and VFX. Traditional graphic-based approaches require manual effort and resources to achieve accurate…
External link:
http://arxiv.org/abs/2407.00229
Author:
Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Kumaravel, Sadhana, Stallone, Matthew, Panda, Rameswar, Rizk, Yara, Bhargav, GP, Crouse, Maxwell, Gunasekara, Chulaka, Ikbal, Shajith, Joshi, Sachin, Karanam, Hima, Kumar, Vineet, Munawar, Asim, Neelam, Sumit, Raghu, Dinesh, Sharma, Udit, Soria, Adriana Meza, Sreedhar, Dheeraj, Venkateswaran, Praveen, Unuvar, Merve, Cox, David, Roukos, Salim, Lastras, Luis, Kapanipathi, Pavan
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the…
External link:
http://arxiv.org/abs/2407.00121
Published in:
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1st Workshop on Human Motion Generation, 2024, Seattle, Washington, USA
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters using RGB video data captured using commodity cameras. Our approach learns from sparse face…
External link:
http://arxiv.org/abs/2406.18068
Image-text contrastive models such as CLIP learn transferable and robust representations for zero-shot transfer to a variety of downstream tasks. However, to obtain strong downstream performances, prompts need to be carefully curated, which can be a…
External link:
http://arxiv.org/abs/2406.13683
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Author:
Ghosh, Sreyan, Kumar, Sonal, Seth, Ashish, Evuru, Chandra Kiran Reddy, Tyagi, Utkarsh, Sakshi, S, Nieto, Oriol, Duraiswami, Ramani, Manocha, Dinesh
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced…
External link:
http://arxiv.org/abs/2406.11768
Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time…
External link:
http://arxiv.org/abs/2406.10918
Author:
Wu, Xiyang, Guan, Tianrui, Li, Dianqi, Huang, Shuaiyi, Liu, Xiaoyu, Wang, Xijun, Xian, Ruiqi, Shrivastava, Abhinav, Huang, Furong, Boyd-Graber, Jordan Lee, Zhou, Tianyi, Manocha, Dinesh
Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate…
External link:
http://arxiv.org/abs/2406.10900