Showing 1 - 10 of 134,562 for the search: '"Dinesh, A."'
Author:
Jena, Sushovan, Pulkit, Arya, Singh, Kajal, Banerjee, Anoushka, Joshi, Sharad, Ganesh, Ananth, Singh, Dinesh, Bhavsar, Arnav
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection…
External link:
http://arxiv.org/abs/2407.02968
Author:
Selvaraj, Dinesh Cyril, Vitale, Christian, Panayiotou, Tania, Kolios, Panayiotis, Chiasserini, Carla Fabiana, Ellinas, Georgios
In pursuit of autonomous vehicles, achieving human-like driving behavior is vital. This study introduces adaptive autopilot (AA), a unique framework utilizing constrained-deep reinforcement learning (C-DRL). AA aims to safely emulate human driving to…
External link:
http://arxiv.org/abs/2407.02546
Author:
Chowdhury, Sanjoy, Nag, Sayan, Dasgupta, Subhrajyoti, Chen, Jun, Elhoseiny, Mohamed, Gao, Ruohan, Manocha, Dinesh
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks…
External link:
http://arxiv.org/abs/2407.01851
Author:
Mukherjee, Anirban, Bitra, Venkat Suprabath, Bondugula, Vignesh, Tallapureddy, Tarun Reddy, Jayagopi, Dinesh Babu
Designing and manipulating virtual human heads is essential across various applications, including AR, VR, gaming, human-computer interaction and VFX. Traditional graphic-based approaches require manual effort and resources to achieve accurate…
External link:
http://arxiv.org/abs/2407.00229
Author:
Abdelaziz, Ibrahim, Basu, Kinjal, Agarwal, Mayank, Kumaravel, Sadhana, Stallone, Matthew, Panda, Rameswar, Rizk, Yara, Bhargav, GP, Crouse, Maxwell, Gunasekara, Chulaka, Ikbal, Shajith, Joshi, Sachin, Karanam, Hima, Kumar, Vineet, Munawar, Asim, Neelam, Sumit, Raghu, Dinesh, Sharma, Udit, Soria, Adriana Meza, Sreedhar, Dheeraj, Venkateswaran, Praveen, Unuvar, Merve, Cox, David, Roukos, Salim, Lastras, Luis, Kapanipathi, Pavan
Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the…
External link:
http://arxiv.org/abs/2407.00121
Published in:
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 1st Workshop on Human Motion Generation, 2024, Seattle, Washington, USA
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters using RGB video data captured using commodity cameras. Our approach learns from sparse face…
External link:
http://arxiv.org/abs/2406.18068
Image-text contrastive models such as CLIP learn transferable and robust representations for zero-shot transfer to a variety of downstream tasks. However, to obtain strong downstream performances, prompts need to be carefully curated, which can be a…
External link:
http://arxiv.org/abs/2406.13683
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Author:
Ghosh, Sreyan, Kumar, Sonal, Seth, Ashish, Evuru, Chandra Kiran Reddy, Tyagi, Utkarsh, Sakshi, S, Nieto, Oriol, Duraiswami, Ramani, Manocha, Dinesh
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced…
External link:
http://arxiv.org/abs/2406.11768
Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time…
External link:
http://arxiv.org/abs/2406.10918
Author:
Wu, Xiyang, Guan, Tianrui, Li, Dianqi, Huang, Shuaiyi, Liu, Xiaoyu, Wang, Xijun, Xian, Ruiqi, Shrivastava, Abhinav, Huang, Furong, Boyd-Graber, Jordan Lee, Zhou, Tianyi, Manocha, Dinesh
Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate…
External link:
http://arxiv.org/abs/2406.10900