Showing 1 - 10 of 16,359 results for search: '"Essa, A."'
Foundational models are able to generate text outputs given prompt instructions and text, audio, or image inputs. Recently these models have been combined to perform tasks on video, such as video summarization. Such video foundation models perform pr
External link:
http://arxiv.org/abs/2410.07405
Recent breakthroughs in Large Language Models (LLMs) have led to their adoption across a wide range of tasks, ranging from code generation to machine translation and sentiment analysis, etc. Red teaming/Safety alignment efforts show that fine-tuning
External link:
http://arxiv.org/abs/2409.15361
Video Language Models (VLMs) are crucial for generalizing across diverse tasks and using language cues to enhance learning. While transformer-based architectures have been the de facto in vision-language training, they face challenges like quadratic
External link:
http://arxiv.org/abs/2409.11513
Authors:
Haresamudram, Harish, Beedu, Apoorva, Rabbi, Mashfiqui, Saha, Sankalita, Essa, Irfan, Ploetz, Thomas
Cross-modal contrastive pre-training between natural language and other modalities, e.g., vision and audio, has demonstrated astonishing performance and effectiveness across a diverse variety of tasks and domains. In this paper, we investigate whethe
External link:
http://arxiv.org/abs/2408.12023
Authors:
Lee, Seung Hyun, Ke, Junjie, Li, Yinxiao, He, Junfeng, Hickson, Steven, Datsenko, Katie, Kim, Sangpil, Yang, Ming-Hsuan, Essa, Irfan, Yang, Feng
The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large v
External link:
http://arxiv.org/abs/2408.07790
Authors:
Alharbi, Abdullah S., Albishi, Miqad S., Maksudov, Temur, Alhuwaymel, Tariq F., Aivalioti, Chrysa, AlShebl, Kadi S., Alshamrani, Naif R., Isikgor, Furkan H., Aldosari, Mubarak, Aljomah, Majed M., Petridis, Konstantinos, Anthopoulos, Thomas D., Kakavelakis, George, Alharbi, Essa A.
Ion and metal diffusion in metal halide perovskites, charge-transporting layers, and electrodes are detrimental to the performance and stability of perovskite-based photovoltaic devices. As a result, there is an intense research interest in developin
External link:
http://arxiv.org/abs/2407.21662
We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output o
External link:
http://arxiv.org/abs/2405.13195
Authors:
Al-Essa, Lulwah, Majdoub, Mohamed
We investigate the lifespan of solutions to a specific variant of the semilinear wave equation, which incorporates weighted nonlinearity $$ u_{tt}-u_{xx} =|x|^\alpha |u|^p, \quad\mbox{for}\;\;\; (t,x)\in (0,\infty)\times\mathbb{R}, $$ where $p>1$, $\
External link:
http://arxiv.org/abs/2404.16173
We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM syst
External link:
http://arxiv.org/abs/2404.11419
We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-ide
External link:
http://arxiv.org/abs/2403.13190