Search results

Report

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Authors: Samel, Karan; Beedu, Apoorva; Sontakke, Nitish; Essa, Irfan

Foundational models are able to generate text outputs given prompt instructions and text, audio, or image inputs. Recently, these models have been combined to perform tasks on video, such as video summarization. Such video foundation models perform pr…

External link: http://arxiv.org/abs/2410.07405

Report

Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning

Authors: Jan, Essa; AlDahoul, Nouar; Ali, Moiz; Ahmad, Faizan; Zaffar, Fareed; Zaki, Yasir

Recent breakthroughs in Large Language Models (LLMs) have led to their adoption across a wide range of tasks, ranging from code generation to machine translation and sentiment analysis, etc. Red teaming/Safety alignment efforts show that fine-tuning…

External link: http://arxiv.org/abs/2409.15361

Report

Mamba Fusion: Learning Actions Through Questioning

Authors: Dong, Zhikang; Beedu, Apoorva; Sheinkopf, Jason; Essa, Irfan

Video Language Models (VLMs) are crucial for generalizing across diverse tasks and using language cues to enhance learning. While transformer-based architectures have been the de facto standard in vision-language training, they face challenges like quadratic…

External link: http://arxiv.org/abs/2409.11513

Report

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them

Authors: Haresamudram, Harish; Beedu, Apoorva; Rabbi, Mashfiqui; Saha, Sankalita; Essa, Irfan; Ploetz, Thomas

Cross-modal contrastive pre-training between natural language and other modalities, e.g., vision and audio, has demonstrated astonishing performance and effectiveness across a diverse variety of tasks and domains. In this paper, we investigate whether…

External link: http://arxiv.org/abs/2408.12023

Report

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Authors: Lee, Seung Hyun; Ke, Junjie; Li, Yinxiao; He, Junfeng; Hickson, Steven; Datsenko, Katie; Kim, Sangpil; Yang, Ming-Hsuan; Essa, Irfan; Yang, Feng

The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large v…

External link: http://arxiv.org/abs/2408.07790

Report

Stable Perovskite Solar Cells via exfoliated graphite as an ion diffusion-blocking layer

Authors: Alharbi, Abdullah S.; Albishi, Miqad S.; Maksudov, Temur; Alhuwaymel, Tariq F.; Aivalioti, Chrysa; AlShebl, Kadi S.; Alshamrani, Naif R.; Isikgor, Furkan H.; Aldosari, Mubarak; Aljomah, Majed M.; Petridis, Konstantinos; Anthopoulos, Thomas D.; Kakavelakis, George; Alharbi, Essa A.

Ion and metal diffusion in metal halide perovskites, charge-transporting layers, and electrodes are detrimental to the performance and stability of perovskite-based photovoltaic devices. As a result, there is an intense research interest in developing…

External link: http://arxiv.org/abs/2407.21662

Report

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

Authors: Marmon, Andrew; Schindler, Grant; Lezama, José; Kondratyuk, Dan; Seybold, Bryan; Essa, Irfan

We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output o…

External link: http://arxiv.org/abs/2405.13195

Report

The lifespan of solutions of semilinear wave equation with weighted nonlinearity

Authors: Al-Essa, Lulwah; Majdoub, Mohamed

We investigate the lifespan of solutions to a specific variant of the semilinear wave equation, which incorporates weighted nonlinearity $$ u_{tt}-u_{xx} =|x|^\alpha |u|^p, \quad\mbox{for}\;\;\; (t,x)\in (0,\infty)\times\mathbb{R}, $$ where $p>1$, …

External link: http://arxiv.org/abs/2404.16173

Report

SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

Authors: Cartillier, Vincent; Schindler, Grant; Essa, Irfan

We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM syst…

External link: http://arxiv.org/abs/2404.11419