Search results - "Bakr, Eslam Mohamed"

Report

iMotion-LLM: Motion Prediction Instruction Tuning

Authors: Felemban, Abdulwahab; Bakr, Eslam Mohamed; Shen, Xiaoqian; Ding, Jian; Mohamed, Abduallah; Elhoseiny, Mohamed

We introduce iMotion-LLM: a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios. Different from conventional motion prediction approaches, iMotion-LLM capitalizes on textual instruct

External link: http://arxiv.org/abs/2406.06211


Report

Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding

Authors: Fei, Junjie; Ahmed, Mahmoud; Ding, Jian; Bakr, Eslam Mohamed; Elhoseiny, Mohamed

While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, representing a novel approach that empow

External link: http://arxiv.org/abs/2405.18937


Report

HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models

Authors: Bakr, Eslam Mohamed; Sun, Pengzhan; Shen, Xiaoqian; Khan, Faizan Farooq; Li, Li Erran; Elhoseiny, Mohamed

In recent years, Text-to-Image (T2I) models have been extensively studied, especially with the emergence of diffusion models that achieve state-of-the-art results on T2I synthesis tasks. However, existing benchmarks heavily rely on subjective human e

External link: http://arxiv.org/abs/2304.05390


Report

ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment

Authors: Bakr, Eslam Mohamed; Sun, Pengzhan; Li, Li Erran; Elhoseiny, Mohamed

Most pre-trained learning systems are known to suffer from bias, which typically emerges from the data, the model, or both. Measuring and quantifying bias and its sources is a challenging task and has been extensively studied in image captioning. Des

External link: http://arxiv.org/abs/2304.04874


Report

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

Authors: Bakr, Eslam Mohamed; Alsaedy, Yasmeen; Elhoseiny, Mohamed

Published in: NeurIPS 2022

The 3D visual grounding task has been explored with visual and language streams comprehending referential language to identify target objects in 3D scenes. However, most existing methods devote the visual stream to capturing the 3D visual clues using

External link: http://arxiv.org/abs/2211.14241


Report

PKCAM: Previous Knowledge Channel Attention Module

Authors: Bakr, Eslam Mohamed; Sallab, Ahmad El; Rashwan, Mohsen A.

Recently, attention mechanisms have been explored with ConvNets, both across the spatial and channel dimensions. However, to our knowledge, all the existing methods devote the attention modules to capturing local interactions at a single scale. In thi

External link: http://arxiv.org/abs/2211.07521

