Showing 1 - 3 of 3 for search: '"Ataallah, Kirolos"'
Authors:
Ataallah, Kirolos; Shen, Xiaoqian; Abdelrahman, Eslam; Sleiman, Essam; Zhuge, Mingchen; Ding, Jian; Zhu, Deyao; Schmidhuber, Jürgen; Elhoseiny, Mohamed
Most current LLM-based models for video understanding can process videos within minutes. However, they struggle with lengthy videos due to challenges such as "noise and redundancy", as well as "memory and computation" constraints. In this paper, we p
External link:
http://arxiv.org/abs/2407.12679
Authors:
Ataallah, Kirolos; Gou, Chenhui; Abdelrahman, Eslam; Pahwa, Khushbu; Ding, Jian; Elhoseiny, Mohamed
Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To addres
External link:
http://arxiv.org/abs/2406.19875
Authors:
Ataallah, Kirolos; Shen, Xiaoqian; Abdelrahman, Eslam; Sleiman, Essam; Zhu, Deyao; Ding, Jian; Elhoseiny, Mohamed
This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding. The model is capable of processing both temporal visual and textual data, making it adept at understanding the complexities o
External link:
http://arxiv.org/abs/2404.03413