Showing 1 - 7 of 7 for search: '"Gou, Chenhui"'
Zero-shot subject-driven image generation aims to produce images that incorporate a subject from a given example image. The challenge lies in preserving the subject's identity while aligning with the text prompt, which often requires modifying certain
External link:
http://arxiv.org/abs/2409.08091
Author:
Gou, Chenhui, Felemban, Abdulwahab, Khan, Faizan Farooq, Zhu, Deyao, Cai, Jianfei, Rezatofighi, Hamid, Elhoseiny, Mohamed
Large Language Model-based Vision-Language Models (LLM-based VLMs) have demonstrated impressive results in various vision-language understanding tasks. However, how well these VLMs can see image detail beyond the semantic level remains unclear. In ou
External link:
http://arxiv.org/abs/2408.03940
Author:
Ataallah, Kirolos, Gou, Chenhui, Abdelrahman, Eslam, Pahwa, Khushbu, Ding, Jian, Elhoseiny, Mohamed
Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To addres
External link:
http://arxiv.org/abs/2406.19875
Existing methods for long video understanding primarily focus on videos lasting only tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: dif
External link:
http://arxiv.org/abs/2406.12846
Author:
Le, Duy-Tho, Gou, Chenhui, Datta, Stavya, Shi, Hengcan, Reid, Ian, Cai, Jianfei, Rezatofighi, Hamid
Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data
External link:
http://arxiv.org/abs/2404.01686
Blind image decomposition aims to decompose all components present in an image, typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios, users might want to retain certain degrada
External link:
http://arxiv.org/abs/2403.10520
Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate in this field, due to the time-consuming computation mechanism of transform
External link:
http://arxiv.org/abs/2210.07124