Search results - "Gou, Chenhui"

Report

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance

Authors: Duan, Zicheng; Ding, Yuxuan; Gou, Chenhui; Zhou, Ziqin; Smith, Ethan; Liu, Lingqiao

Zero-shot subject-driven image generation aims to produce images that incorporate a subject from a given example image. The challenge lies in preserving the subject's identity while aligning with the text prompt which often requires modifying certain

External link: http://arxiv.org/abs/2409.08091

Report

How Well Can Vision Language Models See Image Details?

Authors: Gou, Chenhui; Felemban, Abdulwahab; Khan, Faizan Farooq; Zhu, Deyao; Cai, Jianfei; Rezatofighi, Hamid; Elhoseiny, Mohamed

Large Language Model-based Vision-Language Models (LLM-based VLMs) have demonstrated impressive results in various vision-language understanding tasks. However, how well these VLMs can see image detail beyond the semantic level remains unclear. In ou

External link: http://arxiv.org/abs/2408.03940

Report

InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

Authors: Ataallah, Kirolos; Gou, Chenhui; Abdelrahman, Eslam; Pahwa, Khushbu; Ding, Jian; Elhoseiny, Mohamed

Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To addres

External link: http://arxiv.org/abs/2406.19875

Report

DrVideo: Document Retrieval Based Long Video Understanding

Authors: Ma, Ziyu; Gou, Chenhui; Shi, Hengcan; Sun, Bin; Li, Shutao; Rezatofighi, Hamid; Cai, Jianfei

Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: dif

External link: http://arxiv.org/abs/2406.12846

Report

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

Authors: Le, Duy-Tho; Gou, Chenhui; Datta, Stavya; Shi, Hengcan; Reid, Ian; Cai, Jianfei; Rezatofighi, Hamid

Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data

External link: http://arxiv.org/abs/2404.01686

Report

Strong and Controllable Blind Image Decomposition

Authors: Zhang, Zeyu; Han, Junlin; Gou, Chenhui; Li, Hongdong; Zheng, Liang

Blind image decomposition aims to decompose all components present in an image, typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios, users might want to retain certain degrada

External link: http://arxiv.org/abs/2403.10520

Report

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Authors: Wang, Jian; Gou, Chenhui; Wu, Qiman; Feng, Haocheng; Han, Junyu; Ding, Errui; Wang, Jingdong

Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate in this field, due to the time-consuming computation mechanism of transform

External link: http://arxiv.org/abs/2210.07124
