Search results

Report

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Authors: Liu, Haofeng; Zhang, Erli; Wu, Junde; Hong, Mingxuan; Jin, Yueming

Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown superior advancements in image and video s

External link: http://arxiv.org/abs/2408.07931

Report

Towards Open-ended Visual Quality Comparison

Authors: Wu, Haoning; Zhu, Hanwei; Zhang, Zicheng; Zhang, Erli; Chen, Chaofeng; Liao, Liang; Li, Chunyi; Wang, Annan; Sun, Wenxiu; Yan, Qiong; Liu, Xiaohong; Zhai, Guangtao; Wang, Shiqi; Lin, Weisi

Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more

External link: http://arxiv.org/abs/2402.16641

Report

Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs

Authors: Zhang, Zicheng; Wu, Haoning; Zhang, Erli; Zhai, Guangtao; Lin, Weisi

The rapid development of Multi-modality Large Language Models (MLLMs) has navigated a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in low-level visual perception and understanding remains

External link: http://arxiv.org/abs/2402.07116

Report

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Authors: Wu, Haoning; Zhang, Zicheng; Zhang, Weixia; Chen, Chaofeng; Liao, Liang; Li, Chunyi; Gao, Yixuan; Wang, Annan; Zhang, Erli; Sun, Wenxiu; Yan, Qiong; Min, Xiongkuo; Zhai, Guangtao; Lin, Weisi

The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of la

External link: http://arxiv.org/abs/2312.17090

Report

Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models

Authors: Zhang, Zicheng; Wu, Haoning; Ji, Zhongpeng; Li, Chunyi; Zhang, Erli; Sun, Wei; Liu, Xiaohong; Min, Xiongkuo; Sun, Fengyu; Jui, Shangling; Lin, Weisi; Zhai, Guangtao

Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks. However, the exploration of MLLM potential in visual quality assessment, a vital aspect of low-level vis

External link: http://arxiv.org/abs/2312.15300

Report

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Authors: Wu, Haoning; Zhang, Zicheng; Zhang, Erli; Chen, Chaofeng; Liao, Liang; Wang, Annan; Xu, Kaixin; Li, Chunyi; Hou, Jingwen; Zhai, Guangtao; Xue, Geng; Sun, Wenxiu; Yan, Qiong; Lin, Weisi

Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, that can respond to a broad range of natural human instructions in a model. While existing foundation mod

External link: http://arxiv.org/abs/2311.06783

Report

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Authors: Wu, Haoning; Zhang, Zicheng; Zhang, Erli; Chen, Chaofeng; Liao, Liang; Wang, Annan; Li, Chunyi; Sun, Wenxiu; Yan, Qiong; Zhai, Guangtao; Lin, Weisi

The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs

External link: http://arxiv.org/abs/2309.14181

Report

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

Authors: Wu, Haoning; Zhang, Erli; Liao, Liang; Chen, Chaofeng; Hou, Jingwen; Wang, Annan; Sun, Wenxiu; Yan, Qiong; Lin, Weisi

The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affec

External link: http://arxiv.org/abs/2305.12726

Report

Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

Authors: Wu, Haoning; Liao, Liang; Hou, Jingwen; Chen, Chaofeng; Zhang, Erli; Wang, Annan; Sun, Wenxiu; Yan, Qiong; Lin, Weisi

Recent learning-based video quality assessment (VQA) algorithms are expensive to implement due to the cost of data collection of human quality opinions, and are less robust across various scenarios due to the biases of these opinions. This motivates

External link: http://arxiv.org/abs/2302.13269

Report

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Authors: Wu, Haoning; Zhang, Erli; Liao, Liang; Chen, Chaofeng; Hou, Jingwen; Wang, Annan; Sun, Wenxiu; Yan, Qiong; Lin, Weisi

The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the

External link: http://arxiv.org/abs/2211.04894
