Search results

Report

Agriculture-Vision Challenge 2024 -- The Runner-Up Solution for Agricultural Pattern Recognition via Class Balancing and Model Ensemble

Authors: Liu, Wang; Wang, Zhiyu; Duan, Puhong; Kang, Xudong; Li, Shutao

The Agriculture-Vision Challenge at CVPR 2024 aims at leveraging semantic segmentation models to produce pixel level semantic segmentation labels within regions of interest for multi-modality satellite images. It is one of the most famous and competi

External link: http://arxiv.org/abs/2406.12271

Report

DrVideo: Document Retrieval Based Long Video Understanding

Authors: Ma, Ziyu; Gou, Chenhui; Shi, Hengcan; Sun, Bin; Li, Shutao; Rezatofighi, Hamid; Cai, Jianfei

Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: dif

External link: http://arxiv.org/abs/2406.12846

Report

Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

Authors: Wu, Linshan; Zhong, Zhun; Ma, Jiayi; Wei, Yunchao; Chen, Hao; Fang, Leyuan; Li, Shutao

Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while larg

External link: http://arxiv.org/abs/2403.13225

Report

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

Authors: Ma, Ziyu; Li, Shutao; Sun, Bin; Cai, Jianfei; Long, Zuxiang; Ma, Fuyan

Knowledge-based visual question answering (VQA) requires world knowledge beyond the image for accurate answer. Recently, instead of extra knowledge bases, a large language model (LLM) like GPT-3 is activated as an implicit knowledge engine to jointly

External link: http://arxiv.org/abs/2402.02503

Report

Hyperspectral Image Fusion via Logarithmic Low-rank Tensor Ring Decomposition

Authors: Zhang, Jun; Zhu, Lipeng; Wang, Chao; Li, Shutao

Integrating a low-spatial-resolution hyperspectral image (LR-HSI) with a high-spatial-resolution multispectral image (HR-MSI) is recognized as a valid method for acquiring HR-HSI. Among the current fusion approaches, the tensor ring (TR) decompositio

External link: http://arxiv.org/abs/2310.10044

Report

VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation

Authors: Zhang, Xu; Yang, Kailun; Lin, Jiacheng; Yuan, Jin; Li, Zhiyong; Li, Shutao

The integration of diverse visual prompts like clicks, scribbles, and boxes in interactive image segmentation could significantly facilitate user interaction as well as improve interaction efficiency. Most existing studies focus on a single type of v

External link: http://arxiv.org/abs/2306.06656

Report

AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation

Authors: Lin, Jiacheng; Chen, Jiajun; Yang, Kailun; Roitberg, Alina; Li, Siyu; Li, Zhiyong; Li, Shutao

Interactive Image Segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, notably hindering segme

External link: http://arxiv.org/abs/2305.04276

Report

LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition

Authors: Ma, Fuyan; Sun, Bin; Li, Shutao

Previous methods for dynamic facial expression recognition (DFER) in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos. Transformer-based methods for DFER can achiev

External link: http://arxiv.org/abs/2305.03343
