Search results

Report

A Survey on Hallucination in Large Vision-Language Models

Authors: Liu, Hanchao; Xue, Wenyuan; Chen, Yifei; Chen, Dapeng; Zhao, Xiutian; Wang, Ke; Hou, Liping; Li, Rongjun; Peng, Wei

Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual c

External link: http://arxiv.org/abs/2402.00253

Report

Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

Authors: Chen, Yifei; Chen, Dapeng; Liu, Ruijin; Zhou, Sai; Xue, Wenyuan; Peng, Wei

Large-scale visual-language pre-trained models have achieved significant success in various video tasks. However, most existing methods follow an "adapt then align" paradigm, which adapts pre-trained image encoders to model video-level representation

External link: http://arxiv.org/abs/2311.15619

Report

PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Authors: Liu, Ruijin; Lu, Ning; Chen, Dapeng; Li, Cheng; Yuan, Zejian; Peng, Wei

We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation Polynomial Band (PB). The representation has four polynomial curves to fit a text's top, bottom, left, and right si

External link: http://arxiv.org/abs/2308.15004

Report

ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition

Authors: Xue, Wenyuan; Chen, Dapeng; Yu, Baosheng; Chen, Yifei; Zhou, Sai; Peng, Wei

Visual chart recognition systems are gaining increasing attention due to the growing demand for automatically identifying table headers and values from chart images. Current methods rely on keypoint detection to estimate data element shapes in charts

External link: http://arxiv.org/abs/2308.07743

Report

Video Action Recognition with Attentive Semantic Units

Authors: Chen, Yifei; Chen, Dapeng; Liu, Ruijin; Li, Hao; Peng, Wei

Published in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 10170-10180

Visual-Language Models (VLMs) have significantly advanced action video recognition. Supervised by the semantics of action labels, recent works adapt the visual branch of VLMs to learn video representations. Despite the effectiveness proved by these w

External link: http://arxiv.org/abs/2303.09756

Report

Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling

Authors: Huang, Yongshuai; Lu, Ning; Chen, Dapeng; Li, Yibo; Xie, Zecheng; Zhu, Shenggao; Gao, Liangcai; Peng, Wei

Table structure recognition aims to extract the logical and physical structure of unstructured table images into a machine-readable format. The latest end-to-end image-to-text approaches simultaneously predict the two structures by two decoders, wher

External link: http://arxiv.org/abs/2303.06949

Report

FNeVR: Neural Volume Rendering for Face Animation

Authors: Zeng, Bohan; Liu, Boyu; Li, Hong; Liu, Xuhui; Liu, Jianzhuang; Chen, Dapeng; Peng, Wei; Zhang, Baochang

Face animation, one of the hottest topics in computer vision, has achieved promising performance with the help of generative models. However, it remains a critical challenge to generate identity-preserving and photo-realistic images due to the soph

External link: http://arxiv.org/abs/2209.10340

Report

Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification

Authors: Wu, Lin; Liu, Deyin; Zhang, Wenying; Chen, Dapeng; Ge, Zongyuan; Boussaid, Farid; Bennamoun, Mohammed; Shen, Jialie

Published in: IEEE Transactions on Image Processing 2022

Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples

External link: http://arxiv.org/abs/2207.13035
