Showing 1 - 10 of 362 for search: '"Sun, Xiaoshuai"'
The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. Howev
External link:
http://arxiv.org/abs/2406.16449
Author:
Qian, Zhipeng, Zhang, Pei, Yang, Baosong, Fan, Kai, Ma, Yiwei, Wong, Derek F., Sun, Xiaoshuai, Ji, Rongrong
This paper introduces AnyTrans, an all-encompassing framework for the task of Translate AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images. Our framework leverages the strengths of large-scale models,
External link:
http://arxiv.org/abs/2406.11432
Text-based person retrieval (TPR) is a challenging task that involves retrieving a specific individual based on a textual description. Despite considerable efforts to bridge the gap between vision and language, the significant differences between the
External link:
http://arxiv.org/abs/2406.05620
In this paper, we introduce SemiRES, a semi-supervised framework that effectively leverages a combination of labeled and unlabeled data to perform RES. A significant hurdle in applying semi-supervised techniques to RES is the prevalence of noisy pseu
External link:
http://arxiv.org/abs/2406.01451
This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs. Most previous state-of-the-art approaches are static and hand-crafted networks, which not only hea
External link:
http://arxiv.org/abs/2406.00334
Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progress
External link:
http://arxiv.org/abs/2405.00954
Recently, Segment Anything Model (SAM) has become a research hotspot in the fields of multimedia and computer vision, which exhibits powerful yet versatile capabilities on various (un)conditional image segmentation tasks. Although SAM can support di
External link:
http://arxiv.org/abs/2404.00650
Author:
Chen, Zhongxi, Sun, Ke, Zhou, Ziyin, Lin, Xianming, Sun, Xiaoshuai, Cao, Liujuan, Ji, Rongrong
The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks. Existing face forgery datasets have limitations in generating high-quality facial images a
External link:
http://arxiv.org/abs/2403.18471
In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS). Concretely, we first reveal that multi-head attentions (MHAs), the main comp
External link:
http://arxiv.org/abs/2403.15226
Author:
Zhang, Jinlu, Zhou, Yiyi, Zheng, Qiancheng, Du, Xiaoxiong, Luo, Gen, Peng, Jun, Sun, Xiaoshuai, Ji, Rongrong
Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hotspot in machine learning, which still suffers from low efficiency and poor quality. In this paper, we propose an End-to-End Efficient and Effective network for f
External link:
http://arxiv.org/abs/2403.06702