Showing 1 - 10 of 21 for search: '"Qian, Shengju"'
Author:
Liu, Lin, Liu, Quande, Qian, Shengju, Zhou, Yuan, Zhou, Wengang, Li, Houqiang, Xie, Lingxi, Tian, Qi
Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within text-to-video (T2V) generation is the effective visualization of text within generated videos. Despite the progress…
External link:
http://arxiv.org/abs/2406.17777
Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation…
External link:
http://arxiv.org/abs/2404.15275
Author:
Shao, Hao, Qian, Shengju, Xiao, Han, Song, Guanglu, Zong, Zhuofan, Wang, Letian, Liu, Yu, Li, Hongsheng
Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high…
External link:
http://arxiv.org/abs/2403.16999
This study targets a critical aspect of multi-modal LLM (LLM & VLM) inference: explicit, controllable text generation. Multi-modal LLMs enable multi-modal understanding together with semantic generation, yet offer less explainability and…
External link:
http://arxiv.org/abs/2312.04302
We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring…
External link:
http://arxiv.org/abs/2309.12307
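The snippet above cuts off before the method details. From the paper, LongLoRA pairs low-rank (LoRA) weight updates with a "shifted sparse attention" pattern used only during fine-tuning, while full attention is restored at inference. Below is a minimal sketch of that attention pattern under those assumptions; the function name, shapes, and helper are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def shifted_sparse_attention(q, k, v, group_size):
    """q, k, v: (batch, heads, seq_len, head_dim); seq_len % group_size == 0."""
    b, h, n, d = q.shape
    half, shift = h // 2, group_size // 2
    # Shift half of the heads by half a group so information can flow
    # between neighboring groups.
    q, k, v = (torch.cat([x[:, :half], x[:, half:].roll(-shift, dims=2)], dim=1)
               for x in (q, k, v))
    # Attend within each group only, by folding groups into a batch-like dim.
    def grouped(x):
        return x.reshape(b, h, n // group_size, group_size, d)
    out = F.scaled_dot_product_attention(grouped(q), grouped(k), grouped(v))
    out = out.reshape(b, h, n, d)
    # Undo the shift on the shifted heads.
    return torch.cat([out[:, :half], out[:, half:].roll(shift, dims=2)], dim=1)
```

The design point is that each attention call only sees `group_size` tokens, so training cost stays near that of short-context fine-tuning even when `seq_len` is large.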
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks. However, existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels…
External link:
http://arxiv.org/abs/2304.07547
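The baseline recipe this snippet refers to, generating masks from CLIP's text and patch embeddings, amounts to a per-patch nearest-text classification. A minimal sketch follows; `patch_embeds` and `text_embeds` are assumed to come from a CLIP image/text encoder pair, and this shows the generic baseline the paper critiques, not its proposed correction.

```python
import torch
import torch.nn.functional as F

def patch_level_masks(patch_embeds, text_embeds, h, w):
    # patch_embeds: (h*w, d) patch tokens from the image encoder.
    # text_embeds: (num_classes, d) embeddings of class-name prompts.
    patch_embeds = F.normalize(patch_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = patch_embeds @ text_embeds.T       # cosine similarity (h*w, C)
    labels = logits.argmax(dim=-1)              # per-patch class id
    return labels.reshape(h, w)                 # coarse semantic mask
```

Upsampling the `(h, w)` label grid to the input resolution yields the kind of zero-shot mask the snippet describes.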
We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs). In contrast to the under-exploited…
External link:
http://arxiv.org/abs/2303.00750
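The snippet ends before the method, so as a generic illustration only: NAR image generators of this family typically decode discrete tokens by parallel iterative refinement, starting fully masked, predicting every position at once, and committing only the most confident predictions each step. The `model` interface and the commit schedule below are assumptions for the sketch, not StraIT's actual procedure.

```python
import torch

@torch.no_grad()
def nar_decode(model, seq_len, mask_id, steps=8):
    # All positions start as [MASK]; every step refines them in parallel.
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens)                      # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)     # per-position confidence
        masked = tokens.eq(mask_id)
        if step == steps - 1:                       # last step: commit the rest
            tokens[masked] = pred[masked]
            break
        conf = conf.masked_fill(~masked, float("-inf"))
        k = max(1, int(masked.sum()) // (steps - step))
        idx = conf.topk(k, dim=-1).indices          # most confident masked slots
        tokens[0, idx[0]] = pred[0, idx[0]]
    return tokens
```

This is what makes NAR decoding fast relative to AR models: the whole token grid is produced in a fixed, small number of forward passes instead of one pass per token.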
The transformer architecture, which has recently seen booming applications in vision tasks, pivots away from the widespread convolutional paradigm. Relying on a tokenization process that splits inputs into multiple tokens, transformers are…
External link:
http://arxiv.org/abs/2212.11115
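The tokenization process the snippet above describes is, in the common ViT formulation, a split of the image into non-overlapping patches followed by a linear projection of each patch into a token. A minimal sketch, with illustrative hyperparameters:

```python
import torch.nn as nn

class PatchTokenizer(nn.Module):
    def __init__(self, patch=16, in_ch=3, dim=768):
        super().__init__()
        # A strided convolution cuts non-overlapping patches and projects
        # them in one step.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                    # x: (batch, 3, H, W)
        x = self.proj(x)                     # (batch, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (batch, num_tokens, dim)
```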
Pre-training has set numerous states of the art in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes…
External link:
http://arxiv.org/abs/2112.10175
Transformer architectures, based on a self-attention mechanism and a convolution-free design, have recently achieved superior performance and found booming applications in computer vision. However, the discontinuous patch-wise tokenization process implicitly introduces…
External link:
http://arxiv.org/abs/2110.15156