Search results

Report

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

Authors: Cao, Tingfeng; Wang, Chengyu; Liu, Bingyan; Wu, Ziheng; Zhu, Jinhui; Huang, Jun

Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt engineering by humans in order to produc

External link: http://arxiv.org/abs/2311.06752

Report

Hierarchical Side-Tuning for Vision Transformers

Authors: Lin, Weifeng; Wu, Ziheng; Yang, Wentao; Huang, Mingxin; Huang, Jun; Jin, Lianwen

Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks. Yet, the demand for individualized and comprehensive fine-tuning processes for each task entails substantial computational and

External link: http://arxiv.org/abs/2310.05393

Report

EasyPhoto: Your Smart AI Photo Generator

Authors: Wu, Ziheng; Xu, Jiaqi; Zou, Xinyi; Huang, Kunzhe; Shi, Xing; Huang, Jun

Stable Diffusion web UI (SD-WebUI) is a comprehensive project that provides a browser interface based on the Gradio library for Stable Diffusion models. In this paper, we propose a novel WebUI plugin called EasyPhoto, which enables the generation of AI p

External link: http://arxiv.org/abs/2310.04672

Report

DualToken-ViT: Position-aware Efficient Vision Transformer with Dual Token Fusion

Authors: Chu, Zhenzhen; Chen, Jiayu; Chen, Cen; Wang, Chengyu; Wu, Ziheng; Huang, Jun; Qian, Weining

Self-attention-based vision transformers (ViTs) have emerged as a highly competitive architecture in computer vision. Unlike convolutional neural networks (CNNs), ViTs are capable of global information sharing. With the development of various structu

External link: http://arxiv.org/abs/2309.12424

Report

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

Recent advancements in personalized image generation have unveiled the intriguing capability of pre-trained text-to-image models to learn identity information from a collection of portrait images. However, existing solutions are vulnerable in produ

External link: http://arxiv.org/abs/2308.14256

Report

DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis

Authors: Duan, Zhongjie; You, Lizhou; Wang, Chengyu; Chen, Cen; Wu, Ziheng; Qian, Weining; Huang, Jun

In recent years, diffusion models have emerged as the most powerful approach in image synthesis. However, applying these models directly to video synthesis presents challenges, as it often leads to noticeable flickering contents. Although recently pr

External link: http://arxiv.org/abs/2308.03463

Report

Scale-Aware Modulation Meet Transformer

Authors: Lin, Weifeng; Wu, Ziheng; Chen, Jiayu; Huang, Jun; Jin, Lianwen

This paper presents a new vision Transformer, Scale-Aware Modulation Transformer (SMT), that can handle various downstream tasks efficiently by combining the convolutional network and vision Transformer. The proposed Scale-Aware Modulation (SAM) in t

External link: http://arxiv.org/abs/2307.08579

Report

SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering

Authors: Shu, Xinyao; Yan, Shiyang; Yang, Xu; Wu, Ziheng; Chen, Zhongfeng; Lu, Zhenyu

Visual question answering (VQA) is a critical multimodal task in which an agent must answer questions according to the visual cue. Unfortunately, language bias is a common problem in VQA, which refers to the model generating answers only by associati

External link: http://arxiv.org/abs/2304.01647

Report

YOLOX-PAI: An Improved YOLOX, Stronger and Faster than YOLOv6

Authors: Wu, Ziheng; Zou, Xinyi; Zhou, Wenmeng; Huang, Jun

We develop an all-in-one computer vision toolbox named EasyCV to facilitate the use of various SOTA computer vision methods. Recently, we added YOLOX-PAI, an improved version of YOLOX, to EasyCV. We conduct ablation studies to investigate the influen

External link: http://arxiv.org/abs/2208.13040
