Výsledky vyhledávání

Report

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

Autor: Qian, Guocheng, Wang, Kuan-Chieh, Patashnik, Or, Heravi, Negin, Ostashev, Daniil, Tulyakov, Sergey, Cohen-Or, Daniel, Aberman, Kfir

We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolid

Externí odkaz: http://arxiv.org/abs/2412.09694

Zobrazit plný text záznamu

Report

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

Autor: Zhang, Howard, Alaluf, Yuval, Ma, Sizhuo, Kadambi, Achuta, Wang, Jian, Aberman, Kfir

Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing methods often str

Externí odkaz: http://arxiv.org/abs/2412.06753

Zobrazit plný text záznamu

Report

Stable Flow: Vital Layers for Training-Free Image Editing

Autor: Avrahami, Omri, Patashnik, Or, Fried, Ohad, Nemchinov, Egor, Aberman, Kfir, Lischinski, Dani, Cohen-Or, Daniel

Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampling. Howeve

Externí odkaz: http://arxiv.org/abs/2411.14430

Zobrazit plný text záznamu

Report

Efficient Training with Denoised Neural Weights

Autor: Gong, Yifan, Zhan, Zheng, Li, Yanyu, Idelbayev, Yerlan, Zharkov, Andrey, Aberman, Kfir, Tulyakov, Sergey, Wang, Yanzhi, Ren, Jian

Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone

Externí odkaz: http://arxiv.org/abs/2407.11966

Zobrazit plný text záznamu

Report

Interpreting the Weight Space of Customized Diffusion Models

Autor: Dravid, Amil, Gandelsman, Yossi, Wang, Kuan-Chieh, Abdal, Rameen, Wetzstein, Gordon, Efros, Alexei A., Aberman, Kfir

We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual ident

Externí odkaz: http://arxiv.org/abs/2406.09413

Zobrazit plný text záznamu

Report

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Autor: Wang, Kuan-Chieh, Ostashev, Daniil, Fang, Yuwei, Tulyakov, Sergey, Aberman, Kfir

We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload b

Externí odkaz: http://arxiv.org/abs/2404.11565

Zobrazit plný text záznamu

Report

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Autor: Dahary, Omer, Patashnik, Or, Aberman, Kfir, Cohen-Or, Daniel

Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Recently, num

Externí odkaz: http://arxiv.org/abs/2403.16990

Zobrazit plný text záznamu

Report

MyVLM: Personalizing VLMs for User-Specific Queries

Autor: Alaluf, Yuval, Richardson, Elad, Tulyakov, Sergey, Aberman, Kfir, Cohen-Or, Daniel

Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In this work, w

Externí odkaz: http://arxiv.org/abs/2403.14599

Zobrazit plný text záznamu

Report

AToM: Amortized Text-to-Mesh using 2D Diffusion

Autor: Qian, Guocheng, Cao, Junli, Siarohin, Aliaksandr, Kant, Yash, Wang, Chaoyang, Vasilkovsky, Michael, Lee, Hsin-Ying, Fang, Yuwei, Skorokhodov, Ivan, Zhuang, Peiye, Gilitschenski, Igor, Ren, Jian, Ghanem, Bernard, Aberman, Kfir, Tulyakov, Sergey

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously. In contrast to existing text-to-3D methods that often entail time-consuming per-prompt optimization and commonly

Externí odkaz: http://arxiv.org/abs/2402.00867

Zobrazit plný text záznamu

Report

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

Autor: Gong, Yifan, Zhan, Zheng, Jin, Qing, Li, Yanyu, Idelbayev, Yerlan, Liu, Xian, Zharkov, Andrey, Aberman, Kfir, Tulyakov, Sergey, Wang, Yanzhi, Ren, Jian

One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networ

Externí odkaz: http://arxiv.org/abs/2401.06127

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání