Search results - "Singh, Krishna Kumar"

Report

ActAnywhere: Subject-Aware Video Background Generation

Authors: Pan, Boxiao; Xu, Zhan; Huang, Chun-Hao Paul; Singh, Krishna Kumar; Zhou, Yang; Guibas, Leonidas J.; Yang, Jimei

Generating video background that tailors to foreground subject motion is an important problem for the movie industry and visual effects community. This task involves synthesizing background that aligns with the motion and appearance of the foreground

External link: http://arxiv.org/abs/2401.10822

Report

UniHuman: A Unified Model for Editing Human Images in the Wild

Authors: Li, Nannan; Liu, Qing; Singh, Krishna Kumar; Wang, Yilin; Zhang, Jianming; Plummer, Bryan A.; Lin, Zhe

Human image editing includes tasks like changing a person's pose, their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning

External link: http://arxiv.org/abs/2312.14985

Report

Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models

Authors: Bao, Zhipeng; Li, Yijun; Singh, Krishna Kumar; Wang, Yu-Xiong; Hebert, Martial

Despite recent significant strides achieved by diffusion-based Text-to-Image (T2I) models, current systems are still less capable of ensuring decent compositional generation aligned with text prompts, particularly for the multi-object generation. Thi

External link: http://arxiv.org/abs/2312.06712

Report

Consistent Multimodal Generation via A Unified GAN Framework

Authors: Zhu, Zhen; Li, Yijun; Lyu, Weijie; Singh, Krishna Kumar; Shu, Zhixin; Pirk, Soeren; Hoiem, Derek

We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the

External link: http://arxiv.org/abs/2307.01425

Report

Putting People in Their Place: Affordance-Aware Human Insertion into Scenes

Authors: Kulal, Sumith; Brooks, Tim; Aiken, Alex; Wu, Jiajun; Yang, Jimei; Lu, Jingwan; Efros, Alexei A.; Singh, Krishna Kumar

We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the sce

External link: http://arxiv.org/abs/2304.14406

Report

Towards Enhanced Controllability of Diffusion Models

Authors: Cho, Wonwoong; Ravi, Hareesh; Harikumar, Midhun; Khuc, Vinh; Singh, Krishna Kumar; Lu, Jingwan; Inouye, David I.; Kale, Ajinkya

Denoising Diffusion models have shown remarkable capabilities in generating realistic, high-quality and diverse images. However, the extent of controllability during generation is underexplored. Inspired by techniques based on GAN latent space for im

External link: http://arxiv.org/abs/2302.14368

Report

Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

Authors: Ham, Cusuh; Hays, James; Lu, Jingwan; Singh, Krishna Kumar; Zhang, Zhifei; Hinz, Tobias

We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which

External link: http://arxiv.org/abs/2302.12764

Report

Zero-shot Image-to-Image Translation

Authors: Parmar, Gaurav; Singh, Krishna Kumar; Zhang, Richard; Li, Yijun; Lu, Jingwan; Zhu, Jun-Yan

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard

External link: http://arxiv.org/abs/2302.03027

Report

UMFuse: Unified Multi View Fusion for Human Editing applications

Authors: Jain, Rishabh; Hemani, Mayur; Ceylan, Duygu; Singh, Krishna Kumar; Lu, Jingwan; Sarkar, Mausoom; Krishnamurthy, Balaji

Numerous pose-guided human editing methods have been explored by the vision community due to their extensive practical applications. However, most of these methods still use an image-to-image formulation in which a single image is given as input to p

External link: http://arxiv.org/abs/2211.10157

Report

VGFlow: Visibility guided Flow Network for Human Reposing

Authors: Jain, Rishabh; Singh, Krishna Kumar; Hemani, Mayur; Lu, Jingwan; Sarkar, Mausoom; Ceylan, Duygu; Krishnamurthy, Balaji

The task of human reposing involves generating a realistic image of a person standing in an arbitrary conceivable pose. There are multiple difficulties in generating perceptually accurate images, and existing methods suffer from limitations in preser

External link: http://arxiv.org/abs/2211.08540
