Showing 1 - 10 of 20 for search: '"Park, Dong Huk"'
Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread…
External link:
http://arxiv.org/abs/2305.14334
Author:
Park, Dong Huk, Luo, Grace, Toste, Clayton, Azadi, Samaneh, Liu, Xihui, Karalashvili, Maka, Rohrbach, Anna, Darrell, Trevor
We introduce precise object silhouette as a new form of user control in text-to-image diffusion models, which we dub Shape-Guided Diffusion. Our training-free method uses an Inside-Outside Attention mechanism during the inversion and generation process…
External link:
http://arxiv.org/abs/2212.00210
Author:
Liu, Xihui, Park, Dong Huk, Azadi, Samaneh, Zhang, Gong, Chopikyan, Arman, Hu, Yuxiao, Shi, Humphrey, Rohrbach, Anna, Darrell, Trevor
Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than prior methods…
External link:
http://arxiv.org/abs/2112.05744
Author:
Li, Xuanlin, Trabucco, Brandon, Park, Dong Huk, Luo, Michael, Shen, Sheng, Darrell, Trevor, Gao, Yang
The predominant approach for language modeling is to process sequences from left to right, but this eliminates a source of information: the order by which the sequence was generated. One strategy to recover this information is to decode both the content…
External link:
http://arxiv.org/abs/2110.15797
Large-scale pretraining of visual representations has led to state-of-the-art performance on a range of benchmark computer vision tasks, yet the benefits of these techniques at extreme scale in complex production systems have been relatively unexplored…
External link:
http://arxiv.org/abs/2108.05887
Transformers have become the dominant model in natural language processing, owing to their ability to pretrain on massive amounts of data, then transfer to smaller, more specific tasks via fine-tuning. The Vision Transformer was the first major attempt…
External link:
http://arxiv.org/abs/2012.09958
At Pinterest, we utilize image embeddings throughout our search and recommendation systems to help our users navigate through visual content by powering experiences like browsing of related content and searching for exact products for shopping. In this…
External link:
http://arxiv.org/abs/1908.01707
Describing what has changed in a scene can be useful to a user, but only if generated text focuses on what is semantically relevant. It is thus important to distinguish distractors (e.g. a viewpoint change) from relevant changes (e.g. an object has moved)…
External link:
http://arxiv.org/abs/1901.02527
Author:
Park, Dong Huk, Hendricks, Lisa Anne, Akata, Zeynep, Rohrbach, Anna, Schiele, Bernt, Darrell, Trevor, Rohrbach, Marcus
Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose…
External link:
http://arxiv.org/abs/1802.08129
Author:
Park, Dong Huk, Hendricks, Lisa Anne, Akata, Zeynep, Rohrbach, Anna, Schiele, Bernt, Darrell, Trevor, Rohrbach, Marcus
Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize…
External link:
http://arxiv.org/abs/1711.07373