Search results - "Basu, Sugato"

Report

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Authors: Li, Jiachen, Feng, Weixi, Fu, Tsu-Jui, Wang, Xinyi, Basu, Sugato, Chen, Wenhu, Wang, William Yang

Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate f

External link: http://arxiv.org/abs/2405.18750

Report

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

Authors: Jia, Zhiwei, Narayana, Pradyumna, Akula, Arjun R., Pruthi, Garima, Su, Hao, Basu, Sugato, Jampani, Varun

Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively und

External link: http://arxiv.org/abs/2305.18373

Report

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Authors: Feng, Weixi, Zhu, Wanrong, Fu, Tsu-jui, Jampani, Varun, Akula, Arjun, He, Xuehai, Basu, Sugato, Wang, Xin Eric, Wang, William Yang

Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we s

External link: http://arxiv.org/abs/2305.15393

Report

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Authors: He, Xuehai, Feng, Weixi, Fu, Tsu-Jui, Jampani, Varun, Akula, Arjun, Narayana, Pradyumna, Basu, Sugato, Wang, William Yang, Wang, Xin Eric

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text pro

External link: http://arxiv.org/abs/2305.10722

Report

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Authors: Akula, Arjun R., Driscoll, Brendan, Narayana, Pradyumna, Changpinyo, Soravit, Jia, Zhiwei, Damle, Suyash, Pruthi, Garima, Basu, Sugato, Guibas, Leonidas, Freeman, William T., Li, Yuanzhen, Jampani, Varun

Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such a

External link: http://arxiv.org/abs/2212.09898

Report

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Authors: Feng, Weixi, He, Xuehai, Fu, Tsu-Jui, Jampani, Varun, Akula, Arjun, Narayana, Pradyumna, Basu, Sugato, Wang, Xin Eric, Wang, William Yang

Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribution-binding and compositional capabilities are sti

External link: http://arxiv.org/abs/2212.05032

Report

CPL: Counterfactual Prompt Learning for Vision and Language Models

Authors: He, Xuehai, Yang, Diji, Feng, Weixi, Fu, Tsu-Jui, Akula, Arjun, Jampani, Varun, Narayana, Pradyumna, Basu, Sugato, Wang, William Yang, Wang, Xin Eric

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, whi

External link: http://arxiv.org/abs/2210.10362

Report

Diagnosing Vision-and-Language Navigation: What Really Matters

Authors: Zhu, Wanrong, Qi, Yuankai, Narayana, Pradyumna, Sone, Kazoo, Basu, Sugato, Wang, Xin Eric, Wu, Qi, Eckstein, Miguel, Wang, William Yang

Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. Multiple setups have been proposed, and researchers apply new model architectures or training techniq

External link: http://arxiv.org/abs/2103.16561

Report

A Framework for Deep Constrained Clustering

Authors: Zhang, Hongjing, Zhan, Tianyang, Basu, Sugato, Davidson, Ian

The area of constrained clustering has been extensively explored by researchers and used by practitioners. Constrained clustering formulations exist for popular algorithms such as k-means, mixture models, and spectral clustering but have several limi

External link: http://arxiv.org/abs/2101.02792

Report

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

Authors: Zhu, Wanrong, Wang, Xin Eric, Narayana, Pradyumna, Sone, Kazoo, Basu, Sugato, Wang, William Yang

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct, and benchmark

External link: http://arxiv.org/abs/2010.03644
