Showing 1 - 10 of 157 for search: '"Gong, Boqing"'
We introduce SOAR, a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs). We incorporate human object knowledge throughout the pretraining process to enhance UAV video pretraining efficiency and …
External link:
http://arxiv.org/abs/2409.18300
Published in:
Proceedings of the 41st International Conference on Machine Learning (ICML 2024)
This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently …
External link:
http://arxiv.org/abs/2407.01606
The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negat…
External link:
http://arxiv.org/abs/2406.02965
Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that …
External link:
http://arxiv.org/abs/2406.01970
Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there a…
External link:
http://arxiv.org/abs/2405.16567
Author:
Zhang, Zheyuan, Keles, Elif, Durak, Gorkem, Taktak, Yavuz, Susladkar, Onkar, Gorade, Vandan, Jha, Debesh, Ormeci, Asli C., Medetalibeyoglu, Alpay, Yao, Lanhong, Wang, Bin, Isler, Ilkin Sevgi, Peng, Linkai, Pan, Hongyi, Vendrami, Camila Lopes, Bourhani, Amir, Velichko, Yury, Gong, Boqing, Spampinato, Concetto, Pyrros, Ayis, Tiwari, Pallavi, Klatte, Derk C. F., Engels, Megan, Hoogenboom, Sanne, Bolan, Candice W., Agarunov, Emil, Harfouch, Nassier, Huang, Chenchan, Bruno, Marco J., Schoots, Ivo, Keswani, Rajesh N., Miller, Frank H., Gonda, Tamas, Yazici, Cemal, Tirkes, Temel, Turkbey, Baris, Wallace, Michael B., Bagci, Ulas
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, la…
External link:
http://arxiv.org/abs/2405.12367
Author:
Zhao, Long, Gundavarapu, Nitesh B., Yuan, Liangzhe, Zhou, Hao, Yan, Shen, Sun, Jennifer J., Friedman, Luke, Qian, Rui, Weyand, Tobias, Zhao, Yue, Hornung, Rachel, Schroff, Florian, Yang, Ming-Hsuan, Ross, David A., Wang, Huisheng, Adam, Hartwig, Sirotenko, Mikhail, Liu, Ting, Gong, Boqing
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips …
External link:
http://arxiv.org/abs/2402.13217
Author:
Zhao, Yue, Zhao, Long, Zhou, Xingyi, Wu, Jialin, Chu, Chun-Te, Miao, Hui, Schroff, Florian, Adam, Hartwig, Liu, Ting, Gong, Boqing, Krähenbühl, Philipp, Yuan, Liangzhe
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort …
External link:
http://arxiv.org/abs/2401.06129
Author:
Hu, Hexiang, Chan, Kelvin C. K., Su, Yu-Chuan, Chen, Wenhu, Li, Yandong, Sohn, Kihyuk, Zhao, Yang, Ben, Xue, Gong, Boqing, Cohen, William, Chang, Ming-Wei, Jia, Xuhui
This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation …
External link:
http://arxiv.org/abs/2401.01952
Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been comparably mu…
External link:
http://arxiv.org/abs/2311.06386