Search results

Report

SimGen: Simulator-conditioned Driving Scene Generation

Authors: Zhou, Yunsong; Simon, Michael; Peng, Zhenghao; Mo, Sicheng; Zhu, Hongzi; Guo, Minyi; Zhou, Bolei

Controllable synthetic data generation can substantially lower the annotation cost of training data in autonomous driving research and development. Prior works use diffusion models to generate driving images conditioned on the 3D object layout. Howev

External link: http://arxiv.org/abs/2406.09386

Report

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Authors: Lin, Kuan Heng; Mo, Sicheng; Klingher, Ben; Mu, Fangzhou; Zhou, Bolei

Recent controllable generation approaches such as FreeControl and Diffusion Self-guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize th

External link: http://arxiv.org/abs/2406.07540

Report

SnAG: Scalable and Accurate Video Grounding

Authors: Mu, Fangzhou; Mo, Sicheng; Li, Yin

Temporal grounding of text descriptions in videos is a central problem in vision-language learning and video understanding. Existing methods often prioritize accuracy over scalability -- they have been optimized for grounding only a few text queries

External link: http://arxiv.org/abs/2404.02257

Report

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Authors: Mo, Sicheng; Mu, Fangzhou; Lin, Kuan Heng; Liu, Yanli; Guan, Bochen; Li, Yin; Zhou, Bolei

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint, putting the

External link: http://arxiv.org/abs/2312.07536

Report

Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge

Authors: Mu, Fangzhou; Mo, Sicheng; Wang, Gillian; Li, Yin

This report describes our submission to the Ego4D Moment Queries Challenge 2022. Our submission builds on ActionFormer, the state-of-the-art backbone for temporal action localization, and a trio of strong video features from SlowFast, Omnivore and Eg

External link: http://arxiv.org/abs/2211.09074

Report

A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge

Authors: Mo, Sicheng; Mu, Fangzhou; Li, Yin

This report describes Badgers@UW-Madison, our submission to the Ego4D Natural Language Queries (NLQ) Challenge. Our solution inherits the point-based event representation from our prior work on temporal action localization, and develops a Transformer

External link: http://arxiv.org/abs/2211.08704

Report

Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging

Authors: Mu, Fangzhou; Mo, Sicheng; Peng, Jiayong; Liu, Xiaochun; Nam, Ji Hyun; Raghavan, Siddeshwar; Velten, Andreas; Li, Yin

Computational approach to imaging around the corner, or non-line-of-sight (NLOS) imaging, is becoming a reality thanks to major advances in imaging hardware and reconstruction algorithms. A recent development towards practical NLOS imaging, Nam et al

External link: http://arxiv.org/abs/2205.01679

Academic article

Transformer-based automated segmentation of recycling materials for semantic understanding in construction

Authors: Wang, Xin; Han, Wei; Mo, Sicheng; Cai, Ting; Gong, Yijing; Li, Yin; Zhu, Zhenhua

Published in: Automation in Construction, October 2023, 154
