Showing 1 - 10 of 556 for search: '"Zeng Wenjun"'
Camera-based 3D Semantic Occupancy Prediction (SOP) is crucial for understanding complex 3D scenes from limited 2D image observations. Existing SOP methods typically aggregate contextual features to assist occupancy representation learning, alleviating…
External link:
http://arxiv.org/abs/2412.08243
Authors:
Hahn, Meera, Zeng, Wenjun, Kannen, Nithish, Galt, Rich, Badola, Kartikeya, Kim, Been, Wang, Zi
User prompts for generative AI models are often underspecified, leading to sub-optimal responses. This problem is particularly evident in text-to-image (T2I) generation, where users commonly struggle to articulate their precise intent. This disconnect…
External link:
http://arxiv.org/abs/2412.06771
Authors:
Li, Bohan, Guo, Jiazhe, Liu, Hongsi, Zou, Yingshuang, Ding, Yikang, Chen, Xiwu, Zhu, Hu, Tan, Feiyang, Zhang, Chi, Wang, Tiancai, Zhou, Shuchang, Zhang, Li, Qi, Xiaojuan, Zhao, Hao, Yang, Mu, Zeng, Wenjun, Jin, Xin
Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output the rich data forms required…
External link:
http://arxiv.org/abs/2412.05435
Authors:
Xu, Liang, Hua, Shaoyang, Lin, Zili, Liu, Yifan, Ma, Feipeng, Yan, Yichao, Jin, Xin, Yang, Xiaokang, Zeng, Wenjun
In this paper, we tackle the problem of how to build and benchmark a large motion model (LMM). The ultimate goal of an LMM is to serve as a foundation model for versatile motion-related tasks, e.g., human motion generation, with interpretability…
External link:
http://arxiv.org/abs/2410.13790
Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. While various model-based methods have improved sample efficiency by learning interactive world models, these agents tend to be "short-sighted"…
External link:
http://arxiv.org/abs/2410.03618
Authors:
Wang, Yunnan, Li, Ziqiang, Zhang, Zequn, Zhang, Wenyao, Xie, Baao, Liu, Xihui, Zeng, Wenjun, Jin, Xin
There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To…
External link:
http://arxiv.org/abs/2410.00447
Authors:
Liu, Jinming, Wei, Yuntao, Lin, Junyan, Zhao, Shengyang, Sun, Heming, Chen, Zhibo, Zeng, Wenjun, Jin, Xin
We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models are powerful…
External link:
http://arxiv.org/abs/2408.08575
Authors:
Zeng, Wenjun, Liu, Yuchi, Mullins, Ryan, Peran, Ludovic, Fernandez, Joe, Harkous, Hamza, Narasimhan, Karthik, Proud, Drew, Kumar, Piyush, Radharapu, Bhaktipriya, Sturman, Olivia, Wahltinez, Oscar
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation models built upon Gemma2. These models provide robust, state-of-the-art predictions of safety risks across key harm types (sexually explicit, dangerous content, harassment, …)
External link:
http://arxiv.org/abs/2407.21772
Disentangled representation learning (DRL) aims to identify and decompose underlying factors behind observations, thus facilitating data perception and generation. However, current DRL approaches often rely on the unrealistic assumption that semantic…
External link:
http://arxiv.org/abs/2407.18999
Authors:
Lv, Xintao, Xu, Liang, Yan, Yichao, Jin, Xin, Xu, Congsheng, Wu, Shuwen, Liu, Yifan, Li, Lincheng, Bi, Mengxiao, Zeng, Wenjun, Yang, Xiaokang
Generating human-object interactions (HOIs) is critical given the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects…
External link:
http://arxiv.org/abs/2407.12371