Showing 1 - 10 of 115 results for the search: "Ni, Minheng"
Unthinking execution of human instructions in robotic manipulation can lead to severe safety risks, such as poisonings, fires, and even explosions. In this paper, we present responsible robotic manipulation, which requires robots to consider potential…
External link:
http://arxiv.org/abs/2411.18289
As large-scale models evolve, language instructions are increasingly utilized in multi-modal tasks. Due to human language habits, these instructions often contain ambiguities in real-world scenarios, necessitating the integration of visual context or…
External link:
http://arxiv.org/abs/2410.03321
Author:
Ni, Minheng, Wu, Chenfei, Yuan, Huaying, Yang, Zhengyuan, Gong, Ming, Wang, Lijuan, Liu, Zicheng, Zuo, Wangmeng, Duan, Nan
With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism. However, the approach to generate multi-sensory outputs has not been fully explored, limiting…
External link:
http://arxiv.org/abs/2408.11564
With recent advancements in visual synthesis, there is a growing risk of encountering images with detrimental effects, such as hate, discrimination, or privacy violations. The research on transforming harmful images into responsible ones remains unexplored…
External link:
http://arxiv.org/abs/2404.05580
Zero-shot referring image segmentation is a challenging task because it aims to find an instance segmentation mask based on the given referring descriptions, without training on this type of paired data. Current zero-shot methods mainly focus on using…
External link:
http://arxiv.org/abs/2308.16777
Author:
Ni, Minheng, Wu, Chenfei, Wang, Xiaodong, Yin, Shengming, Wang, Lijuan, Liu, Zicheng, Duan, Nan
Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and…
External link:
http://arxiv.org/abs/2308.13785
Author:
Yin, Shengming, Wu, Chenfei, Yang, Huan, Wang, Jianfeng, Wang, Xiaodong, Ni, Minheng, Yang, Zhengyuan, Li, Linjie, Liu, Shuguang, Yang, Fan, Fu, Jianlong, Gong, Ming, Wang, Lijuan, Liu, Zicheng, Li, Houqiang, Duan, Nan
In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation. Most current work generates long videos segment by segment sequentially, which normally leads to the gap between training on short videos…
External link:
http://arxiv.org/abs/2303.12346
Author:
Wang, Xiaodong, Wu, Chenfei, Yin, Shengming, Ni, Minheng, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Yang, Fan, Wang, Lijuan, Liu, Zicheng, Fang, Yuejian, Duan, Nan
3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an…
External link:
http://arxiv.org/abs/2302.10781
Without the demand of training in reality, humans can easily detect a known concept simply based on its language description. Empowering deep learning with this ability undoubtedly enables the neural network to handle complex vision tasks, e.g., object…
External link:
http://arxiv.org/abs/2210.06886
Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions…
External link:
http://arxiv.org/abs/2202.05009