Showing 1 - 10 of 208 for search: '"Mike, Zheng"'
The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in the real-world complex environment rem
External link:
http://arxiv.org/abs/2411.10323
Author:
Zhang, David Junhao, Paiss, Roni, Zada, Shiran, Karnad, Nikhil, Jacobs, David E., Pritch, Yael, Mosseri, Inbar, Shou, Mike Zheng, Wadhwa, Neal, Ruiz, Nataniel
Recently, breakthroughs in video modeling have allowed for controllable camera trajectories in generated videos. However, these methods cannot be directly applied to user-provided videos that are not generated by a video model. In this paper, we pres
External link:
http://arxiv.org/abs/2411.05003
Capturing and maintaining geometric interactions among different body parts is crucial for successful motion retargeting in skinned characters. Existing approaches often overlook body geometries or add a geometry correction stage after skeletal motio
External link:
http://arxiv.org/abs/2410.20986
Author:
Xu, Hongbin, Chen, Weitao, Zhou, Zhipeng, Xiao, Feng, Sun, Baigui, Shou, Mike Zheng, Kang, Wenxiong
Despite recent advancements in 3D generation methods, achieving controllability still remains a challenging issue. Current approaches utilizing score-distillation sampling are hindered by laborious procedures that consume a significant amount of time
External link:
http://arxiv.org/abs/2410.09592
Author:
Zhao, Rui, Yuan, Hangjie, Wei, Yujie, Zhang, Shiwei, Gu, Yuchao, Ran, Lingmin, Wang, Xiang, Wu, Zhangjie, Zhang, Junhao, Zhang, Yingya, Shou, Mike Zheng
Recent advancements in generation models have showcased remarkable capabilities in generating fantastic content. However, most of them are trained on proprietary high-quality data, and some models withhold their parameters and only provide accessible
External link:
http://arxiv.org/abs/2410.07133
Image watermark techniques provide an effective way to assert ownership, deter misuse, and trace content sources, which has become increasingly essential in the era of large generative models. A critical attribute of watermark techniques is their rob
External link:
http://arxiv.org/abs/2410.05470
A prior represents a set of beliefs or assumptions about a system, aiding inference and decision-making. In this work, we introduce the challenge of unsupervised prior learning in pose estimation, where AI models learn pose priors of animate objects
External link:
http://arxiv.org/abs/2410.03858
Author:
Bai, Zechen, He, Tong, Mei, Haiyang, Wang, Pichao, Gao, Ziteng, Chen, Joya, Liu, Lei, Zhang, Zheng, Shou, Mike Zheng
We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augm
External link:
http://arxiv.org/abs/2409.19603
Author:
Xu, Zhongcong, Song, Chaoyue, Song, Guoxian, Zhang, Jianfeng, Liew, Jun Hao, Xu, Hongyi, Xie, You, Luo, Linjie, Lin, Guosheng, Feng, Jiashi, Shou, Mike Zheng
Recent advances in video diffusion models have enabled realistic and controllable human image animation with temporal coherence. Although generating reasonable results, existing methods often overlook the need for regional supervision in crucial area
External link:
http://arxiv.org/abs/2409.19580
Author:
Han, Zongbo, Yang, Jialong, Li, Junfan, Hu, Qinghua, Xu, Qianli, Shou, Mike Zheng, Zhang, Changqing
Vision-language foundation models (e.g., CLIP) have shown remarkable performance across a wide range of tasks. However, deploying these models may be unreliable when significant distribution gaps exist between the training and test data. The training
External link:
http://arxiv.org/abs/2409.19375