Search results

Report

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Authors: Hu, Siyuan; Ouyang, Mingyu; Gao, Difei; Shou, Mike Zheng

The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in the real-world complex environment rem

External link: http://arxiv.org/abs/2411.10323

Report

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Authors: Zhang, David Junhao; Paiss, Roni; Zada, Shiran; Karnad, Nikhil; Jacobs, David E.; Pritch, Yael; Mosseri, Inbar; Shou, Mike Zheng; Wadhwa, Neal; Ruiz, Nataniel

Recently, breakthroughs in video modeling have allowed for controllable camera trajectories in generated videos. However, these methods cannot be directly applied to user-provided videos that are not generated by a video model. In this paper, we pres

External link: http://arxiv.org/abs/2411.05003

Report

Skinned Motion Retargeting with Dense Geometric Interaction Perception

Authors: Ye, Zijie; Liu, Jia-Wei; Jia, Jia; Sun, Shikun; Shou, Mike Zheng

Capturing and maintaining geometric interactions among different body parts is crucial for successful motion retargeting in skinned characters. Existing approaches often overlook body geometries or add a geometry correction stage after skeletal motio

External link: http://arxiv.org/abs/2410.20986

Report

ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model

Authors: Xu, Hongbin; Chen, Weitao; Zhou, Zhipeng; Xiao, Feng; Sun, Baigui; Shou, Mike Zheng; Kang, Wenxiong

Despite recent advancements in 3D generation methods, achieving controllability still remains a challenging issue. Current approaches utilizing score-distillation sampling are hindered by laborious procedures that consume a significant amount of time

External link: http://arxiv.org/abs/2410.09592

Report

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Authors: Zhao, Rui; Yuan, Hangjie; Wei, Yujie; Zhang, Shiwei; Gu, Yuchao; Ran, Lingmin; Wang, Xiang; Wu, Zhangjie; Zhang, Junhao; Zhang, Yingya; Shou, Mike Zheng

Recent advancements in generation models have showcased remarkable capabilities in generating fantastic content. However, most of them are trained on proprietary high-quality data, and some models withhold their parameters and only provide accessible

External link: http://arxiv.org/abs/2410.07133

Report

Image Watermarks are Removable Using Controllable Regeneration from Clean Noise

Authors: Liu, Yepeng; Song, Yiren; Ci, Hai; Zhang, Yu; Wang, Haofan; Shou, Mike Zheng; Bu, Yuheng

Image watermark techniques provide an effective way to assert ownership, deter misuse, and trace content sources, which has become increasingly essential in the era of large generative models. A critical attribute of watermark techniques is their rob

External link: http://arxiv.org/abs/2410.05470

Report

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos

Authors: Wang, Ziyu; Han, Shuangpeng; Shou, Mike Zheng; Zhang, Mengmi

A prior represents a set of beliefs or assumptions about a system, aiding inference and decision-making. In this work, we introduce the challenge of unsupervised prior learning in pose estimation, where AI models learn pose priors of animate objects

External link: http://arxiv.org/abs/2410.03858

Report

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Authors: Bai, Zechen; He, Tong; Mei, Haiyang; Wang, Pichao; Gao, Ziteng; Chen, Joya; Liu, Lei; Zhang, Zheng; Shou, Mike Zheng

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augm

External link: http://arxiv.org/abs/2409.19603

Report

High Quality Human Image Animation using Regional Supervision and Motion Blur Condition

Authors: Xu, Zhongcong; Song, Chaoyue; Song, Guoxian; Zhang, Jianfeng; Liew, Jun Hao; Xu, Hongyi; Xie, You; Luo, Linjie; Lin, Guosheng; Feng, Jiashi; Shou, Mike Zheng

Recent advances in video diffusion models have enabled realistic and controllable human image animation with temporal coherence. Although generating reasonable results, existing methods often overlook the need for regional supervision in crucial area

External link: http://arxiv.org/abs/2409.19580

Report

DOTA: Distributional Test-Time Adaptation of Vision-Language Models

Authors: Han, Zongbo; Yang, Jialong; Li, Junfan; Hu, Qinghua; Xu, Qianli; Shou, Mike Zheng; Zhang, Changqing

Vision-language foundation models (e.g., CLIP) have shown remarkable performance across a wide range of tasks. However, deploying these models may be unreliable when significant distribution gaps exist between the training and test data. The training

External link: http://arxiv.org/abs/2409.19375
