Showing 1 - 10 of 454 for search: '"He, Tianyu"'
Author:
Chen, Xiaoyu, Guo, Junliang, He, Tianyu, Zhang, Chuheng, Zhang, Pushi, Yang, Derek Cathera, Zhao, Li, Bian, Jiang
We introduce Image-GOal Representations (IGOR), aiming to learn a unified, semantically consistent action space across human and various robots. Through this unified latent action space, IGOR enables knowledge transfer among large-scale robot and hum…
External link:
http://arxiv.org/abs/2411.00785
Published in:
NeurIPS 2024
Significant progress has been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video…
External link:
http://arxiv.org/abs/2409.00558
Author:
Jiang, Meng, Zhao, Qing, Li, Jianqiang, Wang, Fan, He, Tianyu, Cheng, Xinyan, Yang, Bing Xiang, Ho, Grace W. K., Fu, Guanghui
Cognitive Behavioral Therapy (CBT) is a well-established intervention for mitigating psychological issues by modifying maladaptive cognitive and behavioral patterns. However, delivery of CBT is often constrained by resource limitations and barriers t…
External link:
http://arxiv.org/abs/2407.19422
In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image guided by demonstrations. In this paper, we propose and stud…
External link:
http://arxiv.org/abs/2407.07356
Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input vie…
External link:
http://arxiv.org/abs/2406.10111
Author:
Yu, Runyi, He, Tianyu, Zhang, Ailing, Wang, Yuchi, Guo, Junliang, Tan, Xu, Liu, Chang, Chen, Jie, Bian, Jiang
We aim to edit the lip movements in talking video according to the given speech while preserving the personal identity and visual details. The task can be decomposed into two sub-problems: (1) speech-driven lip motion generation and (2) visual appear…
External link:
http://arxiv.org/abs/2406.08096
Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unmoved by the choice of architecture and training strategies. On the other hand, an analytical solution for the…
External link:
http://arxiv.org/abs/2406.03495
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in…
External link:
http://arxiv.org/abs/2406.02550
Author:
Wang, Yuchi, Guo, Junliang, Bai, Jianhong, Yu, Runyi, He, Tianyu, Tan, Xu, Sun, Xu, Bian, Jiang
Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated…
External link:
http://arxiv.org/abs/2405.15758
3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we for…
External link:
http://arxiv.org/abs/2406.01597