Showing 1 - 10 of 949 for search: '"HUANG Yifei"'
SMORE (Chen et al., 2023) recently proposed the concept of semantic regular expressions that extend the classical formalism with a primitive to query external oracles such as databases and large language models (LLMs). Such patterns can be used to id
External link:
http://arxiv.org/abs/2410.13262
Recent advances in diffusion-based robot policies have demonstrated significant potential in imitating multi-modal behaviors. However, these approaches typically require large quantities of demonstration data paired with corresponding robot action la
External link:
http://arxiv.org/abs/2410.07584
We present a contrastive learning framework based on in-the-wild hand images tailored for pre-training 3D hand pose estimators, dubbed HandCLR. Pre-training on large-scale images achieves promising results in various tasks, but prior 3D hand pose pre
External link:
http://arxiv.org/abs/2409.09714
Delving into the realm of egocentric vision, the advancement of referring video object segmentation (RVOS) stands as pivotal in understanding human activities. However, existing RVOS tasks primarily rely on static attributes such as object names to
External link:
http://arxiv.org/abs/2407.07402
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable to help egocentric
External link:
http://arxiv.org/abs/2407.06628
Author:
Pei, Baoqi, Chen, Guo, Xu, Jilan, He, Yuping, Liu, Yicheng, Pan, Kanghua, Huang, Yifei, Wang, Yali, Lu, Tong, Wang, Limin, Qiao, Yu
In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulo
External link:
http://arxiv.org/abs/2406.18070
The accuracy of phaseless auxiliary-field quantum Monte Carlo (ph-AFQMC) can be systematically improved with better trial states. Using multi-Slater determinant trial states, ph-AFQMC has the potential to faithfully treat strongly correlated systems,
External link:
http://arxiv.org/abs/2406.08314
Author:
Liang, Tianyi, Liu, Jiangqi, Song, Sicheng, Jiang, Shiqi, Huang, Yifei, Wang, Changbo, Li, Chenhui
Recent advancements in text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text
External link:
http://arxiv.org/abs/2404.11824
Author:
Huang, Yifei, Chen, Guo, Xu, Jilan, Zhang, Mingfang, Yang, Lijin, Pei, Baoqi, Zhang, Hongjie, Dong, Lu, Wang, Yali, Wang, Limin, Qiao, Yu
Being able to map the activities of others into one's own point of view is one fundamental human skill even from a very early age. Taking a step toward understanding this human ability, we introduce EgoExoLearn, a large-scale dataset that emulates th
External link:
http://arxiv.org/abs/2403.16182
Author:
Wang, Yi, Li, Kunchang, Li, Xinhao, Yu, Jiashuo, He, Yinan, Wang, Chenting, Chen, Guo, Pei, Baoqi, Yan, Ziang, Zheng, Rongkun, Xu, Jilan, Wang, Zun, Shi, Yansong, Jiang, Tianxiang, Li, Songze, Zhang, Hongjie, Huang, Yifei, Qiao, Yu, Wang, Yali, Wang, Limin
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue. Our core design is a progressive training approach that unifies th
External link:
http://arxiv.org/abs/2403.15377