Showing 1 - 10
of 14
for search: '"Lin, Haitao"'
This paper investigates the task of open-ended interactive robotic manipulation in table-top scenarios. While recent Large Language Models (LLMs) enhance robots' comprehension of user instructions, their lack of visual grounding constrains their
External link:
http://arxiv.org/abs/2408.07975
Author:
Zhang, Jinyu, Gu, Yongchong, Gao, Jianxiong, Lin, Haitao, Sun, Qiang, Sun, Xinwei, Xue, Xiangyang, Fu, Yanwei
This paper addresses the challenge of perceiving complete object shapes through visual perception. While prior studies have demonstrated encouraging outcomes in segmenting the visible parts of objects within a scene, amodal segmentation, in particula
External link:
http://arxiv.org/abs/2408.03238
Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive
External link:
http://arxiv.org/abs/2307.02716
Most existing works solving the Room-to-Room VLN problem only utilize RGB images and do not consider the local context around candidate views, which lacks sufficient visual cues about the surrounding environment. Moreover, natural language contains complex semant
External link:
http://arxiv.org/abs/2305.17102
Author:
Li, Siyuan, Wang, Zedong, Liu, Zicheng, Tan, Cheng, Lin, Haitao, Wu, Di, Chen, Zhiyuan, Zheng, Jiangbin, Li, Stan Z.
By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on \textit{multi-order game-theoretic interaction} within deep neural networks (DNNs) reveals the repre
External link:
http://arxiv.org/abs/2211.03295
This paper studies the task of grasping arbitrary objects from known categories via free-form language instructions. This task demands techniques from computer vision, natural language processing, and robotics. We bring these disciplines together on th
External link:
http://arxiv.org/abs/2205.04028
In this paper, we are interested in the problem of generating target grasps by understanding freehand sketches. Sketches are useful for people who cannot formulate language and for cases where a textual description is not available on the fly.
External link:
http://arxiv.org/abs/2205.04026
Given a single scene image, this paper proposes a method for Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. Specifically, beyond the visual cues
External link:
http://arxiv.org/abs/2106.14193
Author:
Wang, Jiashun, Wen, Chao, Fu, Yanwei, Lin, Haitao, Zou, Tianyun, Xue, Xiangyang, Zhang, Yinda
Pose transfer has been studied for decades, in which the pose of a source mesh is applied to a target mesh. In particular, in this paper we are interested in transferring the pose of a source human mesh to deform the target human mesh, while the source
External link:
http://arxiv.org/abs/2003.07254
Published in:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Given a single scene image, this paper proposes a method for Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. Specifically, beyond the visual cues