Showing 1 - 10
of 14
for search: '"Lin, Haitao"'
This paper investigates the task of open-ended interactive robotic manipulation in table-top scenarios. While recent Large Language Models (LLMs) enhance robots' comprehension of user instructions, their lack of visual grounding constrains their
External link:
http://arxiv.org/abs/2408.07975
Author:
Zhang, Jinyu, Gu, Yongchong, Gao, Jianxiong, Lin, Haitao, Sun, Qiang, Sun, Xinwei, Xue, Xiangyang, Fu, Yanwei
This paper addresses the challenge of perceiving complete object shapes through visual perception. While prior studies have demonstrated encouraging outcomes in segmenting the visible parts of objects within a scene, amodal segmentation, in particula
External link:
http://arxiv.org/abs/2408.03238
Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive
External link:
http://arxiv.org/abs/2307.02716
Most existing works solving the Room-to-Room VLN problem only utilize RGB images and do not consider the local context around candidate views, which lacks sufficient visual cues about the surrounding environment. Moreover, natural language contains complex semant
External link:
http://arxiv.org/abs/2305.17102
Author:
Li, Siyuan, Wang, Zedong, Liu, Zicheng, Tan, Cheng, Lin, Haitao, Wu, Di, Chen, Zhiyuan, Zheng, Jiangbin, Li, Stan Z.
By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on \textit{multi-order game-theoretic interaction} within deep neural networks (DNNs) reveals the repre
External link:
http://arxiv.org/abs/2211.03295
This paper studies the task of grasping arbitrary objects from known categories via free-form language instructions. This task demands techniques from computer vision, natural language processing, and robotics. We bring these disciplines together on th
External link:
http://arxiv.org/abs/2205.04028
In this paper, we are interested in the problem of generating target grasps by understanding freehand sketches. Sketches are useful for people who cannot formulate language and for cases where a textual description is not available on the fly.
External link:
http://arxiv.org/abs/2205.04026
Given a single scene image, this paper proposes a method for Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. Specifically, beyond the visual cues
External link:
http://arxiv.org/abs/2106.14193
Author:
Wang, Jiashun, Wen, Chao, Fu, Yanwei, Lin, Haitao, Zou, Tianyun, Xue, Xiangyang, Zhang, Yinda
Pose transfer has been studied for decades, in which the pose of a source mesh is applied to a target mesh. In particular, in this paper we are interested in transferring the pose of a source human mesh to deform the target human mesh, while the source
External link:
http://arxiv.org/abs/2003.07254
Published in:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Given a single scene image, this paper proposes a method for Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. Specifically, beyond the visual cues