6D Pose Estimation of Objects Using Limited Training Data

Author: Park, Kiru
Language: English
Year of publication: 2020
Subject:
DOI: 10.34726/hss.2020.85042
Description: Pose estimation of objects is an important task for understanding the surrounding environment and for interacting with objects in robot manipulation and augmented reality applications. Major computer vision tasks, such as object detection and classification, have improved significantly with Convolutional Neural Networks (CNNs). Likewise, recent pose estimation methods using CNNs have achieved high performance with large amounts of training data, which are, however, difficult to obtain from real environments. This thesis presents multiple methods that overcome the limited availability of training data in practical scenarios while solving common challenges in object pose estimation.

Symmetry and occlusion of objects are the most common challenges that make estimates inaccurate. This thesis introduces a method that regresses pixel-wise coordinates of an object while resolving ambiguous views caused by symmetric poses with a novel loss function in the training process. Coordinates of occluded regions are also predicted regardless of visibility, which makes the method robust to occlusion. The method shows state-of-the-art performance in evaluations using only a limited number of real images.

Nevertheless, annotating object poses in images is a difficult and time-consuming task, which prevents pose estimation methods from learning a new object from cluttered real scenes. This thesis introduces an approach that leverages a few cluttered images of an object to learn its appearance in arbitrary poses. A novel refinement step updates the pose annotations of the input images to reduce pose errors, which are common when poses are self-annotated by camera tracking or annotated manually by humans. Evaluations show that the images generated by the method lead to state-of-the-art performance compared to methods using 13 times the number of real training images.

Domains such as retail shops encounter new objects very often, so it is inefficient to train pose estimators for new objects every time. Furthermore, it is difficult to build precise 3D models of all instances in real-world environments. A template-based method in this thesis tackles these practical challenges by estimating poses of a new object using previous observations of the same or similar objects. The nearest observations are used to determine the object’s locations, segmentation masks, and poses. The method is further extended to predict dense correspondences between the nearest observation and a target object for transferring grasp poses from similar experiences. Evaluations on public datasets show that the template-based method performs better than baseline methods on segmentation and pose estimation tasks. Grasp experiments with a robot show the benefit of leveraging successful grasp experiences, which significantly improve the grasp performance for familiar objects.
Database: OpenAIRE
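
Note on the symmetry-handling loss mentioned in the abstract: the following is a minimal, illustrative sketch of how a pixel-wise coordinate regression loss can be made tolerant to symmetric poses by keeping the minimum error over an object's known symmetry transforms. It assumes a PyTorch-style setup; the function name, tensor layout, and the min-over-symmetries formulation are assumptions for illustration, not the exact loss proposed in the thesis.

    import torch

    def symmetry_aware_coord_loss(pred_coords, gt_coords, sym_transforms):
        """Pixel-wise 3D-coordinate regression loss tolerant to symmetric poses.

        pred_coords   : (B, 3, H, W) predicted object coordinates per pixel
        gt_coords     : (B, 3, H, W) ground-truth coordinates for the annotated pose
        sym_transforms: list of (3, 3) rotation matrices mapping the object onto
                        itself (identity included); assumed known per object
        """
        losses = []
        for R in sym_transforms:
            # Apply the symmetry transform to the ground-truth coordinate map.
            gt_sym = torch.einsum('ij,bjhw->bihw', R, gt_coords)
            # Mean absolute error per sample for this symmetric variant.
            losses.append(torch.mean(torch.abs(pred_coords - gt_sym), dim=(1, 2, 3)))
        # Keep, per sample, the smallest error over all symmetric variants, so
        # ambiguous views of symmetric objects are not penalised.
        return torch.stack(losses, dim=0).min(dim=0).values.mean()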