[Paper Notes] Shape-Constraint Recurrent Flow for 6D Object Pose Estimation

Shape-Constraint Recurrent Flow for 6D Object Pose Estimation

Abstract

1. Introduction

Figure 1. The problem of optical flow in 6D pose estimation. Given an initial pose, one can estimate the dense 2D-to-2D correspondence (optical flow) between the input and the synthetic image rendered from the initial pose, and then lift the dense 2D matching to 3D-to-2D correspondence to obtain a new refined pose by PnP solvers (PFA-Pose [16]). However, the flow estimation does not take the target’s 3D shape into account, as illustrated by the warped image based on the estimated flow in the last figure, which introduces significant matching noise to pose solvers and is suboptimal for 6D object pose estimation.
Figure 2. Different pose refinement paradigms. (a) Most pose refinement methods [16] rely on a recurrent architecture to estimate dense 2D flow between the rendered image I_1 and the real input image I_2, based on a dynamically-constructed correlation map according to the flow results of the previous iteration. After the convergence of the flow network and lifting the 2D flow to a 3D-to-2D correspondence field, they use PnP solvers to compute a new refined pose \hat{P}. This strategy, however, has a large matching space for every pixel in constructing correlation maps, and optimizes a surrogate matching loss that does not directly reflect the final 6D pose estimation task. (b) By contrast, we propose optimizing the pose and flow simultaneously in an end-to-end recurrent framework with the guidance of the target’s 3D shape. We impose a shape constraint on the correlation map construction by forcing the construction to comply with the target’s 3D shape, which reduces the matching space significantly. Furthermore, we propose learning the object pose based on the current flow prediction, which, in turn, helps the flow prediction and yields an end-to-end system for object pose.
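The "lifting" step mentioned in the captions above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each rendered pixel has a known 3D object point (for example, from the renderer's depth buffer), and the function name `lift_flow_to_3d2d` and the array layout are hypothetical.

```python
import numpy as np

def lift_flow_to_3d2d(pixels, points_3d, flow):
    """Turn dense 2D-to-2D flow into 3D-to-2D correspondences.

    pixels:    (N, 2) integer (u, v) coordinates in the rendered image
    points_3d: (N, 3) object-frame points rendered at those pixels (assumed known)
    flow:      (H, W, 2) flow from the rendered image to the real image
    """
    u, v = pixels[:, 0], pixels[:, 1]
    # Each rendered pixel moves by its flow vector to a location in the real image.
    targets_2d = pixels + flow[v, u]
    # The 3D point behind the rendered pixel is now paired with that 2D location.
    return points_3d, targets_2d

# Toy data: a constant flow of (+1.5, -0.5) pixels over a 4x4 image.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.5
flow[..., 1] = -0.5
pixels = np.array([[1, 2], [3, 0]])
points_3d = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])

pts3d, pts2d = lift_flow_to_3d2d(pixels, points_3d, flow)
```

The resulting pairs are what a PnP solver (for instance OpenCV's `cv2.solvePnP`) would take as input. Any error in the flow carries over directly into `pts2d`, which is the matching noise the Figure 1 caption refers to.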

3. Approach

Figure 3. Overview of our shape-constraint recurrent framework. After building a 4D correlation volume between the rendered image and the input target image, we use a GRU [7] to predict an intermediate flow, based on the predicted flow F_{k-1} and the hidden state h_{k-1} of the GRU from the previous iteration. We then use a pose regressor to predict the relative pose \Delta P_k based on the intermediate flow, which is used to update the previous pose estimate P_{k-1}. Finally, we compute a pose-induced flow based on the displacement of the 2D reprojection between the initial pose and the currently estimated pose P_k. We use this pose-induced flow to index the correlation map for the following iterations, which reduces the matching space significantly. Here we show the flow and its corresponding warp results in the dashed boxes. Note how the intermediate flow does not preserve the shape of the target, but the pose-induced flow does.
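The pose-induced flow in the caption above can be illustrated with a short sketch. It assumes a pinhole camera with intrinsics `K` and poses given as a rotation matrix plus translation vector; the function names and the toy values are hypothetical, and the paper computes this densely over the rendered object rather than for a handful of points.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project (N, 3) object-frame points to (N, 2) pixel coordinates."""
    cam = points_3d @ R.T + t        # object frame -> camera frame
    uv = cam @ K.T                   # apply camera intrinsics
    return uv[:, :2] / uv[:, 2:3]    # perspective divide

def pose_induced_flow(points_3d, K, pose_init, pose_k):
    """Displacement of each point's 2D reprojection from the initial pose to P_k."""
    R0, t0 = pose_init
    Rk, tk = pose_k
    return project(points_3d, K, Rk, tk) - project(points_3d, K, R0, t0)

# Toy setup (assumed values): three object points 1 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0]])
pose_init = (np.eye(3), np.array([0.0, 0.0, 1.0]))
pose_k = (np.eye(3), np.array([0.02, 0.0, 1.0]))   # shifted 2 cm along x

flow = pose_induced_flow(points, K, pose_init, pose_k)
# Every point moves 500 * 0.02 / 1 = 10 pixels in u and 0 pixels in v.
```

Because every flow vector is derived from one rigid pose applied to the same 3D points, the flow field is consistent with the object's shape by construction, which is the property the caption contrasts with the intermediate flow.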

Given

3.1. Overview

3.2. Shape-Constraint Correlation Space

3.3. Learning Object Pose From Optical Flow

3.4. Implementation Details

4. Experiments

5. Conclusion