We propose Co-op, a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning. While existing model-based methods suffer from inefficiency because they rely on a large number of templates, our method enables fast and accurate estimation with only a small number of templates. This improvement is achieved by finding semi-dense correspondences between the input image and the pre-rendered templates. Our method achieves strong generalization by leveraging a hybrid representation that combines patch-level classification and offset regression. Additionally, our pose refinement model estimates probabilistic flow between the input image and the rendered image, refining the initial estimate to an accurate pose through a differentiable PnP layer. We demonstrate that our method not only estimates object poses rapidly but also outperforms existing methods by a large margin on the seven core datasets of the BOP Challenge, achieving state-of-the-art accuracy.
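
As a rough illustration of the hybrid representation described above, the sketch below pairs a patch-level classifier with a sub-patch offset regressor. This is a minimal sketch and not the authors' code; the module name, feature dimension, and template patch count are illustrative assumptions.

```python
# Minimal sketch (not Co-op's released code) of a hybrid correspondence
# head: classify which pre-rendered template patch each query patch
# matches, then regress a continuous sub-patch offset. All dimensions
# are illustrative assumptions.
import torch
import torch.nn as nn

class HybridCorrespondenceHead(nn.Module):
    def __init__(self, dim=256, num_template_patches=1024):
        super().__init__()
        self.cls = nn.Linear(dim, num_template_patches)  # patch-level classification
        self.offset = nn.Linear(dim, 2)                  # sub-patch offset regression

    def forward(self, query_feats):
        # query_feats: (B, N, dim) features of N input-image patches
        logits = self.cls(query_feats)                   # (B, N, P) template-patch scores
        offsets = torch.tanh(self.offset(query_feats))   # (B, N, 2) offsets in [-1, 1]
        return logits, offsets
```

At inference, taking the argmax over the patch logits and adding the regressed offset would yield semi-dense correspondences at sub-patch resolution, ready to be consumed by a PnP solver.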


We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets: REAL275, Toyota-Light, HO3D, YCBInEOAT, and LM-O, demonstrating that it significantly outperforms state-of-the-art methods for novel object pose estimation. Project page: https://taeyeop.com/any6d.
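
The render-and-compare strategy can be pictured with the generic loop below; it is a sketch under assumed interfaces (`render_fn` and `score_fn` are hypothetical callables supplied by the caller), not Any6D's actual pipeline.

```python
# Generic render-and-compare hypothesis selection (an illustrative
# assumption, not Any6D's implementation): render the object under each
# candidate pose, score the rendering against the observed RGB-D crop,
# and keep the best-scoring hypothesis.
import numpy as np

def render_and_compare(pose_hypotheses, render_fn, score_fn, observation):
    best_pose, best_score = None, -np.inf
    for pose in pose_hypotheses:                  # pose: 4x4 rigid transform
        rendering = render_fn(pose)               # synthetic RGB-D of the model
        score = score_fn(rendering, observation)  # similarity; higher is better
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```

In practice the selected hypothesis would then be refined, e.g. by repeating the loop with perturbations sampled around the current best pose.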


Estimating rotation with high precision from an RGB-D object observation is a major challenge in 6D object pose estimation, due to the difficulty of learning in the non-linear space of SO(3). In this paper, we propose a novel rotation estimation network, termed VI-Net, that makes the task easier by decoupling the rotation into the combination of a viewpoint rotation and an in-plane rotation. More specifically, VI-Net performs feature learning on the sphere with two individual branches estimating the two factorized rotations: a V-Branch learns the viewpoint rotation via binary classification on the spherical signals, while an I-Branch estimates the in-plane rotation by transforming the signals to view them from the zenith direction. To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution (SPA-SConv), which resolves the boundary problem of spherical signals via feature padding and realizes viewpoint-equivariant feature extraction through symmetric convolutional operations. We apply the proposed VI-Net to the challenging task of category-level 6D object pose estimation, predicting the poses of unknown objects without available CAD models; experiments on the benchmark datasets confirm the efficacy of our method, which outperforms existing ones by a large margin in the regime of high precision.
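
The viewpoint/in-plane factorization can be made concrete with the sketch below, which composes a full rotation from a rotation taking the canonical zenith onto a viewing direction and an in-plane rotation about that axis. The helper names and the Rodrigues-based construction are illustrative assumptions, not VI-Net code.

```python
# Sketch of the rotation factorization R = R_viewpoint @ R_inplane
# (an assumed construction for illustration, not VI-Net's code).
import numpy as np

def inplane_rotation(theta):
    # Rotation by angle theta about the z (viewing) axis.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def viewpoint_rotation(v):
    # Smallest rotation taking the canonical zenith (0, 0, 1) onto unit
    # vector v, via Rodrigues' formula.
    z = np.array([0.0, 0.0, 1.0])
    v = v / np.linalg.norm(v)
    axis = np.cross(z, v)
    s, c = np.linalg.norm(axis), np.dot(z, v)
    if s < 1e-8:                      # v is (anti)parallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s**2)

# Full rotation: viewpoint rotation composed with an in-plane rotation.
R = viewpoint_rotation(np.array([0.3, -0.4, 0.87])) @ inplane_rotation(np.pi / 6)
```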


Category-level 6D object pose estimation aims to estimate the rotation, translation, and size of unseen instances within a given category. In this field, dense correspondence-based methods have achieved leading performance. However, they do not explicitly consider the local and global geometric information of different instances, resulting in poor generalization to unseen instances with significant shape variations. To address this problem, we propose a novel instance-adaptive and geometry-aware keypoint learning method for category-level 6D object pose estimation (AG-Pose), which includes two key designs: (1) The first design is an Instance-Adaptive Keypoint Detection module, which can adaptively detect a sparse set of keypoints for various instances to represent their geometric structures. (2) The second design is a Geometric-Aware Feature Aggregation module, which can efficiently integrate local and global geometric information into the keypoint features. These two modules work together to establish robust keypoint-level correspondences for unseen instances, thereby enhancing the generalization ability of the model. Experimental results on the CAMERA25 and REAL275 datasets show that the proposed AG-Pose outperforms state-of-the-art methods by a large margin without requiring category-specific shape priors. Code will be released at https://github.com/Leeiieeo/AG-Pose.
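
A minimal sketch of what instance-adaptive keypoint detection might look like (an illustrative assumption, not the code to be released at the link above): learnable queries attend over per-point features, so each keypoint is an attention-weighted average of the instance's 3D points and therefore adapts to its shape.

```python
# Illustrative sketch (assumed, not the released AG-Pose code) of
# instance-adaptive keypoint detection: K learnable queries attend over
# per-point features; keypoints are attention-weighted point averages.
import torch
import torch.nn as nn

class AdaptiveKeypointDetector(nn.Module):
    def __init__(self, dim=128, num_keypoints=96):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_keypoints, dim))

    def forward(self, point_feats, points):
        # point_feats: (B, N, dim) per-point features; points: (B, N, 3)
        attn = torch.einsum('kd,bnd->bkn', self.queries, point_feats)
        attn = attn.softmax(dim=-1)                             # (B, K, N)
        keypoints = torch.einsum('bkn,bnj->bkj', attn, points)  # (B, K, 3)
        kp_feats = torch.einsum('bkn,bnd->bkd', attn, point_feats)
        return keypoints, kp_feats
```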
