[Paper Notes] CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
| Method | Type | Training input | Inference input | Output | Pipeline |
|---|---|---|---|---|---|
| CRISP | category-agnostic | RGB-D | RGB-D | \(\mathbf{R}, \mathbf{t}\) + object model | |
- 2025.04.06:
Abstract
1. Introduction
Towards addressing these issues, this paper presents three main contributions:
- We introduce CRISP, an object pose and shape estimation pipeline. CRISP combines a pre-trained vision transformer (ViT) backbone with a dense prediction transformer (DPT) and feature-wise linear modulation (FiLM) conditioning to estimate the 6D pose and 3D shape of an object from a single RGB-D image [31, 33]. CRISP is category-agnostic (i.e., it does not require knowledge of the object category at test time).
- We propose an optimization-based pose and shape corrector that can correct estimation errors. The corrector is formulated as a bi-level optimization problem, which we solve with block coordinate descent. We approximate the shape decoder in CRISP by an active shape model, and show that (i) this is a reasonable approximation, and (ii) doing so turns the inner problem into a constrained linear least squares problem, which can be solved efficiently with interior-point methods and yields shapes comparable in quality to those of the trained decoder.
- We adapt a correct-and-certify approach to self-train CRISP and bridge large domain gaps. During self-training, we use the corrector to fix pose and shape estimation errors. Then, we assess the quality of the corrector's output using an observable correctness certificate inspired by [36], and create pseudo-labels from the estimates that pass the certificate check. Finally, we train the model on these pseudo-labels with standard stochastic gradient descent. Contrary to [22, 30], we do not need access to synthetic data during self-training.
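To make the FiLM conditioning in the first contribution concrete, here is a minimal NumPy sketch (not the paper's implementation): FiLM applies a per-channel affine transform, with the scale and shift predicted from a conditioning input.

```python
import numpy as np

def film(features, gamma, beta):
    # FiLM: feature-wise linear modulation. gamma/beta carry one value per
    # channel and are broadcast over the spatial dimensions of the feature map.
    return gamma[:, None, None] * features + beta[:, None, None]

feats = np.ones((8, 4, 4))   # (channels, H, W) feature map, all ones for clarity
gamma = np.full(8, 2.0)      # per-channel scale (in CRISP, from a conditioning net)
beta = np.full(8, -1.0)      # per-channel shift
out = film(feats, gamma, beta)
print(out[0, 0, 0])          # 2.0 * 1.0 - 1.0 = 1.0
```

In CRISP, the gamma/beta would come from a learned conditioning network rather than constants; the sketch only shows the modulation itself.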
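The second contribution's key step, replacing the shape decoder with an active shape model, can be sketched with synthetic data. Under the active shape model, a shape is the mean shape plus a linear combination of basis shapes, so fitting the coefficients to an observed point set becomes a bound-constrained linear least squares problem (here solved with SciPy's `lsq_linear`; all sizes and bounds are illustrative, not the paper's):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

# Toy active shape model: shape(c) = mean_shape + sum_k c[k] * basis[k].
n_points, n_basis = 50, 4
mean_shape = rng.normal(size=(n_points, 3))
basis = rng.normal(size=(n_basis, n_points, 3))

# Ground-truth coefficients and a noisy "observed" shape.
c_true = np.array([0.8, -0.3, 0.5, 0.1])
observed = mean_shape + np.tensordot(c_true, basis, axes=1)
observed += 0.01 * rng.normal(size=observed.shape)

# Recovering c is linear in c: minimize ||A c - b||^2 s.t. bounds on c.
A = basis.reshape(n_basis, -1).T        # (3 * n_points, n_basis)
b = (observed - mean_shape).ravel()
res = lsq_linear(A, b, bounds=(-1.0, 1.0))  # bound-constrained linear LS
print(res.x)                                # close to c_true
```

This linearity is what makes the inner shape problem cheap to solve inside the corrector's block coordinate descent loop.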
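The correct-and-certify self-training loop in the last contribution can be sketched as below. Every function here is a toy stand-in (a 1-D "estimate" with a clamping corrector and a distance-based certificate), chosen only to show the control flow: correct, certify, keep certified estimates as pseudo-labels.

```python
def corrector(estimate):
    # Stand-in for the bi-level pose/shape corrector: clamp to a valid range.
    return max(0.0, min(1.0, estimate))

def certificate(estimate, observation, tol=0.1):
    # Stand-in for the observable correctness certificate:
    # accept only estimates consistent with the observation.
    return abs(estimate - observation) <= tol

def self_train_step(model_predict, observations):
    pseudo_labels = []
    for obs in observations:
        est = corrector(model_predict(obs))   # 1. correct the raw prediction
        if certificate(est, obs):             # 2. certify the corrected estimate
            pseudo_labels.append((obs, est))  # 3. keep as a pseudo-label
    return pseudo_labels                      # 4. then train on these with SGD

biased_model = lambda x: x + 0.05             # model with a small domain-gap bias
labels = self_train_step(biased_model, [0.2, 0.5, 0.9])
print(labels)                                 # all three observations pass
```

In CRISP the pseudo-labels are certified pose and shape estimates on real target-domain images, and step 4 is ordinary SGD on those labels, which is why no synthetic data is needed during self-training.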