Shape-guided Configuration-aware Learning
for Endoscopic-image-based Pose Estimation
of Flexible Robotic Instruments

ECCV 2024

Yiyao Ma 1* , Kai Chen 1* , Hon-Sing Tong 2 , Ruofeng Wei 1 ,
Yui-Lun Ng 2 , Ka-Wai Kwok 1,2,3† , and Qi Dou 1†
1The Chinese University of Hong Kong, 2Agilis Robotics Limited, 3The University of Hong Kong
(*Equal contribution, †Corresponding authors)
Figure 1. Illustration of a flexible robot with four pose parameters.

Abstract

Accurate estimation of both the external orientation and the internal bending angle is crucial for understanding a flexible robot's state within its environment. However, existing sensor-based methods face limitations in cost, environmental constraints, and integration issues, while conventional image-based methods struggle with the shape complexity of flexible robots. In this paper, we propose a novel shape-guided configuration-aware learning framework for image-based flexible robot pose estimation. Inspired by recent advances in 2D-3D joint representation learning, we leverage the 3D shape prior of the flexible robot to enhance its image-based shape representation. Concretely, we first extract a part-level geometry representation from the 3D shape prior, then adapt this representation to the image by querying the image features corresponding to different robot parts. Furthermore, we present an effective mechanism to dynamically deform the shape prior, mitigating the shape difference between the adopted shape prior and the flexible robot depicted in the image. This more expressive shape guidance further boosts the image-based robot representation and can be effectively used for flexible robot pose refinement. Extensive experiments on surgical flexible robots demonstrate the advantages of our method compared with a series of keypoint-based, skeleton-based, and direct regression-based methods.

Flexible Robot Pose Estimation

Pose Estimation with Configuration-aware Shape Guidance

Figure 2. Illustration of the shape-guided configuration-aware learning method.

Based on the robot shape prior and the part labels derived from its configuration information, we extract a part-level shape representation of the flexible robot. This representation is then used to enhance the image-based robot representation, improving the accuracy of image-based pose estimation. To parameterize the robot pose, we employ a probabilistic model that simultaneously predicts both the pose value and the pose uncertainty.
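To make the part-level querying concrete, below is a minimal PyTorch sketch of one plausible realization: per-part features pooled from the shape prior act as attention queries over the image feature map. All names (pool_part_features, PartGuidedFusion) and dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def pool_part_features(point_feats, part_labels, num_parts):
    # point_feats: (B, N, C) per-point features from the 3D shape prior;
    # part_labels: (B, N) integer part id per point, given by the robot configuration.
    parts = []
    for p in range(num_parts):
        mask = (part_labels == p).unsqueeze(-1).float()           # (B, N, 1)
        parts.append((point_feats * mask).sum(1) / mask.sum(1).clamp(min=1.0))
    return torch.stack(parts, dim=1)                              # (B, P, C)

class PartGuidedFusion(nn.Module):
    """Queries image features with part-level shape features via cross-attention."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, part_feats, img_feats):
        # part_feats: (B, P, C) part-level shape representation (queries);
        # img_feats:  (B, H*W, C) flattened image feature map (keys/values).
        fused, _ = self.attn(part_feats, img_feats, img_feats)
        # The shape-enhanced features would then feed the probabilistic pose head.
        return self.norm(fused + part_feats)                      # (B, P, C)
```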

Pose Refinement with Configuration-aware Shape Deformation

Figure 3. Illustration of the pose refinement scheme.

Based on the initial flexible robot pose, we deform the robot shape prior via skeleton curve modeling and cylinder instantiation.
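As a concrete illustration of this two-step deformation, the sketch below bends a skeleton curve by a predicted bending angle and re-instantiates cylinder cross-sections along it. The constant-curvature (circular-arc) skeleton is an assumption made here for brevity, and all function names and parameters are illustrative, not the paper's exact deformation model.

```python
import numpy as np

def deform_shape_prior(theta, length=1.0, radius=0.05, n_rings=20, n_pts=16):
    """Bend a cylinder prior: skeleton curve modeling + cylinder instantiation."""
    kappa = max(theta, 1e-6) / length            # arc curvature from bending angle
    s = np.linspace(0.0, length, n_rings)        # arc-length samples
    # Skeleton curve: circular arc in the x-z plane, starting along +z.
    centers = np.stack([(1 - np.cos(kappa * s)) / kappa,
                        np.zeros_like(s),
                        np.sin(kappa * s) / kappa], axis=1)
    # Tangent/normal/binormal frames along the arc.
    t = np.stack([np.sin(kappa * s), np.zeros_like(s), np.cos(kappa * s)], axis=1)
    n = np.stack([np.cos(kappa * s), np.zeros_like(s), -np.sin(kappa * s)], axis=1)
    b = np.cross(t, n)
    # Cylinder instantiation: a circle of surface points around each skeleton point.
    phi = np.linspace(0, 2 * np.pi, n_pts, endpoint=False)
    rings = centers[:, None, :] + radius * (
        np.cos(phi)[None, :, None] * n[:, None, :] +
        np.sin(phi)[None, :, None] * b[:, None, :])
    return rings.reshape(-1, 3)                  # deformed surface point cloud
```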

Results

Does the proposed method surpass existing image-based approaches?
Conventional keypoint-based (KP) and skeleton-based (SKL) methods yield poor performance, as they struggle to localize keypoints and extract complete skeletons from high-degree-of-freedom (DoF) flexible robots. Although direct regression-based methods (DR, SimPS) outperform KP and SKL, they still fall short because they lack an effective mechanism to model the variation in flexible robot shapes. In contrast, our pose estimation model (PoseEst.) leverages informative shape guidance to enhance the flexible robot shape representation, and our refinement model (PoseRefine.) further improves the representation by deforming the flexible robot shape prior with the initial pose parameters. These strategies significantly improve pose estimation accuracy.
Table 1. Quantitative comparison between our methods and state-of-the-art methods. We report both the average (Mean) and median (Med.) angular errors for each predicted pose parameter, and the ratio of predictions whose error is smaller than 5° (Acc5°) or 10° (Acc10°). The initial pose for PoseRefine. comes from the results of PoseEst.
Figure 4. Qualitative comparison between our methods and the state-of-the-art methods.

How effective is the shape guidance in enhancing the accuracy of pose estimation?
Leveraging the shape guidance reduces the prediction error for most pose parameters. In addition, the model with shape guidance consistently achieves higher accuracy across different error thresholds. Removing the robot configuration information from the shape guidance consistently degrades pose accuracy.
Figure 5. Ablation on shape guidance.

Does the shape prior guidance outperform depth-based counterparts?
We recover a depth map from the image with a pre-trained depth prediction Transformer, lift the flexible robot pixels to 3D, and extract geometry features from the resulting robot point cloud for pose estimation. In both the estimation and refinement stages, this depth-based counterpart is inferior to ours, primarily due to the severe shape distortions caused by depth noise.
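For reference, the lifting step of this baseline amounts to standard pinhole back-projection; a minimal sketch with hypothetical variable names is given below.

```python
import numpy as np

def backproject(depth, mask, K):
    # depth: (H, W) predicted depth map; mask: (H, W) robot segmentation;
    # K: (3, 3) camera intrinsic matrix.
    v, u = np.nonzero(mask)                       # robot pixel coordinates
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)            # (N, 3) robot point cloud
```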
Figure 6. Comparison with depth-based counterparts.
Figure 7. Illustration of the depth map and reconstructed point cloud.

Can the proposed methods be applied to flexible robots with diverse configurations?
We modified the robot arm by varying the arm thickness (Thick.), arm length (Len.), and number of segments (Num.). The results demonstrate that our methods adapt smoothly to robots with diverse configurations and surpass the most competitive baseline from the main results.
Table 2. Quantitative evaluation on flexible robots with diverse configurations and results under different environmental changes.

Is the proposed method robust under various challenging surgical environments?
We conducted experiments under challenging visual conditions that typically arise in surgery, including overly bright or dark lighting (Lighting), visual occlusions caused by flushing water and bubbles (Occlusion), and image blur caused by robot motion (Scope Rot.). With the help of the 3D shape guidance, our method maintains strong performance in these challenging scenarios.
Figure 8. Qualitative results under environmental changes.

Could the pose refinement method be generally effective?
We evaluated the model in two different scenarios. First, we used it to refine the pose predictions of other baseline methods (Figure 9). Second, we used the pose prediction from the previous frame as the initial robot pose for the current frame, which is similar to robot pose tracking (Figure 10). The results indicate that the pose refinement model significantly improves the average accuracy as well as the prediction robustness across the whole sequence.
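The tracking variant reduces to a simple loop; the sketch below, with hypothetical estimator/refiner interfaces, illustrates the protocol of initializing each frame with the previous frame's prediction.

```python
def track(frames, estimator, refiner):
    """Pose tracking via per-frame refinement (hypothetical interfaces)."""
    poses = []
    pose = estimator(frames[0])                # initial pose from the first frame
    for frame in frames:
        pose = refiner(frame, init_pose=pose)  # refine the previous estimate
        poses.append(pose)
    return poses
```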
Figure 9. Effectiveness of pose refinement.
Figure 10. Comparative results of flexible robot pose tracking. Blue and red points represent ground truth and model predictions, respectively.

Could the uncertainty value reflect the pose estimation quality?
We adopt the matrix Fisher distribution to construct a probabilistic model that represents rotation matrices and improves pose estimation. It provides both the pose estimate and a confidence measure for the prediction. The results show that data with greater uncertainty are more likely to have larger errors, verifying that the uncertainty is an effective indicator of pose quality.
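For intuition, the matrix Fisher density over rotations is p(R) ∝ exp(tr(F^T R)), where the network predicts an unconstrained 3x3 parameter matrix F; the most likely rotation and a scalar concentration follow from the SVD of F. The sketch below shows this standard construction, which may differ in detail from the paper's exact head.

```python
import torch

def fisher_mode_and_uncertainty(F):
    # F: (..., 3, 3) matrix Fisher parameters predicted by the network.
    U, S, Vt = torch.linalg.svd(F)
    # Proper SVD: flip the last singular direction so the mode lies in SO(3).
    det = torch.det(U @ Vt)
    D = torch.diag_embed(torch.stack([torch.ones_like(det),
                                      torch.ones_like(det), det], dim=-1))
    R_mode = U @ D @ Vt                 # most likely rotation matrix
    # Larger singular values mean a more concentrated (confident) distribution,
    # so their inverse sum can serve as a scalar uncertainty indicator.
    uncertainty = 1.0 / (S.sum(dim=-1) + 1e-6)
    return R_mode, uncertainty
```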
Figure 11. Indication ability of uncertainty with respect to model performance. (a) The x axis is the uncertainty value and the y axis is the number of data points; colors denote different error ranges, in degrees. (b) Qualitative results for data with different error ranges and uncertainty values.