Results
Does the proposed method surpass existing image-based approaches?
The results show that
conventional keypoint-based (KP) and skeleton-based (SKL) methods perform poorly
when attempting to localize keypoints and extract complete skeletons from flexible robots with high degrees of freedom (DoF).
Although regression-based (DR, SimPS) methods outperform KP and SKL,
they still fall short because they lack an effective mechanism to model the variation in flexible robot shapes.
In contrast, our method PoseEst. leverages informative shape guidance to enhance the flexible robot shape representation,
and PoseRefine. further improves the representation by deforming the flexible robot shape with the initial pose parameters.
These strategies significantly improve the accuracy of pose estimation.
Table 1.
Quantitative comparison between our methods and state-of-the-art methods.
We report both the average (Mean) and median (Med.) angular errors for each of the predicted pose parameters,
and the fraction of predictions whose error is smaller than 5° (Acc5°) or 10° (Acc10°).
PoseRefine. is initialized with the output of PoseEst.
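For reference, the four statistics in Table 1 can be computed from per-sample angular errors as in the minimal sketch below. The function names `angular_error_deg` and `pose_metrics` are illustrative and not from the paper, and wrapping angle differences before taking the absolute value is an assumption about how the errors are measured.

```python
import numpy as np

def angular_error_deg(pred_deg, gt_deg):
    """Absolute angular difference in degrees, wrapped into [0, 180].

    Assumption: pose parameters are angles in degrees, so a prediction of
    350 deg against a ground truth of 10 deg counts as a 20 deg error.
    """
    pred = np.asarray(pred_deg, dtype=float)
    gt = np.asarray(gt_deg, dtype=float)
    diff = (pred - gt + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def pose_metrics(errors_deg):
    """Mean, median, Acc5 and Acc10 for one pose parameter."""
    errors = np.asarray(errors_deg, dtype=float)
    return {
        "mean": float(errors.mean()),
        "med": float(np.median(errors)),
        # "smaller than" is read as a strict inequality here.
        "acc5": float((errors < 5.0).mean()),
        "acc10": float((errors < 10.0).mean()),
    }
```

Reporting the median alongside the mean is useful because a few large failures can inflate the mean while leaving the median nearly unchanged.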
Figure 4.
Qualitative comparison between our methods and state-of-the-art methods.
How effective is the shape guidance in enhancing the accuracy of pose estimation?
Leveraging the shape guidance reduces the prediction error for most pose parameters.
In addition, the model with shape guidance consistently achieves higher accuracy across different error thresholds.
Removing the robot configuration information from the shape guidance consistently degrades the pose accuracy.
Figure 5. Ablation on shape guidance.
Does the shape prior guidance outperform depth-based counterparts?
We recover the depth map from the image with a pre-trained depth prediction Transformer, lift the flexible robot to 3D,
and extract geometric features from the robot point cloud for pose estimation.
In both the estimation and refinement stages, the depth-based counterpart is inferior to ours,
which we attribute primarily to the severe shape distortions caused by depth noise.
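The lifting step of the depth-based counterpart can be illustrated with a standard pinhole back-projection. This is a sketch under stated assumptions: the paper does not specify the camera model, so the intrinsics `fx`, `fy`, `cx`, `cy`, the robot `mask`, and the function name `backproject_depth` are hypothetical.

```python
import numpy as np

def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to a 3D point cloud in the camera frame.

    depth: (H, W) array of depth values along the optical axis.
    mask:  (H, W) boolean array marking robot pixels (assumed to be given).
    fx, fy, cx, cy: pinhole intrinsics (assumed, not from the paper).
    Returns an (N, 3) array of [x, y, z] points.
    """
    depth = np.asarray(depth, dtype=float)
    v, u = np.nonzero(mask)          # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Because x and y are both scaled by z, any error in the predicted depth shifts a point along its viewing ray, which is one way depth noise can distort the recovered robot shape.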
Figure 5. Comparison with depth-based counterparts.
Figure 6.
Illustration of the depth map and reconstructed point cloud.
Can the proposed methods be applied to flexible robots with diverse configurations?
We modified the robot arm by varying the arm thickness (Thick.), arm length (Len.), and the number of segments (Num.).
The results demonstrate that our methods adapt smoothly to robots with diverse configurations and surpass the most competitive baseline from the main results.
Table 2.
Quantitative evaluation on flexible robots with diverse configurations and results under different environmental changes.
Is the proposed method robust under various challenging surgical environments?
We conducted experiments under challenging visual conditions that typically arise in surgery,
including overly bright or dark lighting (Lighting), visual occlusions caused by flushing water and bubbles (Occlusion),
and image blur caused by robot motion (Scope Rot.).
With the help of 3D shape guidance, our method maintains strong performance in these challenging scenarios.
Figure 7.
Qualitative results under environmental changes.
Is the pose refinement method generally effective?
We evaluated the model in two different scenarios.
First, we used it to refine the pose prediction from other baseline methods (Figure 8).
Second, we used the pose prediction from the previous frame as the initial robot pose for the current frame,
which resembles robot pose tracking (Figure 9).
The results indicate that the pose refinement model can significantly improve the average accuracy
as well as the prediction robustness within the whole sequence.
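The second scenario amounts to a simple loop in which each frame's refined pose seeds the next frame. The sketch below is illustrative only: `refine_fn` is a hypothetical stand-in for the refinement model, and `track_sequence` is not a name from the paper.

```python
def track_sequence(frames, init_pose, refine_fn):
    """Refine the pose frame by frame, seeding each frame with the last result.

    frames:    iterable of per-frame inputs (e.g. images).
    init_pose: initial pose for the first frame (e.g. from a pose estimator).
    refine_fn: hypothetical callable (frame, initial_pose) -> refined_pose.
    Returns the list of refined poses, one per frame.
    """
    poses = []
    pose = init_pose
    for frame in frames:
        pose = refine_fn(frame, pose)  # previous output becomes the new initial pose
        poses.append(pose)
    return poses
```

The first scenario uses the same interface, with `init_pose` taken from a baseline method's prediction for a single frame instead of from the previous frame.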
Figure 8.
Effectiveness of pose refinement.
Figure 9.
Comparative results of flexible robot pose tracking.
Blue and red points represent ground truth and model predictions, respectively.
Does the uncertainty value reflect the pose estimation quality?
We adopt the matrix Fisher distribution to construct a probabilistic model for representing
rotation matrices and improving pose estimation.
It provides both the pose estimate and the reliability of the prediction.
The results suggest that data with greater uncertainty are more likely to have larger errors,
verifying that the uncertainty can be an effective indicator to reflect the pose quality.
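To make the role of the matrix Fisher distribution concrete, the sketch below recovers the most likely rotation from a 3×3 distribution parameter F, where the density over rotations R is proportional to exp(tr(FᵀR)). The paper does not describe how F is produced or which uncertainty measure is used, so the function name `matrix_fisher_mode` and the use of proper singular values as a concentration indicator are assumptions for illustration.

```python
import numpy as np

def matrix_fisher_mode(F):
    """Mode and proper singular values of a matrix Fisher distribution on SO(3).

    F: (3, 3) distribution parameter (assumed to be the model output).
    Returns (R_mode, proper_s), where R_mode is the most likely rotation and
    proper_s are the proper singular values of F.
    """
    F = np.asarray(F, dtype=float)
    U, s, Vt = np.linalg.svd(F)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    R_mode = U @ np.diag([1.0, 1.0, d]) @ Vt
    proper_s = s * np.array([1.0, 1.0, d])
    return R_mode, proper_s
```

Larger proper singular values correspond to a distribution that is more tightly concentrated around the mode, so a small-magnitude F signals a less certain prediction.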
Figure 10.
Uncertainty as an indicator of model performance.
(a) The x-axis is the uncertainty value, and the y-axis is the number of data points.
Different colors represent different error ranges, in degrees.
(b) Qualitative results of the data with different error ranges and uncertainty values.