The increasing availability of digital 3D environments, whether obtained through image-based 3D reconstruction, generation, or scans captured by robots, is driving innovation across a wide range of applications. These environments create a strong demand for 3D interaction capabilities, such as 3D Interactive Segmentation, which is useful for tasks like object selection and manipulation.
At the same time, there is a persistent need for solutions that are efficient, precise, and that perform well across diverse settings, particularly in unseen environments and with unfamiliar objects. In this work, we introduce a 3D interactive segmentation method that consistently surpasses previous state-of-the-art techniques on both in-domain and out-of-domain datasets.
Our simple approach integrates a voxel-based sparse encoder with a lightweight transformer-based decoder that implements implicit click fusion, achieving superior performance while maximizing efficiency. Our method demonstrates substantial improvements on benchmark datasets, including ScanNet, ScanNet++, S3DIS, and KITTI-360, as well as on unseen geometric distributions such as those obtained from Gaussian Splatting.
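To make the described design more concrete, below is a minimal PyTorch sketch of the high-level idea: per-voxel features produced by a sparse encoder are cross-attended by click queries in a lightweight transformer decoder, so the clicks are fused implicitly through attention rather than concatenated into the input. All module names, sizes, and the use of nn.TransformerDecoder are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ClickDecoder(nn.Module):
    """Hypothetical lightweight decoder: click queries attend to voxel features."""

    def __init__(self, dim=128, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
        self.click_embed = nn.Linear(3 + 1, dim)  # xyz position + positive/negative flag
        self.mask_head = nn.Linear(dim, dim)

    def forward(self, voxel_feats, clicks):
        # voxel_feats: (B, N, dim) features of N occupied voxels from a sparse encoder
        # clicks:      (B, K, 4)  click coordinates plus a +/- label
        queries = self.click_embed(clicks)          # (B, K, dim)
        fused = self.decoder(queries, voxel_feats)  # implicit fusion via cross-attention
        # Dot products between fused click queries and voxel features give
        # per-voxel mask logits; taking the max over clicks yields one mask.
        logits = torch.einsum("bkd,bnd->bkn", self.mask_head(fused), voxel_feats)
        return logits.max(dim=1).values             # (B, N) per-voxel logits


# Example with random stand-in features (a real model would compute
# voxel_feats with a sparse 3D CNN over a voxelized point cloud).
feats = torch.randn(1, 2048, 128)
clicks = torch.tensor([[[0.4, 1.2, 0.7, 1.0]]])      # one positive click
print(ClickDecoder()(feats, clicks).shape)           # torch.Size([1, 2048])
```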
This video showcases a VR application featuring Easy3D, in which a user places 3D clicks to segment objects (blue masks). The segmented objects can then be moved, rearranged, exploded, as well as "dunefied".
Below we visualize the predictions of our method compared to previous approaches for up to 3 user clicks, showing the IoU between the ground-truth object mask (green) and the predicted mask (red). Easy3D demonstrates superior generalization performance, especially on challenging domains like KITTI-360.
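For reference, the IoU reported in the comparison is the standard intersection-over-union between the ground-truth and predicted per-point masks. The sketch below shows one way to compute it; the function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two boolean per-point masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0


# e.g. pred/gt as boolean arrays with one entry per 3D point
pred = np.array([1, 1, 0, 1], dtype=bool)
gt = np.array([1, 0, 0, 1], dtype=bool)
print(mask_iou(pred, gt))  # 0.666...
```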
@article{simonelli2025easy3d,
title = {Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation},
author = {Simonelli, Andrea and M{\"u}ller, Norman and Kontschieder, Peter},
journal = {arXiv preprint arXiv:2504.11024},
year = {2025},
}