OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding

ICRA 2025


Dianyi Yang1, Yu Gao1, Xihan Wang1, Yufeng Yue1, Yi Yang1†, Mengyin Fu1

1Beijing Institute of Technology    †Corresponding Author

Abstract


Recent advancements in 3D Gaussian Splatting have significantly improved the efficiency and quality of dense semantic SLAM. However, previous methods are generally constrained by limited-category pre-trained classifiers and implicit semantic representations, which hinder their performance in open-set scenarios and restrict 3D object-level scene understanding. To address these issues, we propose OpenGS-SLAM, an innovative framework that utilizes a 3D Gaussian representation to perform dense semantic SLAM in open-set environments. Our system integrates explicit semantic labels derived from 2D foundation models into the 3D Gaussian framework, facilitating robust 3D object-level scene understanding. We introduce Gaussian Voting Splatting to enable fast 2D label map rendering and scene updating. Additionally, we propose a Confidence-based 2D Label Consensus method to ensure consistent labeling across multiple views. Furthermore, we employ a Segmentation Counter Pruning strategy to improve the accuracy of the semantic scene representation. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method in scene understanding, tracking, and mapping, achieving 10× faster semantic rendering and 2× lower storage costs compared to existing methods.
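A minimal sketch of the voting idea behind Gaussian Voting Splatting, assuming each 3D Gaussian stores an explicit integer label and contributes its alpha-blending weight as a vote for that label at every pixel it covers; the rendered label map is then the per-pixel argmax of the accumulated votes. The function name and array layout below are illustrative, and the per-pixel (label, weight) inputs are assumed to come from a standard alpha-blending rasterizer; this is not the paper's implementation.

```python
import numpy as np

def vote_label_map(pixel_labels, pixel_weights, num_labels):
    """Render a 2D label map by per-pixel weight voting (illustrative sketch).

    pixel_labels  : (H, W, K) int array, label of the k-th Gaussian hit per pixel
    pixel_weights : (H, W, K) float array, that Gaussian's alpha-blending weight
    Returns the (H, W) winning-label map and the (H, W, num_labels) vote totals.
    """
    height, width, hits = pixel_labels.shape
    votes = np.zeros((height, width, num_labels), dtype=np.float32)
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    for k in range(hits):
        # Each Gaussian deposits its blending weight as a vote for its label.
        np.add.at(votes, (rows, cols, pixel_labels[:, :, k]), pixel_weights[:, :, k])
    return votes.argmax(axis=2), votes
```

Voting over explicit integer labels means the map stores one small integer per Gaussian rather than a high-dimensional semantic embedding, which is consistent with the lower storage cost reported in the abstract.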


Overview


An overview of OpenGS-SLAM. Our method takes an RGB-D stream as input. RGB images are first fed into our Semantic Information Generator to obtain the input label map, along with a confidence score and class for each label. Concurrently, G-ICP is used to estimate the camera pose and extract source Gaussian data. Using the current pose, we render an RGB-D image, label map, and contribution record matrix via Gaussian Voting Splatting. Confidence-based 2D Label Consensus is then applied to unify the input label map with the current map, ensuring semantic consistency. During this process, part of the Gaussian data is updated, and Gaussians flagged by Segmentation Counter Pruning are removed. The consistent input label map and source Gaussians are then combined to generate new Gaussians, densifying the scene.
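To make the data flow above concrete, here is a minimal per-frame sketch in Python. Every function name, parameter, and the gaussian_map interface are illustrative placeholders standing in for the components named in the overview, not the released API.

```python
from typing import Callable
import numpy as np

def per_frame_step(
    rgb: np.ndarray,
    depth: np.ndarray,
    gaussian_map,                       # mutable 3D Gaussian scene (assumed interface)
    semantic_info_generator: Callable,  # rgb -> (input label map, confidences, classes)
    gicp_track: Callable,               # (rgb, depth, map) -> (pose, source Gaussians)
    voting_splat: Callable,             # (map, pose) -> (rgbd, label map, contributions)
    label_consensus: Callable,          # reconciles input labels with rendered labels
):
    # 1. Open-set 2D semantics for the incoming frame.
    in_labels, confidences, classes = semantic_info_generator(rgb)

    # 2. Camera pose and source Gaussians from G-ICP registration.
    pose, source_gaussians = gicp_track(rgb, depth, gaussian_map)

    # 3. Render an RGB-D image, label map, and contribution record matrix
    #    at the estimated pose via Gaussian Voting Splatting.
    rendered_rgbd, rendered_labels, contributions = voting_splat(gaussian_map, pose)

    # 4. Confidence-based 2D Label Consensus: unify input and rendered labels.
    in_labels = label_consensus(in_labels, confidences, rendered_labels, contributions)
    gaussian_map.update_labels(in_labels)        # partial Gaussian update
    gaussian_map.prune_by_segmentation_counter() # Segmentation Counter Pruning

    # 5. Densify the scene from the consistent labels and source Gaussians.
    gaussian_map.densify(source_gaussians, in_labels, rgb, depth)
    return pose
```

Calling per_frame_step once per RGB-D frame mirrors the loop described in the overview; in practice the tracking, rendering, and consensus stages would run on the GPU rather than through Python callables.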


Video