OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding

School of Automation, Beijing Institute of Technology, Beijing, China
National Key Lab of Autonomous Intelligent Unmanned Systems

*Indicates Corresponding Author

Abstract

Recent advances in 3D scene understanding have made it possible to interact with scenes through open-vocabulary queries, particularly in VR/AR and robotic applications. Nevertheless, existing methods are hindered by rigid offline pipelines and cannot provide precise 3D object-level understanding for open-ended queries. In this paper, we present OpenGS-Fusion, an open-vocabulary dense mapping framework that improves semantic modeling and refines object-level understanding. OpenGS-Fusion combines 3D Gaussian representations with a Truncated Signed Distance Field (TSDF) to facilitate lossless fusion of semantic features on the fly. Furthermore, we introduce a multimodal language-guided approach named MLLM-Assisted Adaptive Thresholding, which refines the segmentation of 3D objects by adaptively adjusting similarity thresholds, achieving a 17% improvement in 3D mIoU over a fixed-threshold strategy. Extensive experiments demonstrate that our method outperforms existing methods in 3D object understanding and scene reconstruction quality, and showcase its effectiveness in language-guided scene interaction.


Overview of OpenGS-Fusion. Receiving RGB-D input with 2D language embeddings extracted from 2D foundation models, we simultaneously update the appearance, geometry and semantic features of our hybrid 3D Gaussian scene representation. Additionally, the proposed open-vocabulary query strategy enables precise localization of 3D objects without the need for explicit scene segmentation.
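
To make the idea of an open-vocabulary query concrete, the following is a minimal sketch, not the OpenGS-Fusion implementation. It assumes each Gaussian carries a language feature vector and that a text query has already been embedded into the same space; the function names, the plain-list feature layout, and the use of cosine similarity with a single fixed threshold are all illustrative assumptions. It corresponds to the fixed-threshold baseline that the abstract compares against.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_gaussians(features, text_embedding, threshold):
    """Return indices of Gaussians whose feature matches the query.

    `features` is a hypothetical list of per-Gaussian language features,
    `text_embedding` is the embedded query, and `threshold` is a fixed
    similarity cut-off (an assumption made for this sketch).
    """
    return [
        index
        for index, feature in enumerate(features)
        if cosine_similarity(feature, text_embedding) >= threshold
    ]


# Toy example with 2D features standing in for real language embeddings.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.0]
selected = query_gaussians(toy_features, toy_query, threshold=0.5)
```

In this toy case a looser threshold admits the partially matching third feature and a stricter one rejects it, which illustrates why a single fixed value can over- or under-segment an object.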

Supplementary

Used Prompt
If you are a Visual Retrieval Expert, I am providing you a series of images related to a specific object query: '{Object Name/Description}.' Please identify and highlight the image number where the queried object is most clearly presented, with minimal background noise or irrelevant elements. The ideal image should feature only the queried object, with a clear and unobstructed view. Please return the index of the most relevant image.
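
One way such a prompt could drive adaptive thresholding is sketched below. This is an assumption about the workflow, not a description of the authors' code: it supposes that each candidate similarity threshold yields one rendered image of the selected object, that the images are numbered from 1, and that a caller-supplied function `ask_mllm` (hypothetical) sends the prompt and images to a multimodal model and returns its text reply. The threshold whose image the model picks is then used.

```python
import re

# Prompt text taken from the section above, with the object query as a placeholder.
PROMPT_TEMPLATE = (
    "If you are a Visual Retrieval Expert, I am providing you a series of images "
    "related to a specific object query: '{query}.' Please identify and highlight "
    "the image number where the queried object is most clearly presented, with "
    "minimal background noise or irrelevant elements. The ideal image should "
    "feature only the queried object, with a clear and unobstructed view. "
    "Please return the index of the most relevant image."
)


def build_prompt(query):
    """Fill the object name or description into the prompt template."""
    return PROMPT_TEMPLATE.format(query=query)


def parse_image_index(reply, num_images):
    """Extract the first integer from the reply as a 1-based image number.

    Returns a 0-based index, or None if no valid number is found.
    The 1-based numbering is an assumption of this sketch.
    """
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    number = int(match.group())
    if 1 <= number <= num_images:
        return number - 1
    return None


def select_threshold(query, candidates, ask_mllm, fallback):
    """Pick a similarity threshold based on the model's preferred image.

    `candidates` is a list of (threshold, image) pairs and `ask_mllm` is a
    hypothetical callable (prompt, images) -> reply string. `fallback` is
    returned when the reply cannot be parsed.
    """
    images = [image for _, image in candidates]
    reply = ask_mllm(build_prompt(query), images)
    index = parse_image_index(reply, len(images))
    if index is None:
        return fallback
    return candidates[index][0]


# Usage with a stand-in model that always prefers the second image.
def fake_mllm(prompt, images):
    return "Image 2 shows the object most clearly."


demo_candidates = [(0.5, "render_a"), (0.6, "render_b"), (0.7, "render_c")]
chosen = select_threshold("red chair", demo_candidates, fake_mllm, fallback=0.6)
```

Keeping the model call behind a plain callable makes the selection logic testable without a real multimodal model, and the explicit fallback covers replies that contain no usable image number.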
