STRinGS: Selective Text Refinement in Gaussian Splatting

WACV 2026

CVIT, IIIT Hyderabad
Equal Contribution

STRinGS produces sharper, more readable text than vanilla 3DGS (shown at 7K iterations). The scatter plot shows readability (CER, lower is better) against training time. STRinGS achieves both the lowest error and the shortest training time.

Abstract

Text in the form of signs, labels, or instructions is a critical element of real-world scenes because it conveys important contextual information. 3D representations such as 3D Gaussian Splatting (3DGS) achieve high visual fidelity but struggle to preserve fine-grained text details. Small errors in the reconstruction of textual elements can lead to significant semantic loss.

We propose STRinGS, a text-aware, selective refinement framework that addresses this issue for 3DGS reconstruction. Our method treats text and non-text regions separately, refining text regions first and merging them with non-text regions later for full-scene optimization. STRinGS produces sharp, readable text even in challenging configurations. We introduce a text readability measure, OCR Character Error Rate (CER), to evaluate reconstruction quality in text regions. STRinGS results in a 63.6% relative improvement over 3DGS at just 7K iterations. We also introduce a curated dataset, STRinGS-360, with diverse text scenarios to evaluate text readability in 3D reconstruction. Together, our method and dataset push the boundaries of 3D scene understanding in text-rich environments, paving the way for more robust text-aware reconstruction methods.

STRinGS Pipeline

Given input images, we use COLMAP to obtain a point cloud and undistorted images, which are passed to Hi-SAM to obtain text masks. The point cloud and masks are passed to the Text Segmentation in 3D module to obtain partitioned text and non-text point clouds. These are processed through a two-phase pipeline. In phase 1, we perform targeted densification and reconstruction of text Gaussians. In phase 2, we perform full scene refinement, where text and non-text Gaussians are optimized with distinct learning strategies, enabling targeted enhancement of text without compromising scene quality. The final output is a text-refined Gaussian Splat representation with enhanced text readability while preserving overall scene fidelity.
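The page does not spell out how the Text Segmentation in 3D module splits the point cloud, so the following is only an illustrative sketch of one plausible approach: project each COLMAP point into every view and label it as text when enough of the views that see it land inside a text mask. The pinhole camera model, the function names `project` and `partition_points`, and the `min_ratio` voting threshold are all assumptions made for this example, and occlusion is ignored.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]
Camera = Tuple[Mat3, Mat3, Vec3]  # (K intrinsics, R rotation, t translation)


def project(point: Vec3, K: Mat3, R: Mat3, t: Vec3) -> Optional[Tuple[float, float]]:
    """Project a world-space point with a pinhole camera (no skew, no distortion).

    Returns pixel coordinates (u, v), or None if the point is behind the camera.
    """
    # Camera-space coordinates: X_c = R @ X_w + t
    xc = [sum(R[i][k] * point[k] for k in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


def partition_points(
    points: Sequence[Vec3],
    cameras: Sequence[Camera],
    masks: Sequence[Sequence[Sequence[bool]]],
    min_ratio: float = 0.5,  # hypothetical voting threshold, not from the paper
) -> Tuple[List[int], List[int]]:
    """Split point indices into (text, non_text) by voting over 2D text masks."""
    text_idx: List[int] = []
    non_text_idx: List[int] = []
    for idx, p in enumerate(points):
        visible = hits = 0
        for (K, R, t), mask in zip(cameras, masks):
            uv = project(p, K, R, t)
            if uv is None:
                continue
            col, row = int(round(uv[0])), int(round(uv[1]))
            # Count the view only if the projection falls inside the image.
            if 0 <= row < len(mask) and 0 <= col < len(mask[0]):
                visible += 1
                hits += 1 if mask[row][col] else 0
        if visible > 0 and hits / visible >= min_ratio:
            text_idx.append(idx)
        else:
            non_text_idx.append(idx)
    return text_idx, non_text_idx


# Toy example: one 3x3 view whose centre pixel is marked as text.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
text_idx, non_text_idx = partition_points(points, [(K, identity, (0.0, 0.0, 0.0))], [mask])
# text_idx == [0]; non_text_idx == [1, 2] (off-mask point and point behind the camera)
```

A voting ratio is used here rather than a single-view test so that one noisy mask does not flip a point's label; a real implementation would also need a visibility check before counting a view.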

STRinGS-360 Dataset

To address the lack of text-rich 3D scene datasets, we introduce STRinGS-360, a curated dataset of five indoor scenes featuring dense, semantically meaningful text. The scenes include instructional text on a curved fire extinguisher, tightly packed book titles, labeled chemical bottles, geographical names on a globe, and academic books with occlusions and repeated titles. Together, they span flat, cylindrical, and spherical configurations, providing a realistic benchmark for evaluating fine-grained text reconstruction in 3D Gaussian Splatting.

Visual Comparisons

Results at 7K iterations

[Ten side-by-side comparisons: STRinGS (Ours) vs. 3DGS]

Results

OCR-CER Results

OCR-based Character Error Rate (CER) on rendered images at 7K and 30K training iterations, averaged over all scenes in the dataset. Lower CER indicates better text readability.
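For readers unfamiliar with the metric, here is a minimal sketch of how a character error rate is conventionally computed: the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The helper names are hypothetical, the OCR step itself is omitted, and the `relative_improvement` formula assumes the usual definition for a lower-is-better metric; the page does not state the exact evaluation code used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of OCR output `hyp` against ground truth `ref`."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline: float, ours: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline


# Toy strings, not taken from the paper's results.
print(cer("EXIT", "EX1T"))  # one substitution over four characters
print(relative_improvement(0.5, 0.25))
```

Note that CER can exceed 1.0 when the OCR output contains many spurious characters, which is why it is reported as an error rate rather than an accuracy.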

Training Time Results

Training time in minutes at 7K and 30K training iterations, averaged over all scenes in the dataset.

Acknowledgements

We thank Harshavardhan P. for feedback and guidance throughout this project. Ravi Kiran Sarvadevabhatla thanks the Digital India Bhashini Division, Ministry of Electronics and Information Technology (MeitY), Government of India, for supporting the project. Makarand Tapaswi thanks Adobe Research for travel support.

BibTeX

@InProceedings{STRinGS_2026_WACV,
  author    = {Raundhal, Abhinav and Behera, Gaurav and Narayanan, P. J. and Sarvadevabhatla, Ravi Kiran and Tapaswi, Makarand},
  title     = {STRinGS: Selective Text Refinement in Gaussian Splatting},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2026},
}