3D segmentation is a core problem in computer vision and, like many other dense prediction tasks, it requires large amounts of annotated data for adequate training. However, densely labeling 3D point clouds for fully supervised training remains too labor-intensive and expensive. Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set. Research in this area thus focuses on using unlabeled data effectively to reduce the performance gap caused by the lack of annotations. In this work, inspired by Bayesian deep learning, we first propose a Bayesian self-training framework for semi-supervised 3D semantic segmentation. Employing stochastic inference, we generate an initial set of pseudo-labels and then filter them based on estimated point-wise uncertainty. By constructing a heuristic $n$-partite matching algorithm, we extend the method to semi-supervised 3D instance segmentation and, finally, with the same building blocks, to dense 3D visual grounding. Our semi-supervised method achieves state-of-the-art results on SemanticKITTI and ScribbleKITTI for 3D semantic segmentation and on ScanNet and S3DIS for 3D instance segmentation, and it yields substantial improvements over supervised-only baselines for dense 3D visual grounding on ScanRefer.
Our Bayesian pseudo-labeling pipeline supports 3D semantic segmentation, 3D instance segmentation, and dense 3D visual grounding. With only slight adjustments, using the same building blocks, the method can be adapted to each of these tasks to achieve state-of-the-art results.
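For concreteness, below is a minimal sketch of the uncertainty-filtered pseudo-labeling step described above, assuming Monte Carlo dropout as the stochastic inference mechanism. The model interface, the number of forward passes `T`, and the entropy threshold `tau` are illustrative assumptions, not values or code from the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_labels_with_uncertainty(model, points, T=10, tau=0.3, ignore_index=-1):
    """Generate point-wise pseudo-labels, masking out high-uncertainty points.

    points: (N, C) input point features for one unlabeled scan.
    Returns labels of shape (N,), with uncertain points set to `ignore_index`.
    """
    model.train()  # keep dropout active so each forward pass is stochastic
    probs = torch.stack(
        [F.softmax(model(points), dim=-1) for _ in range(T)]  # (T, N, K)
    )
    mean_probs = probs.mean(dim=0)  # (N, K) predictive mean over T passes
    # Predictive entropy as a point-wise uncertainty estimate
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    labels = mean_probs.argmax(dim=-1)        # hard pseudo-labels
    labels[entropy > tau] = ignore_index      # discard uncertain points
    return labels
```

The filtered labels can then be used as targets for the unlabeled set, with the loss ignoring points marked `ignore_index`, following the standard self-training recipe.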
@InProceedings{unal2024bayesian,
  author    = {Unal, Ozan and Sakaridis, Christos and Van Gool, Luc},
  title     = {Bayesian Self-Training for Semi-Supervised 3D Segmentation},
  booktitle = {European Conference on Computer Vision (ECCV)},
  month     = {October},
  year      = {2024}
}