Physically Grounded Monocular Depth via Nanophotonic Wavefront Prompting

ECCV 2026      


Overview of our system and method. (a) Our birefringent metalens converts a 3D scene into two polarized images, encoding depth information in pixel-wise shifts between the images (see Fig. 3c). (b) The compact 3-mm-diameter metalens (right) consists of a two-dimensional array of 700-nm-tall TiO2 nanopillars with anisotropic cross-sections, engineered to provide independent phase control for x- and y-polarized light. For scale, it is shown alongside a 1-inch plano-convex lens (left) and a U.S. 1-cent coin (middle). (c) These depth-dependent optical signals are converted into model inputs and processed by a fine-tuned depth foundation model. (d) Our method recovers metrically accurate depth by combining physical depth cues with learned image priors, enabling high-quality physically grounded monocular depth estimation.

Abstract

Depth foundation models offer strong learned priors for 3D perception but lack physical depth cues, which leaves metric scale ambiguous. We introduce a birefringent metalens, a planar nanophotonic lens 700 nm thick and 3 mm in diameter that is composed of subwavelength pixels for wavefront shaping, to physically prompt depth foundation models. In a single monocular shot, our metalens physically embeds depth information into two polarized optical wavefronts, which we decode through a lightweight prompting and fine-tuning framework that aligns depth foundation models with the optical signals. To scale the training data, we develop a light wave propagation simulator that synthesizes metalens responses from RGB-D datasets, incorporating key physical factors to minimize the sim-to-real gap. Simulated and physical experiments with our fabricated titanium-dioxide metalens demonstrate more accurate and consistent metric depth than state-of-the-art monocular depth estimators. This work shows that nanophotonic wavefront formation offers a promising bridge for grounding depth foundation models in physical depth sensing.
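To make the idea of depth encoded in pixel-wise shifts between the two polarized images more concrete, the toy sketch below estimates a single global integer shift between two images using phase correlation. This is an illustrative assumption, not the method described here: the actual system decodes the optical signals with a fine-tuned depth foundation model, and the mapping from shift to metric depth depends on the metalens design, which is not given on this page. The function name `estimate_shift` and the synthetic test images are hypothetical.

```python
import numpy as np

def estimate_shift(img_a, img_b):
    """Estimate the integer (dy, dx) for which img_b is close to
    np.roll(img_a, (dy, dx), axis=(0, 1)), using phase correlation.

    Toy illustration only: real shifts would vary per pixel with depth.
    """
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    # Normalized cross-power spectrum keeps only the phase difference.
    cross = fb * np.conj(fa)
    cross = cross / (np.abs(cross) + 1e-12)
    # Its inverse transform peaks at the (circular) shift.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices above the midpoint to negative shifts.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)

# Synthetic stand-ins for the two polarized images (hypothetical data).
rng = np.random.default_rng(0)
image_x = rng.random((64, 64))
image_y = np.roll(image_x, (3, -5), axis=(0, 1))
print(estimate_shift(image_x, image_y))
```

A single global shift corresponds to a scene at one depth; a depth map would need shifts estimated per pixel or per patch, which is the kind of dense cue the learned model is prompted with.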
