Physically Grounded Monocular Depth via Nanophotonic Wavefront Prompting

ECCV 2026

Bingxuan Li, New York University

Jiahao Wu, Columbia University

Yuan Xu, Columbia University

Zezheng Zhu, Columbia University

Yunxiang Zhang, New York University

Kenneth Chen, New York University

Yanqi Liang, Columbia University

Nanfang Yu, Columbia University

Qi Sun, New York University

Overview of our system and method. (a) Our birefringent metalens converts a 3D scene into two polarized images, encoding depth information in pixel-wise shifts between the images (see Fig. 3c). (b) The compact 3-mm-diameter metalens (right) consists of a two-dimensional array of 700-nm-tall TiO2 nanopillars with anisotropic cross-sections, engineered to provide independent phase control for x- and y-polarized light. For scale, it is shown alongside a 1-inch plano-convex lens (left) and a U.S. 1-cent coin (middle). (c) These depth-dependent optical signals are converted into model inputs and processed by a fine-tuned depth foundation model. (d) Our method recovers metrically accurate depth by combining physical depth cues with learned image priors, enabling high-quality physically grounded monocular depth estimation.

Abstract

Depth foundation models offer strong learned priors for 3D perception but lack physical depth cues, which leaves metric scale ambiguous. We introduce a birefringent metalens, a planar nanophotonic lens 700 nm thick and 3 mm in diameter whose subwavelength pixels shape the optical wavefront, to physically prompt depth foundation models. In a single monocular shot, our metalens embeds depth information into two polarized optical wavefronts, which we decode through a lightweight prompting and fine-tuning framework that aligns depth foundation models with the optical signals. To scale the training data, we develop a light wave propagation simulator that synthesizes metalens responses from RGB-D datasets, incorporating key physical factors to minimize the sim-to-real gap. Simulated and physical experiments with our fabricated titanium-dioxide metalens demonstrate more accurate and consistent metric depth than state-of-the-art monocular depth estimators. These results show that nanophotonic wavefront formation offers a promising bridge for grounding depth foundation models in physical depth sensing.
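
To make the idea of a depth cue carried by the shift between the two polarized images concrete, the following is a minimal sketch of measuring a translation between two images by phase correlation. It is an illustration only, not the decoder used in this work, which relies on a fine-tuned depth foundation model. The function name `estimate_shift` is hypothetical, the sketch assumes a single global integer shift with circular boundaries rather than the pixel-wise shifts produced by the metalens, and converting a measured shift to metric depth would require a lens calibration that is not given here.

```python
import numpy as np


def estimate_shift(img_x, img_y):
    """Estimate the integer (dy, dx) translation of img_y relative to img_x.

    Illustrative phase correlation on two same-sized 2D arrays; assumes a
    single global, circular shift (a simplification of pixel-wise shifts).
    """
    F_x = np.fft.fft2(img_x)
    F_y = np.fft.fft2(img_y)

    # Normalized cross-power spectrum keeps only the phase difference.
    cross = F_y * np.conj(F_x)
    cross /= np.abs(cross) + 1e-12

    # The inverse transform peaks at the translation between the images.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)

    # Map peak indices from [0, N) to signed shifts around zero.
    shift = [p if p <= n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shift)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_polarized = rng.random((64, 64))  # stand-in for the x-polarized image
    y_polarized = np.roll(x_polarized, shift=(3, -5), axis=(0, 1))
    print(estimate_shift(x_polarized, y_polarized))
```

In a real capture the shift varies across the image with scene depth, so a per-patch or learned estimator would be needed; the global version above only shows how a shift between two views can be read out numerically.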

Links

paper
