Introduction

Metasurface, an ultra-thin and highly integrated optical device, is capable of full light control and thus provides novel applications in quantum photonics1. In photonic quantum information processing, multiphoton entanglement is the building block for wide range of tasks, such as quantum computation2, quantum error correction3, quantum secret sharing4,5, and quantum sensing6. Recent investigations highlighted the feasibility of metasurface in generation7,8, manipulation9,10,11, and detection12,13 of multiphoton entanglement, indicating metasurface as a promising technology of ultra-thin optical device for large-scale quantum information processing.

Characterization of multiphoton entanglement provides diagnostic information on experimental imperfections and benchmarks our technological progress towards the reliable control of large-scale photons. The standard quantum tomography (SQT)14 requires an exponential overhead with respect to the system size. Recently, more efficient protocols have been proposed and demonstrated with fewer measurements, such as compressed sensing15,16, adaptive tomography17,18,19 and self-guided quantum tomography (SGQT)20,21,22. Shadow tomography, which was first proposed by Aaronson et al.23 and then concreted by Huang et al.24, efficiently predicts functions of the quantum states instead of state reconstruction. Huang’s protocol24 is hereafter referred as shadow tomography. Shadow tomography is efficient in estimation of quantities in terms of observable (polynomial), including nonlinear observables such as purity and R\(\acute{{{{{{{{\rm{e}}}}}}}}}\)nyi entropy25,26,27,28, which is of particular interest in detecting multipartite entanglement29,30,31,32 and thus is helpful in benchmarking the technologies towards generation of genuine multipartite entanglement33,34,35. Nevertheless, shadow tomography generally requires the experimental capability of performing information-complete measurements, leading to the consequence that the time of switching experimental setting is much longer than that of data acquisition. A potential solution is to replace the unitary operations and projective measurements with positive operator valued measures (POVMs), which is capable to extract complete information in a single experimental setting36,37,38. The POVM significantly alleviates the experimental complexity to perform shadow tomography, and thus enables the real-time shadow tomography, i.e., an experimentalist is free to stop shadow tomography at any time. However, a compact and scalable implementation of POVM in optical system is still technically challenging. On the other hand, shadow tomography is not able to easily predict the properties that cannot be directly expressed in terms of observables (polynomial) such as von Neumann entropy \(S(\rho )=-{{{{{{{\rm{T}}}}}}}}r(\rho \log \rho )\), which is key ingredient in topological entanglement entropy39,40.

In this work, we report an implementation of POVM enabled by a metasurface, which is based on planar arrays of nanopillars and able to provide complete control of polarization. The POVM we achieved allows to implement real-time shadow tomography, and observe the shadow norm that determines sample complexity. Moreover, we show that the metasurface-enabled shadow tomography can be readily equipped with other algorithms, enabling the unexplored advantages of shadow tomography. In particular, we propose and implement shadow tomography optimized by simultaneous perturbation stochastic approximation (SPSA)41, the so-called self-learning shadow tomography (SLST). SLST efficiently returns a physical state with high accuracy against the metasurface-induced imperfections, which can be further used to calculate the quantities that cannot be expressed in terms of directly observable. We also implement robust shadow tomography42 to show the robustness of reconstruction against the engineered optical loss.

Results

Shadow tomography with POVM

We start by briefly reviewing the shadow tomography with POVM. Considering a 2-level (qubit) quantum system, a set of L rank-one projectors \({\{\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert \in {{\mathbb{H}}}_{2}\}}_{l=1}^{L}\) is called a quantum 2-design if the average value of the second-moment operator \({(\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert )}^{\otimes 2}\) over the set is proportional to the projector onto the totally symmetric subspace of two copies43. Each quantum 2-design is proportional to a POVM \({{{{{{{\bf{E}}}}}}}}={\{\frac{2}{L}\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert \}}_{l=1}^{L}\) with the element \({E}_{l}=\frac{2}{L}\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert\) being positive semidefinite and satisfying \({\sum}_{l=1}^{L}{E}_{l}={{\mathbb{1}}}_{2}\). Note that quantum 1-design is sufficient to form a POVM but is not always information-complete for tomography, such as the measurement on computational basis \(\{\left\vert 0\right\rangle,\left\vert 1\right\rangle \}\). Measuring a quantum state ρ using POVM E results one l [L] outcome with probability Pr(lρ) = Tr(Elρ) according to Born’s rule. The POVM E together with the preparation of the corresponding state \(\left\vert {\psi }_{l}\right\rangle\) can be viewed as a linear map \({{{{{{{\mathcal{M}}}}}}}}:{{\mathbb{H}}}_{2}\to {{\mathbb{H}}}_{2}\), and the ‘classical shadow’ is the solution of least-square estimator with single experimental run,

$${\hat{\rho }}_{l}^{(m)}={{{{{{{{\mathcal{M}}}}}}}}}^{-1}(\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert )=3\left\vert {\psi }_{l}\right\rangle \left\langle {\psi }_{l}\right\vert -{{\mathbb{1}}}_{2}.$$
(1)

For an N-qubit state, the classical shadow is the tensor product of simultaneous single-qubit estimations \({\hat{\rho }}^{(m)}{=\bigotimes }_{n=1}^{N}{\hat{\rho }}_{{l}_{n}}^{(m)}\) with ln being the outcome of n-th qubit, and one has \({\mathbb{E}}[{\hat{\rho }}^{(m)}]=\rho\). Repeating the POVM M times (experimental runs), one has a collection of classical shadows \({\{{\hat{\rho }}^{(m)}\}}_{m=1}^{M}\), which is further inquired for estimation of various properties of the underlying state. See Supplementary Note 1A for more details.

Implementation of POVM with metasurface

In our experiment, we focus on the POVM on polarization-encoded qubit, i.e., \(\left\vert 0(1)\right\rangle=\left\vert H(V)\right\rangle\) with \(\left\vert H(V)\right\rangle\) being the horizontal (vertical) polarization, and consider POVM of L = 6 and \(\left\vert {\psi }_{l}\right\rangle \in \{\left\vert H\right\rangle,\left\vert V\right\rangle,\left\vert+ \right\rangle, \left\vert -\right\rangle,\left\vert R\right\rangle,\left\vert L\right\rangle \}\) with \(\left\vert \pm \right\rangle=(\left\vert H\right\rangle \pm \left\vert V\right\rangle )/\sqrt{2}\) and \(\left\vert R(L)\right\rangle=(\left\vert H\right\rangle \pm i\left\vert V\right\rangle )/\sqrt{2}\). The corresponding POVM Eocta is described by a symmetric polytope of an octahedron on Bloch sphere as shown in Fig. 1a. To realize Eocta, we design and fabricate a 210 μm × 210 μm polarization-dependent metasurface that splits incident light into six directions corresponding to projection on \(\left\vert H\right\rangle\), \(\left\vert V\right\rangle\), \(\left\vert+\right\rangle\), \(\left\vert -\right\rangle\), \(\left\vert R\right\rangle\) and \(\left\vert L\right\rangle\) with equal probability (shown in Fig. 1b). Note that projection on \(\left\vert {\psi }_{l}\right\rangle\) with equal probability is guaranteed with post-selection to eliminate the mode mismatch between incident light (Gaussian beam) and metasurface (square) (see Supplementary Note 5 for details). The metasurface is an array (with square pixel of s = 500 nm) of single-layer amorphous silicon nanopillars on quartz substrate as shown in Fig. 1c, d. The nanopillars are with the same height of 700 nm but different \({l}_{{x}^{{\prime} }}\), \({l}_{{y}^{{\prime} }}\) and orientation θ relative to the reference coordinate system. In this sense, a single nanopillar can be regarded as a waveguide with different rectangular cross profile that exhibits corresponding effective birefringence, leading to spatial separation between orthogonal polarizations44. The metasurface is divided into three regions with same size of 210 μm × 70 μm but different arrangement of nanopillars, i.e., \((\theta,{l}_{{x}^{{\prime} }},{l}_{{y}^{{\prime} }})\). By carefully designing the arrangement of nanopillars, we can realize spatial separation of \(\left\vert H\right\rangle /\left\vert V\right\rangle\), \(\left\vert+\right\rangle /\left\vert -\right\rangle\) and \(\left\vert R\right\rangle /\left\vert L\right\rangle\), respectively. To validate the capability of fabricated metasurface to perform information-complete measurement, we test metasurface with input states of \(\left\vert {\psi }_{l}\right\rangle\) and measure the distribution of output intensity on focal plane. The results are shown in Fig. 1e, according to which we reconstruct the Stokes parameters (s1, s2, s3) shown in Fig. 1f. Compared to the ideal values, the average errors of reconstructed Stokes parameters are 0.101 ± 0.005, 0.086 ± 0.005 and 0.073 ± 0.005, respectively. These errors are mainly caused by the discretization of phase front in design, which inevitably introduces higher-order deflections45 (see Supplementary Note 4 for details of metasurface).

Fig. 1: The metasurface-enabled octahedron positive operator valued measure (POVM) Eocta.
figure 1

a The elements in Eocta are projectors on states \(\left\vert H\right\rangle\), \(\left\vert V\right\rangle\), \(\left\vert+\right\rangle\), \(\left\vert -\right\rangle\), \(\left\vert R\right\rangle\) and \(\left\vert L\right\rangle\) respectively, which form a symmetric polytope of an octahedron on Bloch sphere. b The metasurface to realize Eocta, green, yellow and red blocks on the metasurface represent nanopillars with different arrangements. c The scanning electron microscopy images of the fabricated nanopillars in three regions. d Schematic drawing of single nanopillar that is fabricated with same height of 700 nm but different \((\theta,{l}_{{x}^{{\prime} }},{l}_{y{\prime} })\). e The measured distribution of intensity on focal plane with input polarization of \(\left\vert H\right\rangle\), \(\left\vert V\right\rangle\), \(\left\vert+\right\rangle\), \(\left\vert -\right\rangle\), \(\left\vert R\right\rangle\) and \(\left\vert L\right\rangle\), respectively. f The reconstructed Stokes parameters (s1, s2, s3) from data collected in (e) and the error bars indicate standard deviations of reconstructed Stokes parameters.

Estimation of observables

We first perform shadow tomography with the fabricated metasurface on single-photon pure state \(\left\vert {\psi }_{\gamma,\phi }\right\rangle=\cos \gamma \left\vert H\right\rangle+\sin \gamma {{{{{{{{\rm{e}}}}}}}}}^{i\phi }\left\vert V\right\rangle\) with γ = 0.91 and ϕ = 0.12. As shown in Fig. 2a, the polarization-entangled photons (central wavelength of 810 nm) are generated from a periodically poled potassium titanyl phosphate (PPKTP) crystal placed in a Sagnac interferometer via spontaneous parametric down conversion (SPDC), which is pumped by a laser diode (central wavelength of 405 nm). The generated entangled photons are with ideal form of \({\left\vert \psi \right\rangle }_{\eta }=\sqrt{\eta }\left\vert HV\right\rangle+\sqrt{1-\eta }\left\vert VH\right\rangle\), where η is determined by polarization of pump light. Projecting one photon of \({\left\vert \psi \right\rangle }_{\eta }\) on \(\left\vert H\right\rangle\) heralds the other photon on state \(\left\vert V\right\rangle\), which can further be transformed to arbitrary \(\big \vert {\psi }_{\gamma,\phi }\big \rangle=\cos \gamma \left\vert H\right\rangle+\sin \gamma {{{{{{{{\rm{e}}}}}}}}}^{i\phi }\left\vert V\right\rangle\) by a combination of electrically-rotated half-wave plate (E-HWP) and quarter-wave plate (E-QWP). Then, the heralded photon passes through the metasurface, and is coupled to six multimode fibers at outputs using an objective lens (OL), a tube lens, and three prisms, respectively. With the collection of classical shadows \({\{{\hat{\rho }}^{(m)}\}}_{m=1}^{M}\), we focus on the estimation of observables in set of 128 single-qubit projections, i.e., \(O=\left\vert {\psi }_{\kappa,\nu }\right\rangle \left\langle {\psi }_{\kappa,\nu }\right\vert \in {{{{{{{\bf{O}}}}}}}}\) with \(\left\vert {\psi }_{\kappa,\nu }\right\rangle=\cos \kappa \left\vert H\right\rangle+\sin \kappa {{{{{{{{\rm{e}}}}}}}}}^{i\nu }\left\vert V\right\rangle\) being uniformly distributed on Bloch sphere. The estimation of expected value of observable is \(\hat{O}=1/M{\sum }_{m=1}^{M}{\hat{o}}^{(m)}\), where \({\hat{o}}^{(m)}={{{{{{{\rm{T}}}}}}}}r(O{\hat{\rho }}^{(m)})\) is the i.i.d single-shot estimator. Note that \(\hat{O}\) converges to the exact expectation value Tr(ρO) as M → . The error of estimation with metasurface-enabled POVM is indicated by the distance between \(\hat{O}\) and ideal expectation \(\langle O\rangle=\big \langle {\psi }_{\gamma,\phi }\big\vert O\big\vert {\psi }_{\gamma,\phi }\big \rangle\). As shown in Fig. 2b, the maximal distance \({\max }_{{{{{{{{\bf{O}}}}}}}}}\parallel \hat{O}-\langle O\rangle \parallel\) converges to 0.07 with the increase of M, which is consistent with the error we obtained in reconstruction of Stokes parameters. In Fig. 2c, we show the real-time estimation of \(\hat{O}\) by randomly selecting five OO, in which we observe the convergence of \(\hat{O}\) after a few hundreds of milliseconds.

Fig. 2: The experimental setup and results of shadow tomography with metasurface-enabled positive operator valued measure (POVM).
figure 2

a Setup to generate entangled photons and demonstrate shadow tomography with metasurface. PBS: polarizing beam splitter. DM: dichroic mirror. HWP: half-wave plate. QWP: quarter-wave plate. E-HWP: electrically-rotated HWP. E-QWP: electrically-rotated QWP. OL: objective lens. b The maximal error in estimation of expectation of OO. c The real-time estimation of expectation of five randomly selected OO. d The results of shadow norm \({\max }_{{{{{{{{\bf{P}}}}}}}}}{{{{{{{\rm{Var}}}}}}}}(\hat{o})\) for \(O=\left\vert+\right\rangle \left\langle+\right\vert\) with different experimental runs. e The results of shadow norm for 128 OO (red dots), and the simulated results of shadow norm with symmetric informationally complete (SIC) POVM (blue diamonds). The 128 observables OO are selected according to Haar random. The dots and bars in (b) and (d) are the mean value and corresponding standard deviations obtained by repeating the experiment 5 times. The abbreviations of Exp. and Sim. indicate experimental results and simulation results respectively.

The sample complexity of estimation is further characterized by the variance \({{{{{{{\rm{Var}}}}}}}}(\hat{O})={{{{{{{\rm{Var}}}}}}}}(\hat{o})\le | | O| {| }_{{{{{{{{\rm{shd}}}}}}}}}^{2}\). Here the shadow norm \(| | O| {| }_{{{{{{{{\rm{shd}}}}}}}}}^{2}\)24 is the maximization of \({{{{{{{\rm{Var}}}}}}}}(\hat{o})\) over all possible states ρ to remove the state-dependence. For ideal Eocta, the shadow norm \(| | O| {| }_{{{{{{{{\rm{shd}}}}}}}}}^{2}=0.75\) regardless of the explicit form of OO (see Supplementary Note 1D for deviation of \(| | O| {| }_{{{{{{{{\rm{shd}}}}}}}}}^{2}=0.75\)). Experimentally, the variance of a single-shot estimation is

$${{{{{{{\rm{Var}}}}}}}}(\hat{o})=\frac{1}{M} {\sum}_{m=1}^{M}{\left({\hat{o}}^{(m)}-\hat{O}\right)}^{2}.$$
(2)

It is impossible to maximize \({{{{{{{\rm{Var}}}}}}}}(\hat{o})\) over all possible \(\big \vert {\psi }_{\gamma,\phi }\big \rangle\) in experiment, so that we prepare totally 20 \(\big \vert {\psi }_{\gamma,\phi }\big \rangle\) that are uniformly distributed on Bloch sphere, forming a state set of P. For each prepared \(\left\vert {\psi }_{\gamma,\phi }\right\rangle\), we perform shadow tomography and estimate the expectation of \(O=\left\vert+\right\rangle \left\langle+\right\vert\). The results of \({\max }_{{{{{{{{\bf{P}}}}}}}}}{{{{{{{\rm{Var}}}}}}}}(\hat{o})\) are shown in Fig. 2d, in which we observe that \({\max }_{{{{{{{{\bf{P}}}}}}}}}{{{{{{{\rm{Var}}}}}}}}({\hat{o}}^{(m)})\) converges to 0.75 when M > 255. In Fig. 2e, we show \({\max }_{{{{{{{{\bf{P}}}}}}}}}{{{{{{{\rm{Var}}}}}}}}(\hat{o})\) of 128 observables OO with M = 315 measurements, which agrees well with the theoretical prediction that the shadow norm is a constant regardless of the explicit form of OO38. To give a comparison, we simulate \({\max }_{{{{{{{{\bf{P}}}}}}}}}{{{{{{{\rm{Var}}}}}}}}(\hat{o})\) with symmetric informationally complete (SIC) POVM ESIC46, which is constructed with the minimal number of 4 measurements for qubit system and has been widely adopted in investigations of advanced tomography47,48,49. As shown in Fig. 2e, the shadow norm with ESIC depends on observable O and generally larger than that with Eocta, which indicates Eocta requires less shots M than ESIC to achieve the same accuracy of estimation \(\hat{O}\).

State reconstruction

The direct estimation from classical shadows \({\hat{\rho }}^{(m)}\), i.e., \(\hat{\rho }=1/M{\sum}_{m=1}^{M}{\hat{\rho }}^{(m)}\), is generally not a physical state with finite M measurements, which limits the application of shadow tomography in estimation of nonlinear functions37,50. Physical constraints need to be introduced to enforce the positivity of the reconstructed state τ, which can be addressed by solving the optimization problem

$$\begin{array}{rcl}&{{\mbox{minimize}}}\,&{\hat{N}}_{F}(\tau )=\frac{2}{M(M-1)}\mathop{\sum}_{m < n}{{{{{{{\rm{T}}}}}}}}r\left[{\hat{\rho }}^{(m)}{\hat{\rho }}^{(n)}\right]+{{{{{{{\rm{T}}}}}}}}r({\tau }^{2})-2{\sum}_{m}{{{{{{{\rm{T}}}}}}}}r({\hat{\rho }}^{(m)}\tau )\\ &\,{{\mbox{subject to}}}\,&\tau \ge 0,{{{{{{{\rm{T}}}}}}}}r(\tau )=1,\end{array}$$
(3)

where τ is the proposed state that is positive semidefinite (τ ≥ 0) with unit trace (Tr(τ) = 1), and the cost function \({\hat{N}}_{F}(\tau )\) is the unbiased estimator of squared Frobenius norm with \(\{{\hat{\rho }}^{(m)}\}\) (see Supplementary Note 2A for more details). Note that the squared state fidelity adopted in SGQT20,21,22 is not an unbiased estimator with \(\{{\hat{\rho }}^{(m)}\}\) for mixed state. We employ an iterative self-learning algorithm, i.e., SPSA algorithm, to solve the optimization problem in Eq. (3). SPSA is especially efficient in multi-parameter optimization problems in terms of providing a good solution for a relatively small number of measurements of the objective function51, which holds the similar spirit as shadow tomography. In traditional maximum likelihood estimation (MLE) reconstruction14, the computational expense required to estimate gradient direction is directly proportional to the number of unknown parameters (4N − 1 for an N − qubit state) as it approximates the gradient by varying one parameter at a time, which becomes an issue when the number of qubit is large. In SPSA, the minimization of cost function \({\hat{N}}_{F}(\tau )\) is achieved by perturbing all parameters simultaneously, and one gradient evaluation requires only two evaluations of the cost function. While SPSA costs more iterations to converge, it returns state with higher fidelity in limited number of iterations compared to MLE21. More importantly, SPSA formally accommodates noisy measurements of the objective function, which is an important practical concern in experiment.

Generally, an N-qubit state τ can be modeled with d2 parameters with d = 2N being the dimension of τ. Thus, the proposed state τ is determined by a d2-dimensional vector \({{{{{{{\boldsymbol{r}}}}}}}}=[{r}_{1},{r}_{2},\cdots \,,{r}_{{d}^{2}}]\). SPSA optimization estimates the gradient by simultaneously perturbing all parameters ri in a random direction, instead of individually addressing each ri. In k th iteration, the simultaneous perturbation approximation has all elements of rk perturbed together by a random perturbation vector \({{{{{{{{\boldsymbol{\Delta }}}}}}}}}_{k}=[{\Delta }_{k1},{\Delta }_{k2},\cdots \,,{\Delta }_{k{d}^{2}}]\) with Δki being generated from Bernoulli ± 1 distribution with equal probability. Then the gradient is calculated by

$${{{{{{{{\boldsymbol{g}}}}}}}}}_{k}=\frac{{\hat{N}}_{F}({{{{{{{{\boldsymbol{r}}}}}}}}}_{k}+{B}_{k}{{{{{{{{\boldsymbol{\Delta }}}}}}}}}_{k})-{\hat{N}}_{F}({{{{{{{{\boldsymbol{r}}}}}}}}}_{k}-{B}_{k}{{{{{{{{\boldsymbol{\Delta }}}}}}}}}_{k})}{2{B}_{k}}{{{{{{{{\boldsymbol{\Delta }}}}}}}}}_{k},$$
(4)

and rk is updated to rk+1 by rk+1 = rk + Akgk. Ak and Bk are functions in forms of \({A}_{k}={a}_{1}/{(k+{a}_{2})}^{{a}_{3}}\) and \({B}_{k}={b}_{1}/{k}^{{b}_{2}}\) with a1, a2, a3, b1 and b2 being hyperparameters that determine the convergence speed of algorithm, which can be generally obtained from numerical simulations (see Supplementary Note 2B for hyperparameter settings). SLST is terminated when there is little change of \({\hat{N}}_{F}({{{{{{{{\boldsymbol{r}}}}}}}}}_{k})\) in several successive iterations, and corresponding τk is the reconstructed state. We emphasize that SPSA inevitably introduces systematic errors of the reconstructed state, as well as other optimization algorithms such as MLE and least squares52. In fact, it is a tradeoff that the reconstruction of a physical state suffers from a bias.

As the prepared single-photon state is extremely closed to the ideal state \(\vert {\psi }_{\gamma,\phi }\rangle\), the accuracy of reconstruction is characterized by the state fidelity between returned state τk and ideal state \(\big \vert {\psi }_{\gamma,\phi }\big \rangle\), i.e., \(F=\sqrt{{{{{{{{\rm{T}}}}}}}}r({\tau }_{k}\vert {\psi }_{\gamma,\phi }\rangle \langle {\psi }_{\gamma,\phi }\vert )}\). The results of average fidelity of SLST over 20 prepared \(\vert {\psi }_{\gamma,\phi }\rangle \in {{{{{{{\bf{P}}}}}}}}\) after k = 30 iterations are shown with red dots in Fig. 3a, where the average fidelity increases as M increases and achieves 0.992 ± 0.001 with M = 315 measurements. The fabricated metasurface is also capable to collect data required for state reconstruction with other technologies, i.e., SGQT20,21,22 and MLE reconstruction (see Supplementary Note 3 for demonstration of SGQT). In SGQT, two projective measurements are performed with 7 experimental runs in each iteration, and SPSA is used to update the proposed state τSGQT. The results of \(F=\sqrt{{{{{{{{\rm{T}}}}}}}}r({\tau }^{{{{{{{{\rm{SGQT}}}}}}}}}\vert {\psi }_{\gamma,\phi }\rangle \langle {\psi }_{\gamma,\phi }\vert )}\) are shown with yellow triangles in Fig. 3a, in which we observe an average fidelity of 0.983 ± 0.003 after 45 iterations (total experimental runs of 315 as the same as that in SLST). The results of MLE reconstruction \(F=\sqrt{{{{{{{{\rm{T}}}}}}}}r({\tau }^{{{{{{{{\rm{MLE}}}}}}}}}\vert {\psi }_{\gamma,\phi }\rangle \langle {\psi }_{\gamma,\phi }\vert )}\) are shown with cyan squares in Fig. 3a. When M is small (M < 60), MLE reconstruction is more accurate than SGQT. However, SLST always exhibits higher accuracy compared to other techniques with the same number of experimental runs. It is worth noting that the average fidelity with MLE reconstruction converges to 0.93 ± 0.01 and the error of reconstruction is about 0.07, which is consistent with errors in estimation of observables in Fig. 2b. Although the error of metasurface reduces the accuracy of shadow tomography and MLE reconstruction, SLST and SGQT with SPSA optimization can dramatically suppress metasurface-induced error as SPSA can accommodate noisy measurements of the cost function. The accuracy of SLST does not keep increasing with the number of iterations as reflected in Fig. 3b, where the converged fidelity depends on the number of experimental runs M in classical shadow collection.

Fig. 3: Experimental results of self-learning shadow tomography (SLST) on one-photon and two-photon states.
figure 3

a The average fidelity between reconstructed single-photon states τ and target state \({\left\vert \psi \right\rangle }_{\gamma,\phi }\) using SLST, self-guided quantum tomography (SGQT), maximum likelihood estimation (MLE) reconstruction. b Average fidelity of SLST by increasing experimental runs M from 10 to 1000. c Fidelity between reconstructed two-photon states τ and target state ρη using SLST and MLE. d The fidelities of two-photon states reconstruction from SLST (dash lines) with M = 2000 measurements. The solid lines represent the fidelity from MLE tomography with M = 2000 measurements. The dots and bars in (a) and (c) are the mean value and the corresponding standard deviations obtained by repeating the experiment 5 times. The dashed lines and shadings in (d) and (e) are the mean value and standard deviation obtained by repeating the iteration 5 times.

We also demonstrate SLST on two-photon entangled states \({\left\vert \psi \right\rangle }_{\eta }=\sqrt{\eta }\left\vert HV\right\rangle+\sqrt{1-\eta }\left\vert VH\right\rangle\) with η = 0.06, 0.37 and 0.87. In two-photon SLST, one photon is detected by metasurface-enabled Eocta, and the other photon is detected by randomly choosing σx, σy and σz measurements realized by an E-HWP and an E-QWP. In contrast to the single-photon state, the generated two-photon state ρη is far from pure state as it is affected by more noises that are mainly attributed to high-order emission in SPDC and mode mismatch when overlapping two photons in Sagnac interferometer. Thus, the proposed state τk should be a mixed state in general form of τk = TT with T being a complex lower triangular matrix (see Supplementary Note 2C for details). Accordingly, the accuracy of reconstruction is characterized by the fidelity between returned state τk and ρη, where ρη is MLE reconstruction with large amount of data (M ≈ 8 × 105) collected from bulk optical setting (waveplates and PBS). The results of \(F={{{{{{{\rm{T}}}}}}}}r\Big(\sqrt{\sqrt{{\tau }_{200}}{\rho }_{\eta }\sqrt{{\tau }_{200}}}\Big)\) are shown with dots in Fig. 3c, the fidelities of three states reach 0.986 ± 0.002, 0.990 ± 0.001 and 0.981 ± 0.002 with M = 2000 experimental runs and k = 200 iterations. We also perform MLE reconstruction τMLE of two-photon states, where one photon is detected by metasurface and the other is detected by bulk optical setting. The results of \(F={{{{{{{\rm{T}}}}}}}}r\Big(\sqrt{\sqrt{{\tau }^{{{{{{{{\rm{MLE}}}}}}}}}}{\rho }_{\eta }\sqrt{{\tau }^{{{{{{{{\rm{MLE}}}}}}}}}}}\Big)\) with M experimental runs are shown with squares in Fig. 3c. The error in two-photon MLE reconstruction is about 0.047 ± 0.005, which is smaller than that in single-photon MLE reconstruction as only one photon is detected by noisy device (metasurface). In Fig. 3d, we show that the fidelity of SLST with M = 2000 is converging after k = 200 iterations.

Robust shadow tomography

Finally, we demonstrate robustness of SLST can be further improved by robust shadow tomography42,53. Considering that the measurement apparatus are noisy, the measurement apparatus can be calibrated prior to performing SLST. To this end, shadow tomography is firstly performed on high-fidelity state \(\left\vert HH\right\rangle\) with \({M}^{{\prime} }\) experimental runs to calculate the noisy quantum channel \(\widetilde{{{{{{{{\mathcal{M}}}}}}}}}\). Consequently, the classical shadow is constructed by the noisy channel, i.e., \({\hat{\rho }}^{(m)}={\widetilde{{{{{{{{\mathcal{M}}}}}}}}}}^{-1}(\big \vert {\psi }_{{l}_{1}}\big \rangle \big \langle {\psi }_{{l}_{1}}\big \vert \otimes \big \vert {\psi }_{{l}_{2}}\big \rangle \big \langle {\psi }_{{l}_{2}}\big \vert )\) (See Supplementary Note 1B for details of robust shadow tomography). The framework of robust shadow tomography is valid in our experimental setting. Firstly, although two photons are detected with different measurement devices, i.e., one is the metasurface-enabled POVM while the other is randomly detected on three Pauli bases, the mathematical models of these two measurement devices are identical. Secondly, although the metasurface-induced measurement errors are different between six projections, it has been shown that gate-dependent noise can be suppressed by robust shadow tomography42. Finally, the experimental device is able to generate \(\left\vert HH\right\rangle\) with sufficiently high fidelity. Otherwise, the noise in state preparation might be added in \(\widetilde{{{{{{{{\mathcal{M}}}}}}}}}\), which introduces biased estimation of returned state. In our experiment, the fidelity of prepared \(\left\vert HH\right\rangle\) is 0.9956 ± 0.0005 with respect to the ideal form. To demonstrate robust SLST, we insert a tunable attenuator before metasurface to introduce optical loss from 1.5 dB to 8.6 dB, which accordingly reduces the fidelity of prepared state as reflected by the MLE reconstruction shown in Fig. 4. Compared to SLST, robust SLST is able to enhance the accuracy of reconstruction in the presence of optical loss, especially at the high-level optical loss. It is worth mentioning that SLST itself can accommodate metasurface-induced measurement errors so that the enhancement of robust SLST is not significant when optical loss is zero. Increasing the optical loss is equivalent to stronger measurement noise. We observe the significant enhancement of robust SLST at high-level optical loss, which indicates robust SLST can further improve the robustness of SLST against noise (See Supplementary Note 1C for numerical simulations of robust SLST).

Fig. 4: Results of fidelities from robust self-learning shadow tomography (SLST), SLST and maximum likelihood estimation (MLE) reconstruction on two-photon states.
figure 4

a ρη=0.06, b ρη=0.37 and c ρη=0.87. In each reconstructions, the experiment is carried out with M = 1000 runs. In robust SLST, additional \({M}^{{\prime} }=2000\) experimental runs are used for calibration. We set k = 200 in robust SLST and SLST. The error bars are the standard deviations in SLST (robust SLST), obtained from Monte Carlo simulation with assumption that the collected photons in M (\({M}^{{\prime} }\) and M) experimental runs have Poisson distribution.

Discussion

We propose and demonstrate POVM with a single metasurface that enables implementation of real-time shadow tomography and observation of sample complexity. Together with the developed SLST, the underlying quantum states can be reconstructed efficiently, accurately and robustly. The advantages are evident even in single- and two-photon polarization-encoded states. The concept of octahedron POVM can be readily realized with integrated optics, where the directional couplers and phase shifters are able to construct octahedron POVM encoded in path degree of freedom. Metasurface-enabled POVM is particularly promising for efficient detection of scalable polarization-encoded multiphoton entanglement, in which two measurement devices are sufficient for full characterization54. Our investigation is compatible with metasurface-enabled generation7,8 and manipulation9,10,11 of photonic states, thereby opening the door to quantum information processing with a single ultra-thin optical device.

Methods

Fabrication of metasurface

A 700 nm-thick layer of a-Si is deposited on top of 750 μm-thick fused quartz wafers using the low-pressure chemical vapor deposition (LPCVD) technique. Then a layer of AR-P6200.09 resists (Allresist GmbH) with a thickness of 200 nm is spun and coated on the substrate. The metasurface pattern is generated with electron-beam lithography (EBL) process which is set with 120 kV, 1 nA current and 300 μc cm−2 dose. Subsequently, the resist is developed with AR300-546 (Allresist GmbH) for 1 min. Reaction ion etching (RIE) is performed to transfer the nanostructures to a-Si film. The residue resist is removed by immersing the chip first in acetone for 5 min, then in isopropanol for 5 min and finally in deionized water.

Experimental setup to implement SLST with metasurface

Metasurface is fixed on a piece of hollow plastic, which can be adjusted in six degrees of freedom through a six-dimensional rotation stage. Objective lens with 20 × magnifying factor and tube lens with the focal length of 200 mm is used as a microscope, enlarging the distance of six spots focused by metasurface from 70 μm to 1.9 mm. Then, three prisms at different heights are applied to separate six light beams. Four mini lenses with f = 15 mm and two mini lenses with f = 30 mm are used to couple the six beams into six multi-mode fibers with the core diameter of 62.5 μm.