Bayesian Inversion Problem

The inversion is implemented in dubfi.inversion.inversion.

Basic assumptions

The inversion problem aims to find a state \(s\in\mathbb{R}^{N_s}\) that maximizes the probability \(P(s|y)\) of \(s\) given observations \(y\in\mathbb{R}^{N_y}\). We assume that the state and the observations are represented by finite-dimensional vectors of real numbers. The model is an affine linear map \(H\) mapping a state \(s\) to model equivalents \(y^\text{model}\in\mathbb{R}^{N_y}\) that can be compared to the observations. We denote the vector space of such maps by \(\mathcal{A}\).

Note

An affine linear map \(H:\mathbb{R}^{N_s}\to\mathbb{R}^{N_y}\) is of the form \(H(s)=As+b\), where \(A\in\mathbb{R}^{N_y\times N_s}\) is a linear map (matrix) and \(b\in\mathbb{R}^{N_y}\) is a vector in observation space. Such affine linear maps can be represented by \((N_s+1)N_y\) real numbers, which we can arrange in a column vector denoted by \(\underline{H}\).
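The packing of \(A\) and \(b\) into \(\underline{H}\) is pure bookkeeping. A minimal NumPy sketch of one possible convention (the row-major ordering [A.ravel(), b] is an assumption made for illustration, not a convention prescribed by dubfi):

```python
import numpy as np

# Toy dimensions; the row-major packing [A.ravel(), b] is an assumption
# made for illustration only.
N_s, N_y = 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(N_y, N_s))   # linear part of H
b = rng.normal(size=N_y)          # offset in observation space

H_vec = np.concatenate([A.ravel(), b])   # (N_s + 1) * N_y entries

def apply_affine(H_vec, s, N_y):
    """Evaluate the affine map packed in H_vec at the state s."""
    N_s = s.size
    A = H_vec[:N_s * N_y].reshape(N_y, N_s)
    b = H_vec[N_s * N_y:]
    return A @ s + b

s = rng.normal(size=N_s)
assert np.allclose(apply_affine(H_vec, s, N_y), A @ s + b)
```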

The a priori probability distributions of state \(s\), observation vector \(y\), and model \(H\) are assumed to be Gaussian:

\[\begin{split}\begin{aligned} P(s) &\propto \exp\left[ -\tfrac12 (s - s_0)^\top B^{-1} (s - s_0) \right] \\ P(H) &\propto \exp\left[ -\tfrac12 (\underline{H} - \underline{H}_0)^\top W^{-1} (\underline{H} - \underline{H}_0) \right] \\ P(y|y_0) &\propto \exp\left[ -\tfrac12 (y - y_0)^\top R^{-1} (y - y_0) \right] \end{aligned}\end{split}\]

Here we use the notation \(\underline{H}\) when treating the affine linear map \(H\) as a column vector.
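All three priors share the same unnormalized quadratic form. A small helper evaluating it (illustrative only, not part of dubfi):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def logpdf_unnorm(x, mean, cov):
    """Unnormalized Gaussian log-density: -0.5 (x - mean)^T cov^{-1} (x - mean)."""
    r = x - mean
    return -0.5 * r @ cho_solve(cho_factor(cov), r)

# e.g. log P(s) up to a constant, given prior mean s0 and covariance B:
#   logpdf_unnorm(s, s0, B)
```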

Table 1 Notation used in this section.

\(P(a|b)\): probability (density) of \(a\) conditioned on \(b\)
\(y\in\mathbb{R}^{N_y}\): observation vector
\(y_0\in\mathbb{R}^{N_y}\): perfect observation (reality)
\(s\in\mathbb{R}^{N_s}\): state to be optimized (e.g. scaling factors)
\(s_0\in\mathbb{R}^{N_s}\): prior state (scaling factors)
\(\mathcal{A}\cong\mathbb{R}^{(N_s+1)N_y}\): space of affine linear maps from states to observations
\(H_0\in\mathcal{A}\): model mapping a state \(s\) to the predicted observations
\(H\in\mathcal{A}\): random variable describing the uncertainty of \(H_0\)
\(\underline{H}\in\mathbb{R}^{(N_s+1)N_y}\): column vector representation of \(H\)

Computing \(P(y|s)\)

Since we do not know the reality \(y_0\), we replace it by the model prediction \(H(s)\) for a given realization of the model \(H\) and state \(s\):

\[\begin{aligned} P(y|H,s) &\propto \exp\left( -\tfrac12 [y - H(s)]^\top R^{-1} [y - H(s)] \right) . \end{aligned}\]

For a given transport realization \(H\) and flux parametrization \(s\), \(P(y|H,s)\) describes the measurement uncertainty. Since the transport \(H\) is not known exactly, we need to eliminate this random variable by integrating it out:

\[\begin{aligned} P(y|s) &\propto \int dH\, P(H) P(y|H,s) . \end{aligned}\]

This is a high-dimensional Gaussian integral, which we solve using the notation

\[\begin{split}\begin{aligned} H^* &\equiv H - H_0 , \\ Y(s') &\equiv y - H_0(s') \end{aligned}\end{split}\]

for two affine linear maps \(H^*,Y\in\mathcal{A}\).

(1)\[\begin{split}\begin{aligned} P(y|s) &\propto \int dH\, \exp\left\{ -\tfrac12 [y - H(s)]^\top R^{-1} [y - H(s)] - \tfrac12 \underline{H}^{*\top} W^{-1} \underline{H}^* \right\} \\ &= \int dH^*\, \exp\left\{ -\tfrac12 [H^*(s) - Y(s)]^\top R^{-1} [H^*(s) - Y(s)] - \tfrac12 \underline{H}^{*\top} W^{-1} \underline{H}^* \right\} \end{aligned}\end{split}\]

In the following, we treat \(s\) as a constant parameter since we compute \(P(y|s)\). This allows us to define an evaluation map \(S:\mathbb{R}^{(N_s+1)N_y}\to\mathbb{R}^{N_y}\) from the space of affine linear maps to vectors in observation space by \(S\underline{a}=a(s)\) for all \(a\in\mathcal{A}\). With this notation we can define a matrix \(V\equiv S^\top R^{-1}S\) such that \(\underline{a}^\top V \underline{a'} = a(s)^\top R^{-1} a'(s)\) for all \(a,a'\in\mathcal{A}\). To solve the integral in (1), we further need the matrix \(Q\equiv W^{-1} + V\):

(2)\[\begin{split}\begin{aligned} &P(y|s) \\ &= \int\! dH^*\, \exp\left\{ -\tfrac12 (\underline{H}^* - \underline{Y})^\top V (\underline{H}^* - \underline{Y}) - \tfrac12 \underline{H}^{*\top} W^{-1} \underline{H}^* \right\} \\ &= \int\! dH^*\, \exp\left\{ -\tfrac12 \underline{H}^{*\top} Q \underline{H}^* + \tfrac12 \left[ \underline{H}^{*\top} V\underline{Y} + \underline{Y}^\top V \underline{H}^* \right] - \tfrac12 \underline{Y}^\top V \underline{Y} \right\} \\ &= \int\! dH^*\, \exp\left\{ -\tfrac12 (\underline{H}^* - Q^{-1}V\underline{Y})^\top Q (\underline{H}^* - Q^{-1}V\underline{Y}) + \tfrac12 \underline{Y}^\top VQ^{-1}V \underline{Y} - \tfrac12 \underline{Y}^\top V \underline{Y} \right\} \\ &\propto (\det Q)^{-\tfrac12} \exp\left\{ \tfrac12 \underline{Y}^\top VQ^{-1}V \underline{Y} - \tfrac12 \underline{Y}^\top V \underline{Y} \right\} \end{aligned}\end{split}\]
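In code, \(S\), \(V\), and \(Q\) are small dense matrices. A toy construction, assuming the row-major packing [A.ravel(), b] from the note above (none of these names are dubfi's API):

```python
import numpy as np

def evaluation_map(s, N_y):
    """S with S @ a_vec == a(s) for the assumed packing [A.ravel(), b]."""
    return np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])

rng = np.random.default_rng(1)
N_s, N_y = 3, 4
s = rng.normal(size=N_s)
S = evaluation_map(s, N_y)                    # shape (N_y, (N_s+1)*N_y)

R = np.diag(rng.uniform(0.5, 2.0, size=N_y))  # toy observation covariance
W = np.eye((N_s + 1) * N_y)                   # toy model covariance

V = S.T @ np.linalg.inv(R) @ S                # V = S^T R^{-1} S
Q = np.linalg.inv(W) + V                      # Q = W^{-1} + V
```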

The exponent can be simplified using

\[\begin{split}\begin{aligned} \underline{Y}^\top VQ^{-1}V \underline{Y} &= \underline{Y}^\top VW (1 + V W)^{-1} V\underline{Y} \\ &= \underline{Y}^\top V \underline{Y} - \underline{Y}^\top (1 + V W)^{-1} V \underline{Y} . \end{aligned}\end{split}\]

From the definition of \(V\) we get

\[(1 + V W)^{-1} V = (1 + S^\top R^{-1} S W)^{-1} S^\top R^{-1} S = S^\top (R + S W S^\top)^{-1} S\]

(The second equality can be verified by multiplying from the left by \(1+S^\top R^{-1} SW\).) For the exponent in (2), this implies

\[\begin{split}\begin{aligned} \underline{Y}^\top VQ^{-1}V \underline{Y} - \underline{Y}^\top V \underline{Y} &= -\underline{Y}^\top S^\top (R + S W S^\top)^{-1} S \underline{Y} \\ &= -Y(s)^\top (R + SWS^\top)^{-1} Y(s) . \end{aligned}\end{split}\]
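The push-through identity can be checked numerically with the same toy matrices as above (a sketch under the assumed packing convention):

```python
import numpy as np

rng = np.random.default_rng(1)
N_s, N_y = 3, 4
s = rng.normal(size=N_s)
S = np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])  # evaluation map
R = np.diag(rng.uniform(0.5, 2.0, size=N_y))
W = np.eye((N_s + 1) * N_y)
V = S.T @ np.linalg.inv(R) @ S

n = V.shape[0]
lhs = np.linalg.solve(np.eye(n) + V @ W, V)       # (1 + V W)^{-1} V
rhs = S.T @ np.linalg.solve(R + S @ W @ S.T, S)   # S^T (R + S W S^T)^{-1} S
assert np.allclose(lhs, rhs)
```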

For \(\det(Q)\) in (2) we can use Sylvester’s determinant theorem:

\[\begin{split}\begin{aligned} \det(Q) &= \det(W^{-1} + S^\top R^{-1} S) \\ &= \det(W)^{-1} \det(I + S^\top R^{-1} S W) \\ &= \det(W)^{-1} \det(I + R^{-1} S W S^\top) \\ &= \frac{\det(R + S W S^\top)}{\det(W) \det(R)} \end{aligned}\end{split}\]
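This chain of determinant identities can be verified numerically as well (same toy setup):

```python
import numpy as np

rng = np.random.default_rng(1)
N_s, N_y = 3, 4
s = rng.normal(size=N_s)
S = np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])  # evaluation map
R = np.diag(rng.uniform(0.5, 2.0, size=N_y))
W = np.eye((N_s + 1) * N_y)

Q = np.linalg.inv(W) + S.T @ np.linalg.inv(R) @ S
lhs = np.linalg.det(Q)
rhs = np.linalg.det(R + S @ W @ S.T) / (np.linalg.det(W) * np.linalg.det(R))
assert np.allclose(lhs, rhs)
```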

Combining the exponent and the determinant results leads to the final form

\[\begin{split}\begin{aligned} P(y|s) &= (\det \tilde{R}_s)^{-1/2}\, \exp\left\{ -\tfrac12 [y - H_0(s)]^\top \tilde{R}_s^{-1} [y - H_0(s)] \right\} \times\text{const}, \\ \tilde{R}_s &= R + SWS^\top . \end{aligned}\end{split}\]
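A direct transcription of this final form, up to the \(s\)-independent constant (a sketch under the toy packing convention assumed above; the names are illustrative, not dubfi's interface):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(s, y, H0_vec, R, W):
    """log P(y|s) up to an s-independent constant (sketch).

    H0_vec is the prior model H_0 packed as [A0.ravel(), b0] (assumed).
    """
    N_y = y.size
    S = np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])  # evaluation map
    R_tilde = R + S @ W @ S.T                               # R~_s
    resid = y - S @ H0_vec                                  # y - H_0(s)
    c, low = cho_factor(R_tilde)
    log_det = 2.0 * np.sum(np.log(np.diag(c)))              # log det R~_s
    return -0.5 * (log_det + resid @ cho_solve((c, low), resid))
```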

Result for \(P(s|y)\)

By Bayes' theorem, we obtain

(3)\[\begin{split}\begin{aligned} &P(s|y) \\ &\propto P(y|s) \, P(s) \\ &\propto (\det \tilde{R}_s)^{-1/2}\, \exp\left\{ -\tfrac12 [y - H_0(s)]^\top \tilde{R}_s^{-1} [y - H_0(s)] - \tfrac12 (s-s_0)^\top B^{-1} (s-s_0) \right\} \end{aligned}\end{split}\]

Thus, including the model uncertainty via the matrix \(W\) leads to a modified uncertainty matrix \(\tilde{R}_s\) which now depends on \(s\). This \(s\)-dependence also affects the normalization prefactor \((\det\tilde{R}_s)^{-1/2}\). The probability distribution for \(s\) is therefore no longer Gaussian.

To understand this result, we recall that \(W\) describes the uncertainty of \(H\) via \(\underline{H}\sim\mathcal{N}(\underline{H}_0, W)\). We can thus express \(SWS^\top\) using the second cumulant (covariance) of the random variable \(\underline{H}\), denoted by \(\langle\langle \underline{H}\,\underline{H}^\top \rangle\rangle_H = W\):

\[SWS^\top = \langle\langle S \underline{H}\,\underline{H}^\top S^\top \rangle\rangle_H = \langle\langle H(s) H(s)^\top \rangle\rangle_H\]

Hint

Here, the cumulant is just a notation for the covariances. We can use \(\langle\langle a b \rangle\rangle_x = \langle a b \rangle_x - \langle a \rangle_x \langle b \rangle_x\) where \(\langle a \rangle_x\) denotes the expectation value of \(a\) with respect to the random variable \(x\).

This relation lets us estimate \(\tilde{R}_s\) using an ensemble for \(H\):

\[(\tilde{R}_s)_{ij} = R_{ij} + \langle\langle H(s)_i\,H(s)_j \rangle\rangle_H\]
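A sketch of such an ensemble estimate (hypothetical names; H_ensemble is an array of packed model vectors of shape (K, (N_s+1)N_y) under the assumed packing convention):

```python
import numpy as np

def r_tilde_from_ensemble(R, H_ensemble, s):
    """Estimate R~_s = R + cov(H(s)) from K packed model vectors (sketch)."""
    N_y = R.shape[0]
    S = np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])  # evaluation map
    H_s = H_ensemble @ S.T            # row k holds H_k(s)
    return R + np.cov(H_s, rowvar=False)
```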

Modeling uncertainties

The task remains to estimate the modeling uncertainties. We assume that these mainly consist of three independent parts: transport uncertainty, emission uncertainty, and representativity error. Here, the emission uncertainty contains the uncertainty in flux patterns and temporal profiles, but not the variability of the scaling factors. To leading order in the variations, we can write:

\[\tilde{R}_s = R + R^\text{meteo}_s + R^\text{flux}_s + R^\text{repr}_s\]

\(R^\text{meteo}_s\) is estimated using a meteorological ensemble. \(R^\text{flux}_s\) can be estimated using an emission ensemble at fixed meteorology or some heuristic approximation considering the variability of local emissions. A good parametrization of the fluxes should leave only a small contribution \(R^\text{flux}_s\). \(R^\text{repr}_s\) depends on the observation characteristics.

Flux uncertainties

For a more realistic description, we should treat the emissions as a random variable. The calculation is equivalent to the one shown above if we replace the scaling factor \(s\) by an emission field \(f\). We now focus on a subspace of the emissions parametrized by \(f^p=\sum_i s_i f^i\) and optimize \(P(s|y)\) to find \(f^p\).

Formally, we can find a representation of the fluxes \(f=v\oplus s\) that splits the fluxes into a parametrized (\(s\in\mathbb{R}^{N_s}\)) and an unparametrized part (\(v\in\mathbb{R}^{N_v}\)). Assuming that \(f\sim\mathcal{N}(v_0 \oplus s_0, F)\) where \(F^{-1}=\begin{pmatrix} \Phi_v & \Phi_x \\ \Phi_x^\top & \Phi_s\end{pmatrix}\), and writing \(v\) and \(s\) for the deviations from \(v_0\) and \(s_0\) to keep the notation short, we can derive

\[\begin{split}\begin{aligned} P(s) &= \int P(v\oplus s) \,d^{N_v}v \\ &\propto \int \exp\left[-\tfrac12 (v+\Phi_v^{-1}\Phi_x s)^\top \Phi_v (v+\Phi_v^{-1}\Phi_x s) - \tfrac12 s^\top (\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x) s \right] d^{N_v}v \\ &\propto \exp\left[-\tfrac12 s^\top (\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x) s \right] \\ \end{aligned}\end{split}\]

Thus, \(s\sim\mathcal{N}(s_0, B)\) where \(B^{-1}=\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x\). It is formally possible to integrate out the extra degrees of freedom introduced by \(v\) in the transport model with a similar integral. We do not compute the integral explicitly but note that the uncertainty contained in \(v\) will be captured by estimating \(\tilde{R}_s\) using an ensemble of \(H\) and \(v\):

\[(\tilde{R}_s)_{ij} = R_{ij} + \langle\langle H(s)_i\,H(s)_j \rangle\rangle_{H,v}\]
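The marginalization step above can be checked numerically: the inverse of the Schur complement \(\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x\) must equal the \(s\)-block of the covariance \(F\). A sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
N_v, N_s = 5, 3
M = rng.normal(size=(N_v + N_s, N_v + N_s))
F_inv = M @ M.T + np.eye(N_v + N_s)       # random SPD precision matrix
Phi_v = F_inv[:N_v, :N_v]
Phi_x = F_inv[:N_v, N_v:]
Phi_s = F_inv[N_v:, N_v:]

B_inv = Phi_s - Phi_x.T @ np.linalg.solve(Phi_v, Phi_x)
F = np.linalg.inv(F_inv)
# The marginal covariance of s is the lower-right block of F, which equals B:
assert np.allclose(np.linalg.inv(B_inv), F[N_v:, N_v:])
```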

Bias problem

The normalization term \(\det(\tilde{R}_s)^{-1/2}\) in (3) leads to a bias in the inversion results. The combined model and observation uncertainty \(\tilde{R}_s\) depends on the fluxes parametrized by \(s\). Smaller uncertainties imply a sharper distribution of \(P(y|s)\) with higher values near the optimal \(y\). Since the inversion seeks the state under which the observations are most probable, it avoids states \(s\) that lead to large uncertainties, i.e. large \(\det(\tilde{R}_s)\), because these suppress \(P(y|s)\). This effect is illustrated for one-dimensional \(s\) and \(y\) in Fig. 1.


Fig. 1 Idealized inversion illustrating the bias in \(\mathop{\mathrm{arg\,max}}_s P(s|y)\) because the normalization of \(P(y|s)\) (black) depends on \(s\). In this case, \(s\) and \(y\) are one-dimensional. Probability densities (solid lines) are indicated qualitatively and do not match the axes of the plot. For properly normalized probabilities (left hand side), the maxima of \(P(s|y)/P(s)\) (red dashed line) do not coincide with the maxima of \(P(y|s)\) (blue). The flatter curves \(P(y|s)\) at large \(s\) shift the maximum of \(P(s|y)/P(s)\) towards smaller \(s\), resulting in a negative bias for \(s\). The right hand side shows the functions \(P(y|s)\) renormalized to the maximum probability. In this case, the maxima of \(P(s|y)/P(s)\) (as a function of \(s\)) and \(P(y|s)\) (as a function of \(y\)) coincide.

In the case of an inversion of trace gas emissions without notable sinks, lower emissions imply smaller concentration uncertainties due to the transport. The emissions will therefore be underestimated.

To avoid the bias, one can replace the normalization term \(\det(\tilde{R}_s)^{-1/2}\) by a constant. In DUBFI, the weight of this normalization term is regulated by a tuning parameter, called norm_prefactor in the code and configuration, which we denote by \(\alpha\) here. We minimize a cost function derived from \(-\log(P(s|y))\), with the normalization term weighted by the tuning parameter \(0\leq\alpha\leq 1\):

\[C(s) = \tfrac{\alpha}{2} \log(\det \tilde{R}_s) + \tfrac12 [y - H_0(s)]^\top \tilde{R}_s^{-1} [y - H_0(s)] + \tfrac12 (s-s_0)^\top B^{-1} (s-s_0)\]
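A sketch of this cost function under the toy conventions used throughout this page (the actual implementation, including analytic derivatives, lives in dubfi.inversion.inversion):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def cost(s, y, H0_vec, R, W, B, s0, alpha):
    """C(s) with normalization weight alpha (sketch, toy packing assumed)."""
    N_y = y.size
    S = np.hstack([np.kron(np.eye(N_y), s), np.eye(N_y)])  # evaluation map
    R_tilde = R + S @ W @ S.T                               # R~_s
    c, low = cho_factor(R_tilde)
    log_det = 2.0 * np.sum(np.log(np.diag(c)))              # log det R~_s
    r_obs = y - S @ H0_vec                                  # y - H_0(s)
    r_pri = s - s0
    return (0.5 * alpha * log_det
            + 0.5 * r_obs @ cho_solve((c, low), r_obs)
            + 0.5 * r_pri @ np.linalg.solve(B, r_pri))
```

Such a function can be handed to a generic minimizer, e.g. scipy.optimize.minimize, although supplying analytic gradients as dubfi does is considerably more efficient.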

The special case \(\alpha=0\) of this cost function was used by Bruch et al.[1], but may also introduce a bias: in this case, the inversion system prefers stronger fluxes because these lead to larger uncertainties in the concentrations, which reduce the weighted observation misfit.

This cost function and its derivatives are implemented in dubfi.inversion.inversion. The analytically computed derivatives are documented in dubfi.inversion.inversion.InvertorOptimizer.