Bayesian Inversion Problem¶
The inversion is implemented in dubfi.inversion.inversion.
Basic assumptions¶
The inversion problem aims at finding a state \(s\in\mathbb{R}^{N_s}\) that maximizes the probability \(P(s|y)\) of \(s\) given observations \(y\in\mathbb{R}^{N_y}\). We assume that the state and the observations are represented by finite-dimensional vectors of real numbers. The model is an affine linear map \(H\) mapping a state \(s\) to model-equivalents \(y^\text{model}\in\mathbb{R}^{N_y}\) that can be compared to the observations. We denote the vector space of such maps by \(\mathcal{A}\).
Note
An affine linear map \(H:\mathbb{R}^{N_s}\to\mathbb{R}^{N_y}\) is of the form \(H(s)=As+b\), where \(A\in\mathbb{R}^{N_y\times N_s}\) is a linear map (matrix) and \(b\in\mathbb{R}^{N_y}\) is a vector in observation space. Such affine linear maps can be represented by \((N_s+1)N_y\) real numbers, which we can arrange in a column vector denoted by \(\underline{H}\).
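This vector representation is pure bookkeeping and can be sketched in NumPy (an illustration only, not dubfi code; stacking the columns of \(A\) before \(b\) is one possible convention):

```python
import numpy as np

rng = np.random.default_rng(0)
N_s, N_y = 3, 5

# An affine map H(s) = A s + b ...
A = rng.normal(size=(N_y, N_s))
b = rng.normal(size=N_y)

# ... is identified with a column vector of (N_s + 1) * N_y numbers.
H_vec = np.concatenate([A.ravel(order="F"), b])  # columns of A, then b
assert H_vec.size == (N_s + 1) * N_y

# Applying the map via its vector representation: H(s) = S @ H_vec,
# where S = kron([s, 1], I) is the projector used later in the derivation.
s = rng.normal(size=N_s)
S = np.kron(np.append(s, 1.0), np.eye(N_y))
assert np.allclose(S @ H_vec, A @ s + b)
```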
The observation vector \(y\) and model \(H\) are assumed to be Gaussian:

\[y \sim \mathcal{N}(y_0, R), \qquad \underline{H} \sim \mathcal{N}(\underline{H}_0, W),\]

where \(R\) is the observation error covariance matrix and \(W\) is the covariance matrix describing the model uncertainty. Here we use the notation \(\underline{H}\) when treating the affine linear map \(H\) as a column vector.
\(P(a|b)\) | probability (density) of \(a\) conditioned on \(b\)
\(y\in\mathbb{R}^{N_y}\) | observation vector
\(y_0\in\mathbb{R}^{N_y}\) | perfect observation (reality)
\(s\in\mathbb{R}^{N_s}\) | state that shall be optimized (e.g. scaling factors)
\(s_0\in\mathbb{R}^{N_s}\) | prior scaling factors
\(\mathcal{A}\sim\mathbb{R}^{(N_s+1)N_y}\) | space of affine linear maps from states to observations
\(H_0\in\mathcal{A}\) | model mapping state \(s\) to observation prediction
\(H\in\mathcal{A}\) | random variable describing uncertainty of \(H_0\)
\(\underline{H}\in\mathbb{R}^{(N_s+1)N_y}\) | column vector representation of \(H\)
Computing \(P(y|s)\)¶
Since we do not know the reality \(y_0\), we replace it by the model prediction \(H(s)\) for a given realization of the model \(H\) and state \(s\):

\[P(y|H,s) = \frac{1}{\sqrt{(2\pi)^{N_y}\det(R)}} \exp\left(-\frac12 (y - H(s))^\top R^{-1} (y - H(s))\right)\]
For a given transport realization \(H\) and flux parametrization \(s\), \(P(y|H,s)\) describes the measurement uncertainty. Since the transport \(H\) is not known exactly, we need to eliminate this random variable by integrating it out:

\[P(y|s) = \int P(y|H,s)\, P(\underline{H})\, \mathrm{d}\underline{H} \qquad (1)\]
This is a high-dimensional Gaussian integral, which we solve using the notation
for two affine linear maps \(H^*,Y\in\mathcal{A}\).
In the following, we will treat \(s\) as a constant parameter since we compute \(P(y|s)\). This allows us to define a projector \(S:\mathbb{R}^{(N_s+1)N_y}\to\mathbb{R}^{N_y}\) from the space of affine linear maps to vectors in observation space by \(S\underline{a}=a(s)\) for all \(a\in\mathcal{A}\). With this notation we can define a matrix \(V\equiv S^\top R^{-1}S\) such that \(\underline{a}^\top V \underline{a'} = a(s)^\top R^{-1} a'(s)\) for all \(a,a'\in\mathcal{A}\). To solve the integral in (1), we need a matrix \(Q\equiv W^{-1} + V\):
The exponent can be simplified using
From the definition of \(V\) we get
(Prove the second equality by multiplying from the left with \(1+S^\top R^{-1} SW\).) For the exponent in (2), this implies
For \(\det(Q)\) in (2) we can use Sylvester’s determinant theorem:

\[\det(Q) = \det(W^{-1})\det\left(1 + W S^\top R^{-1} S\right) = \frac{\det\left(1 + R^{-1} S W S^\top\right)}{\det(W)} = \frac{\det(R + S W S^\top)}{\det(W)\det(R)}\]
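Sylvester’s theorem, \(\det(1+AB)=\det(1+BA)\) for rectangular \(A\) and \(B\) of compatible shapes, can be checked numerically (a generic illustration, independent of dubfi):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 7  # e.g. m = N_y and n = (N_s + 1) * N_y in the derivation

A = rng.normal(size=(m, n))
B = rng.normal(size=(n, m))

# Sylvester: det(I_m + A B) = det(I_n + B A), even though the two
# identity matrices have different sizes.
lhs = np.linalg.det(np.eye(m) + A @ B)
rhs = np.linalg.det(np.eye(n) + B @ A)
assert np.isclose(lhs, rhs)
```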
This leads to the final form

\[P(y|s) = \frac{1}{\sqrt{(2\pi)^{N_y}\det(\tilde{R}_s)}} \exp\left(-\frac12 (y - H_0(s))^\top \tilde{R}_s^{-1} (y - H_0(s))\right) \qquad \text{with} \quad \tilde{R}_s = R + S W S^\top.\]
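The structure of this result can be probed with a small Monte Carlo experiment (a sketch with made-up dimensions and covariances, not part of dubfi): sampling \(\underline{H}\sim\mathcal{N}(\underline{H}_0,W)\) and adding measurement noise with covariance \(R\) should yield samples of \(y\) with mean \(H_0(s)\) and covariance \(R + SWS^\top\):

```python
import numpy as np

rng = np.random.default_rng(2)
N_s, N_y, N_samples = 2, 3, 200_000
N_h = (N_s + 1) * N_y  # dimension of the vectorized affine map

s = np.array([1.2, 0.8])
S = np.kron(np.append(s, 1.0), np.eye(N_y))  # projector: S @ H_vec = H(s)

# Random symmetric positive definite covariances R (observations) and W (model)
M = rng.normal(size=(N_y, N_y)); R = M @ M.T + N_y * np.eye(N_y)
M = rng.normal(size=(N_h, N_h)); W = M @ M.T + N_h * np.eye(N_h)
H0_vec = rng.normal(size=N_h)

# Sample y = H(s) + measurement noise, with H_vec ~ N(H0_vec, W)
H_samples = rng.multivariate_normal(H0_vec, W, size=N_samples)
noise = rng.multivariate_normal(np.zeros(N_y), R, size=N_samples)
y = H_samples @ S.T + noise

# Empirical mean and covariance match H0(s) and R + S W S^T
R_tilde = R + S @ W @ S.T
assert np.allclose(y.mean(axis=0), S @ H0_vec, atol=0.15)
assert np.linalg.norm(np.cov(y.T) - R_tilde) < 0.05 * np.linalg.norm(R_tilde)
```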
Result for \(P(s|y)\)¶
By simple Bayesian probability theory, we obtain

\[P(s|y) = \frac{P(y|s)\,P(s)}{P(y)} \propto \frac{P(s)}{\sqrt{\det(\tilde{R}_s)}}\, \exp\left(-\frac12 (y - H_0(s))^\top \tilde{R}_s^{-1} (y - H_0(s))\right), \qquad (3)\]

where \(P(s)\) is the a priori probability density function of the state vector. Thus, including the model uncertainty via the matrix \(W\) leads to a modified uncertainty matrix \(\tilde{R}_s\) which now depends on \(s\). This \(s\)-dependence also affects the normalization prefactor \((\det\tilde{R}_s)^{-1/2}\). The probability distribution for \(s\) is no longer Gaussian.
To understand this result, we recall that \(W\) describes the uncertainty of \(H\) via \(\underline{H}\sim\mathcal{N}(\underline{H}_0, W)\). We can thus express \(SWS^\top\) using cumulants of the random variable \(H\), denoted by \(\langle\langle \underline{H}\,\underline{H}^\top \rangle\rangle_H = W\):

\[S W S^\top = S\, \langle\langle \underline{H}\,\underline{H}^\top \rangle\rangle_H\, S^\top = \langle\langle H(s)\, H(s)^\top \rangle\rangle_H\]
Hint
Here, the cumulant is just a notation for the covariances. We can use \(\langle\langle a b \rangle\rangle_x = \langle a b \rangle_x - \langle a \rangle_x \langle b \rangle_x\) where \(\langle a \rangle_x\) denotes the expectation value of \(a\) with respect to the random variable \(x\).
This relation lets us estimate \(\tilde{R}_s\) using an ensemble for \(H\):

\[\tilde{R}_s \approx R + \frac{1}{N_e - 1} \sum_{k=1}^{N_e} \left(H_k(s) - \overline{H(s)}\right) \left(H_k(s) - \overline{H(s)}\right)^\top,\]

where \(H_k\) are the \(N_e\) ensemble members and \(\overline{H(s)}\) denotes the ensemble mean of the model predictions.
This result is intuitive: The probability \(P(y|s)\) is just a Gaussian centered at the model prediction \(H_0(s)\). Its error covariance matrix consists of the observation error plus the model error, which can be obtained from a model ensemble.
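A minimal NumPy sketch of this ensemble estimate (the ensemble data layout here is an assumption for illustration; dubfi’s actual interfaces differ):

```python
import numpy as np

def estimate_R_tilde(R, H_ensemble, s):
    """Estimate R~_s = R + ensemble covariance of the model predictions.

    R          : (N_y, N_y) observation error covariance
    H_ensemble : list of (A_k, b_k) pairs, one affine model per member
    s          : (N_s,) state vector
    """
    # Model-equivalents H_k(s) = A_k @ s + b_k for each ensemble member
    Y = np.stack([A @ s + b for A, b in H_ensemble])   # shape (N_e, N_y)
    # Sample covariance over the ensemble (the <<H(s) H(s)^T>> cumulant)
    model_cov = np.atleast_2d(np.cov(Y, rowvar=False, ddof=1))
    return R + model_cov

# Toy example: 50 ensemble members, 4 observations, 2 state components
rng = np.random.default_rng(3)
ensemble = [(rng.normal(size=(4, 2)), rng.normal(size=4)) for _ in range(50)]
R = 0.1 * np.eye(4)
R_tilde = estimate_R_tilde(R, ensemble, s=np.array([1.0, 1.0]))
assert R_tilde.shape == (4, 4)
assert np.all(np.linalg.eigvalsh(R_tilde) > 0)  # positive definite
```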
Modeling uncertainties¶
The task remains to estimate the modeling uncertainties. We assume that these mainly consist of three independent parts: transport uncertainty, emission uncertainty, and representativity error. Here, the emission uncertainty contains the uncertainty in flux patterns and temporal profiles, but not the variability in scaling factors. To leading order in the variations we can write:

\[\tilde{R}_s \approx R + R^\text{meteo}_s + R^\text{flux}_s + R^\text{repr}_s\]
\(R^\text{meteo}_s\) is estimated using a meteorological ensemble. \(R^\text{flux}_s\) can be estimated using an emission ensemble at fixed meteorology or some heuristic approximation considering the variability of local emissions. A good parametrization of the fluxes should leave only small uncertainties for \(R^\text{flux}_s\). \(R^\text{repr}_s\) depends on the observation characteristics.
Flux uncertainties¶
For a more realistic description, we should treat the emissions as a random variable. The calculation is equivalent to the one shown above if we replace the scaling factor \(s\) by an emission field \(f\). We now focus on a subspace of the emissions parametrized by \(f^p=\sum_i s_i f^i\) and optimize \(P(s|y)\) to find \(f^p\).
Formally, we can find a representation of the fluxes \(f=v\oplus s\) that splits the fluxes into a parametrized (\(s\in\mathbb{R}^{N_s}\)) and an unparametrized part (\(v\in\mathbb{R}^{N_v}\)). Assuming that \(f\sim\mathcal{N}(v_0 \oplus s_0, F)\) where \(F^{-1}=\begin{pmatrix} \Phi_v & \Phi_x \\ \Phi_x^\top & \Phi_s\end{pmatrix}\), we can derive

\[P(s) = \int P(v \oplus s)\, \mathrm{d}v \propto \exp\left(-\frac12 (s - s_0)^\top \left(\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x\right) (s - s_0)\right).\]
Thus, \(s\sim\mathcal{N}(s_0, B)\) where \(B^{-1}=\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x\). It is formally possible to integrate out the extra degrees of freedom introduced by \(v\) in the transport model with a similar integral. We do not compute the integral explicitly but note that the uncertainty contained in \(v\) will be captured by estimating \(\tilde{R}_s\) using an ensemble of \(H\) and \(v\):

\[\tilde{R}_s \approx R + \langle\langle H(v \oplus s)\, H(v \oplus s)^\top \rangle\rangle_{H,v}\]
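The marginalization behind \(B^{-1}=\Phi_s - \Phi_x^\top \Phi_v^{-1} \Phi_x\) is the standard Schur-complement property of Gaussian precision matrices, which can be verified numerically (generic linear algebra, not dubfi code):

```python
import numpy as np

rng = np.random.default_rng(4)
N_v, N_s = 5, 2

# Build a random SPD precision matrix F^{-1} with blocks Phi_v, Phi_x, Phi_s
M = rng.normal(size=(N_v + N_s, N_v + N_s))
F_inv = M @ M.T + (N_v + N_s) * np.eye(N_v + N_s)
Phi_v = F_inv[:N_v, :N_v]
Phi_x = F_inv[:N_v, N_v:]
Phi_s = F_inv[N_v:, N_v:]

# Marginal covariance of s = lower-right block of F
#                          = inverse of the Schur complement of Phi_v
F = np.linalg.inv(F_inv)
B_marginal = F[N_v:, N_v:]
B_schur = np.linalg.inv(Phi_s - Phi_x.T @ np.linalg.inv(Phi_v) @ Phi_x)
assert np.allclose(B_marginal, B_schur)
```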
Wrong uncertainties cause a bias¶
The normalization term \(\det(\tilde{R}_s)^{-1/2}\) in (3) regulates a bias in the inversion results.
Ideally, it removes the bias that arises because the inversion otherwise prefers states with higher uncertainties, and therefore larger fluxes.
This compensation, however, relies on correct uncertainty estimates.
When the uncertainties are overestimated, the normalization term gains weight relative to the other terms in the cost function,
which leads to underestimated fluxes.
The relative weight of the normalization term can be adjusted with the norm_prefactor tuning parameter, denoted \(\alpha\) in the cost function:

\[J(s) = \frac12 (y - H_0(s))^\top \tilde{R}_s^{-1} (y - H_0(s)) + \frac{\alpha}{2} \log\det(\tilde{R}_s) - \log(P(s))\]
For correctly estimated uncertainties, one should use \(\alpha=1\). When the uncertainties are overestimated by a factor \(\gamma\) (\(\chi^2/N_{dof}=\gamma^{-2}\)), one should use \(\alpha=\gamma^{-2}\).
The special case \(\log(P(s))= -\tfrac12 (s-s_0)^\top B^{-1} (s-s_0)\) (Gaussian prior) and \(\alpha=0\) of this cost function was used by Bruch et al.[1]. In this case, the inversion system prefers stronger fluxes because these lead to larger uncertainties in the concentrations.
This cost function and its derivatives are implemented in dubfi.inversion.inversion.
The analytically computed derivatives are documented in dubfi.inversion.inversion.InvertorOptimizer.
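The structure of this cost function can be sketched as a standalone function (an illustration only; the names and callables here are hypothetical, and the actual implementation is dubfi.inversion.inversion.InvertorOptimizer):

```python
import numpy as np

def cost(s, y, H0, R_tilde_fn, prior_neg_log, alpha=1.0):
    """Illustrative cost J(s) = misfit + (alpha/2) logdet + prior loss.

    H0            : callable, model prediction H_0(s)
    R_tilde_fn    : callable, s-dependent covariance R~_s
    prior_neg_log : callable, -log P(s)
    alpha         : weight of the normalization term (norm_prefactor)
    """
    r = y - H0(s)
    R_tilde = R_tilde_fn(s)
    misfit = 0.5 * r @ np.linalg.solve(R_tilde, r)
    # slogdet is numerically safer than log(det(...))
    norm_term = 0.5 * alpha * np.linalg.slogdet(R_tilde)[1]
    return misfit + norm_term + prior_neg_log(s)

# Toy usage with a Gaussian prior and a simple s-dependent covariance
y = np.array([1.0, 2.0])
J = cost(
    np.array([0.5]), y,
    H0=lambda s: np.array([s[0], 2.0 * s[0]]),
    R_tilde_fn=lambda s: np.eye(2) * (1.0 + 0.1 * s[0] ** 2),
    prior_neg_log=lambda s: 0.5 * float((s - 1.0) @ (s - 1.0)),
    alpha=1.0,
)
assert np.isfinite(J) and J > 0
```

Setting `alpha=0.0` reproduces the cost function without the normalization term, as used in the special case discussed above.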
A priori PDF¶
DUBFI provides an implementation of the cost function \(J(s)\) in the solver
dubfi.inversion.inversion.InvertorOptimizer with Gaussian a priori PDF \(P(s)\).
But this a priori PDF can easily be replaced by subclassing the solver.
Such subclasses must implement the prior loss function \(-\log(P(s))\)
and its first two derivatives (gradient and Hessian).
Through this interface, two more a priori PDFs are provided:
A log-normal a priori PDF (dubfi.inversion.inversion_lognormal.InvertorOptimizerLogNormal)
and a regularized (non-negative) Gaussian (dubfi.inversion.inversion.InvertorOptimizerPositive).
In the configuration, the user provides the type of PDF, its most likely value (\(\mathop{\mathrm{arg\,max}}_s P(s)\)),
and the error covariance matrix of a Gaussian that locally approximates the PDF near its maximum.
The regularized Gaussian suppresses the probability for scaling factors below a small, positive threshold reg_cutoff.
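To illustrate the ingredients such an implementation needs, here is a sketch of a log-normal prior loss with its analytic gradient and Hessian written as standalone functions (the function names and standalone form are assumptions for illustration; dubfi’s actual class interface differs):

```python
import numpy as np

def lognormal_prior_loss(s, mu, sigma):
    """-log P(s) for independent log-normal components, up to a constant."""
    z = (np.log(s) - mu) / sigma
    return float(np.sum(np.log(s) + 0.5 * z**2))

def lognormal_prior_grad(s, mu, sigma):
    # d/ds_i [log s_i + (log s_i - mu_i)^2 / (2 sigma_i^2)]
    return (1.0 + (np.log(s) - mu) / sigma**2) / s

def lognormal_prior_hess(s, mu, sigma):
    # Second derivatives; the components are independent, so the
    # Hessian is diagonal.
    d2 = (-1.0 + (1.0 - (np.log(s) - mu)) / sigma**2) / s**2
    return np.diag(d2)

# Finite-difference check of the analytic gradient
s = np.array([0.7, 1.3]); mu = np.zeros(2); sigma = np.array([0.5, 0.5])
eps = 1e-6
g_fd = np.array([
    (lognormal_prior_loss(s + eps * e, mu, sigma)
     - lognormal_prior_loss(s - eps * e, mu, sigma)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(lognormal_prior_grad(s, mu, sigma), g_fd, atol=1e-5)
```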