Overview and Installation
=========================
DUBFI is a tool for solving a generalized Bayesian flux inversion problem.
It can be used as a library or as a command line tool.
Functionality
-------------
DUBFI is designed to address a specific flux inversion problem described in the
:doc:`scientific documentation `.
However, it is not a full inversion system for flux estimation.
DUBFI takes the model-observation mismatch and categorized model equivalents as input.
Here, model equivalents are the model predictions for the observations.
We assume that these can be decomposed into multiple categories which add up linearly.
A category of model equivalents can be, e.g., the contribution of a trace gas emitted
within a specific country to the concentration of this gas at observation sites.
DUBFI computes one scalar scaling factor for each category of model equivalents.
These scaling factors are chosen such that the sum of the scaled model equivalents is
close to the observations, considering different types of uncertainties.
The output of DUBFI consists of scaling factors and their uncertainties;
DUBFI does not compute fluxes.
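
The role of the scaling factors can be illustrated with a minimal sketch.
It assumes two hypothetical categories, synthetic data, and ordinary least squares
in place of DUBFI's Bayesian treatment of uncertainties. None of the names below
are part of the DUBFI API.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical model equivalents: one row per observation, one column per category.
    n_obs = 200
    model_equivalents = rng.uniform(0.5, 2.0, size=(n_obs, 2))

    # Synthetic "observations" generated from known scaling factors plus noise.
    true_scaling = np.array([1.3, 0.7])
    observations = model_equivalents @ true_scaling + rng.normal(0.0, 0.05, size=n_obs)

    # Ordinary least squares: minimize |observations - model_equivalents @ s|^2.
    # This ignores the uncertainty structure that DUBFI takes into account.
    scaling, *_ = np.linalg.lstsq(model_equivalents, observations, rcond=None)

    # Sum of the scaled model equivalents, to be compared with the observations.
    scaled_sum = model_equivalents @ scaling
    residual_rms = np.sqrt(np.mean((observations - scaled_sum) ** 2))
    print("scaling factors:", scaling)
    print("RMS residual:", residual_rms)

The result is one scalar per category, which is the kind of quantity DUBFI reports,
together with uncertainties that this sketch does not estimate.
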
.. important::

   DUBFI will only compute scaling factors (prefactors) for fluxes based on
   observations and categorized model equivalents. It does not compute fluxes.

A full inversion framework using DUBFI is yet to be published.
DUBFI includes a linear algebra framework, which consists of an abstract interface
and multiple implementations. The distributed (:term:`MPI`) implementation of that
framework makes use of an approximation that is valid for the problem of flux
estimation when configured appropriately. This framework might be useful for other
tasks that involve matrices with a physically motivated localization. Before
applying it to other problems, however, please make sure you understand the implications
(see :doc:`linalg`).
Installation
------------
DUBFI is a pure Python package. However, it depends on the Python packages
`mpi4py `_ and
`numba `_, which can be non-trivial
to install on some systems.
Please consult the documentation of these packages if you encounter installation issues.
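
Whether the packages named in this document can be located by Python can be checked
with a short script. This is a minimal sketch using only the standard library;
``missing_packages`` is a hypothetical helper and not part of DUBFI. Note that
finding a package does not guarantee that it imports successfully, for example if
no MPI library is available.

.. code-block:: python

    import importlib.util


    def missing_packages(names):
        """Return the names in ``names`` that cannot be located for import."""
        return [name for name in names if importlib.util.find_spec(name) is None]


    # Packages named in this document; pytest is only needed for the tests.
    required = ["mpi4py", "numba", "pytest"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
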
To install with pip, first download the |downloadlink|.
In the directory containing the downloaded file, run the command:

.. parsed-literal::

   python3 -m pip install dubfi-|release|-py3-none-any.whl

Test your installation
----------------------
Once you have DUBFI and all dependencies plus :code:`pytest` installed,
you can check your installation using a basic integration test::

   MPI_WORKERS=2 OMP_NUM_THREADS=2 NUMBA_NUM_THREADS=1 python3 \
       -m dubfi.tests.test_integration

This generates random input data, runs the same inversion with all three
linear algebra implementations, and compares the results.
User interface
--------------
When using DUBFI from the command line for small test cases, you can usually run::

   python3 -m dubfi -c /path/to/config.yml -t /path/to/target/directory

To see more options, use the command line argument :code:`--help`.
The inversion requires :doc:`input data ` and a :doc:`configuration `.
Running the inversion also requires that you either set the number of :term:`MPI` worker processes or run an :term:`MPI` job on an :term:`HPC` system, as described in :doc:`run`.
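
When the command line interface is driven from a script, the call can be assembled
as follows. This is a sketch under assumptions: the helper functions are hypothetical,
and setting ``MPI_WORKERS`` for a regular run is inferred from the integration test
command in this document; :doc:`run` remains the authoritative description.

.. code-block:: python

    import os
    import sys


    def build_dubfi_command(config_path, target_dir):
        """Assemble the documented command line as an argument list."""
        return [sys.executable, "-m", "dubfi",
                "-c", str(config_path), "-t", str(target_dir)]


    def build_environment(mpi_workers):
        """Copy the current environment and set MPI_WORKERS (assumed variable)."""
        env = dict(os.environ)
        env["MPI_WORKERS"] = str(mpi_workers)
        return env


    command = build_dubfi_command("/path/to/config.yml", "/path/to/target/directory")
    env = build_environment(2)
    print(" ".join(command[1:]))
    # To launch the run, one could import subprocess and call:
    # subprocess.run(command, env=env, check=True)
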
Module Overview
---------------
This tool mainly consists of three components:
1. Linear algebra types and functions required for the inversion;
2. The Bayesian flux inversion problem; and
3. Interfaces to apply the inversion to a flux estimation problem.
1. Linear algebra
^^^^^^^^^^^^^^^^^
This package provides abstract vector and linear operator types with three different implementations:
1. Dense: all operators are represented as dense matrices (NumPy arrays).
2. Sparse: operators are stored as sparse (:term:`CSC`) matrices to avoid large dense matrices and reduce memory requirements.
3. Distributed: large, approximately diagonal matrices are approximated using :term:`MPI`-distributed dense matrices.
Implementations 1 and 2 are intended mainly for testing and for small problems.
The :term:`MPI`-based implementation is optimized for the application to the flux inversion problem.
It is scalable to large observation sets, provided that all correlations are sufficiently local.
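
Why local correlations make such an approximation viable can be seen in a small
NumPy experiment. This is only an illustration: the exponential correlation model,
the correlation length, and the cutoff are hypothetical choices, and the plain
truncation shown here is not DUBFI's distributed scheme.

.. code-block:: python

    import numpy as np

    n = 400        # number of observations (small, for illustration)
    length = 5.0   # hypothetical correlation length in index units
    cutoff = 30    # entries farther than this from the diagonal are dropped

    # Dense covariance-like matrix with exponentially decaying correlations.
    idx = np.arange(n)
    distance = np.abs(idx[:, None] - idx[None, :])
    dense = np.exp(-distance / length)

    # Localized approximation: keep only entries close to the diagonal.
    localized = np.where(distance <= cutoff, dense, 0.0)

    # Compare the action of both matrices on a random vector.
    rng = np.random.default_rng(1)
    vector = rng.normal(size=n)
    exact = dense @ vector
    approx = localized @ vector
    relative_error = np.linalg.norm(exact - approx) / np.linalg.norm(exact)

    kept_fraction = np.count_nonzero(localized) / localized.size
    print("fraction of entries kept:", kept_fraction)
    print("relative error of matrix-vector product:", relative_error)

Only a small fraction of the entries is kept, while the matrix-vector product
changes little. If correlations are not local, the truncation error grows, which
is why the locality condition above matters.
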
2. Bayesian inversion
^^^^^^^^^^^^^^^^^^^^^
The Bayesian inversion is formulated for the following problem:
We have observations and a model for describing those observations.
The model depends linearly on parameters which we aim to optimize in the inversion.
The model has significant uncertainties and these uncertainties depend on the parameters.
The uncertainties are approximated by an ensemble (or multiple ensembles), assuming that their distribution is Gaussian.
We consider the case where the number of observations may be large, :math:`O(10^4)`, but the number of parameters and the number of ensemble members are small, usually :math:`O(10^2)` or less.
See :doc:`/scidoc/inversion` for details.
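
The size regime described above, with many observations and few ensemble members,
suggests never forming the full covariance matrix. The following sketch shows the
generic sample covariance of an ensemble applied to a vector through its deviations.
The random ensemble and the function name are hypothetical, and the dependence of
the uncertainties on the parameters is not modeled here.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(2)
    n_obs, n_members = 10_000, 50

    # Hypothetical ensemble of model equivalents: one row per ensemble member.
    ensemble = rng.normal(size=(n_members, n_obs))

    # Deviations from the ensemble mean define the sample covariance
    # C = D.T @ D / (n_members - 1), an n_obs x n_obs matrix of rank < n_members.
    deviations = ensemble - ensemble.mean(axis=0)


    def apply_covariance(vector):
        """Apply the ensemble covariance to a vector without forming the matrix."""
        return deviations.T @ (deviations @ vector) / (n_members - 1)


    vector = rng.normal(size=n_obs)
    result = apply_covariance(vector)
    print(result.shape)

Each application costs two products with an ``n_members`` by ``n_obs`` array
instead of one product with an ``n_obs`` by ``n_obs`` matrix.
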
3. Application to flux inversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the flux inversion problem, we aim to adjust (surface) fluxes into the atmosphere based on observations of the atmospheric composition.
The application in :mod:`dubfi.fluxes` assumes that time series of joint model and observation data at observation sites are provided in a specific format.
The configuration of the inversion is provided as a :term:`YAML` file (see ``examples/config.yml``).