Inversion configuration
=======================
This section introduces the configuration parameters required for a flux inversion.
An example configuration file is provided in :code:`examples/config.yml`.


Processing the configuration
----------------------------
Each inversion run has an input configuration file and a target directory (:code:`$TARGET` in the following).
The inversion reads the configuration file and the input data, which is taken from the source directory given by the entry :code:`mec_dir`.
The input configuration is extended to include information about the data used (entries :code:`ssh` and :code:`dimensions`).
The extended configuration is then written to :code:`$TARGET/config.yml`.
When using :term:`MPI`, the worker processes will read this extended configuration.

The inversion output will be written to :code:`$TARGET/inversion_result.nc`.
This file is created shortly after initialization and is filled with data as the inversion run progresses.


General settings
----------------

- :code:`species` defines the type of gas considered. In the currently used data format, the species is part of some netCDF variable names (see :doc:`interface`).

- :code:`validation_sites` lists observation sites that are excluded from the inversion so that they can be used for validation.
  This makes it possible to read data with a far-field correction that was constructed without these stations.

- :code:`input.bc_name` selects the boundary conditions.

- :code:`input.use_bc_correction` selects whether a correction of the lateral boundary contribution is used.
  Note that this correction is not computed by DUBFI but must be provided in the input files.
  The implementation of the boundary correction used at :term:`DWD` is not yet published.

- :code:`input.vertical_coordinate` defines the attribute of the input netCDF files that is used as the vertical coordinate of the observation site.

- :code:`log` sets the default log level and can be used to define paths to log files.
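
The effect of :code:`validation_sites` can be pictured as splitting the observations into two disjoint sets. The following is a minimal sketch, not DUBFI's implementation: the function name and the record layout (one dict per data point with a :code:`"site"` key) are assumptions made for illustration.

.. code-block:: python

   from __future__ import annotations

   from typing import Iterable


   def split_by_validation_sites(
       observations: Iterable[dict], validation_sites: Iterable[str]
   ) -> tuple[list[dict], list[dict]]:
       """Separate observations into an inversion set and a validation set.

       Hypothetical helper: each observation is assumed to be a dict
       with a "site" key; this is not part of DUBFI's API.
       """
       excluded = set(validation_sites)
       inversion_set: list[dict] = []
       validation_set: list[dict] = []
       for obs in observations:
           # Observations from validation sites never enter the inversion.
           if obs["site"] in excluded:
               validation_set.append(obs)
           else:
               inversion_set.append(obs)
       return inversion_set, validation_set

Keeping the two sets disjoint is what makes the validation meaningful: the validation sites cannot influence the estimated fluxes.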


Input data filtering
--------------------
The input is filtered by :code:`coodinate_filter` and :code:`data_filter`.
:code:`coodinate_filter` defines which stations, sampling heights, and times shall be used.
:code:`data_filter` defines rules for excluding data points based on wind speed or model-data mismatch.
This filtering step selects different data points when different meteorologies (ensemble members), boundary conditions, or input data in general are used.
For consistent filtering, make sure that :code:`data_filter` is always applied to the same data.
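
The consistency requirement can be illustrated with a minimal sketch that evaluates the filter rules once on a fixed reference input and reuses the resulting mask for all other inputs. The rules (a minimum wind speed and a maximum absolute model-data mismatch), their thresholds, the data values, and the function names are hypothetical; they do not reproduce the actual options of :code:`data_filter`.

.. code-block:: python

   from __future__ import annotations


   def data_filter_mask(
       wind_speed: list[float],
       mismatch: list[float],
       min_wind_speed: float,
       max_abs_mismatch: float,
   ) -> list[bool]:
       """Return True for data points that pass the (hypothetical) rules."""
       return [
           ws >= min_wind_speed and abs(dm) <= max_abs_mismatch
           for ws, dm in zip(wind_speed, mismatch)
       ]


   def apply_mask(values: list[float], mask: list[bool]) -> list[float]:
       """Keep only the values whose mask entry is True."""
       return [v for v, keep in zip(values, mask) if keep]


   # Compute the mask once, from a fixed reference input (e.g. one member) ...
   reference_mask = data_filter_mask(
       wind_speed=[1.0, 4.0, 6.0, 3.0],
       mismatch=[0.5, 2.5, 0.2, -0.4],
       min_wind_speed=2.0,
       max_abs_mismatch=1.0,
   )
   # ... and reuse it for every other input, so all runs share the same points.
   member_a = apply_mask([10.0, 11.0, 12.0, 13.0], reference_mask)
   member_b = apply_mask([20.0, 21.0, 22.0, 23.0], reference_mask)

If instead the mask were recomputed from each member's own wind speed and mismatch, the members would generally be compared on different subsets of data points.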


Uncertainty matrix R
--------------------
Entries in :code:`uncertainty` define the construction of the error covariance matrix R of the model-data mismatch.
This includes the correlation (cutoff) scales :code:`hscale_m`, :code:`vscale_m`, and :code:`tscale` (all in :code:`uncertainty`).

The cutoff scales in :code:`segment_buffer` must be adjusted to these correlation scales.
These cutoff scales are used to select buffer data when distributing the data points to :term:`MPI` worker processes.
**If the cutoff scales are too small, the inversion will likely converge to wrong results!**
To be safe, choose :code:`segment_buffer.t_cutoff = 5 * uncertainty.tscale / segment_buffer.buffer_prefactor` and :code:`segment_buffer.h_cutoff_m = 5 * uncertainty.hscale_m`.
At the same time, the cutoff scales in :code:`segment_buffer` strongly affect the computational cost of the inversion.
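
The rule of thumb for the cutoff scales can be written as a small helper that computes the recommended values and reports entries of :code:`segment_buffer` that fall below them. This is a sketch under the assumption that the configuration sections are available as plain dictionaries; the function names are hypothetical, and :code:`tscale` must be given in the same time unit as :code:`segment_buffer.t_cutoff`.

.. code-block:: python

   from __future__ import annotations


   def recommended_cutoffs(
       tscale: float, hscale_m: float, buffer_prefactor: float, factor: float = 5.0
   ) -> dict[str, float]:
       """Conservative segment_buffer cutoffs following the rule of thumb.

       Hypothetical helper; tscale must use the same time unit as t_cutoff.
       """
       if buffer_prefactor <= 0:
           raise ValueError("buffer_prefactor must be positive")
       return {
           "t_cutoff": factor * tscale / buffer_prefactor,
           "h_cutoff_m": factor * hscale_m,
       }


   def check_cutoffs(segment_buffer: dict, uncertainty: dict) -> list[str]:
       """List the segment_buffer entries that are smaller than recommended."""
       recommended = recommended_cutoffs(
           uncertainty["tscale"],
           uncertainty["hscale_m"],
           segment_buffer["buffer_prefactor"],
       )
       return [key for key, value in recommended.items() if segment_buffer[key] < value]

For example, with the illustrative values :code:`tscale = 2.0`, :code:`hscale_m = 50000.0`, and :code:`buffer_prefactor = 2.0`, the helper recommends :code:`t_cutoff = 5.0` and :code:`h_cutoff_m = 250000.0`. Values larger than these are safe but increase the computational cost.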

Another parameter affecting the matrix R is :code:`data_filter.outlier_threshold`, which defines an uncertainty inflation for outlier data points.


Inversion
---------
The inversion depends on the prior error covariance matrix defined in :code:`prior` and :code:`cycle`, and on parameters in :code:`inversion`.

:code:`inversion.norm_prefactor` plays a special role because its entries lead to repeated inversion runs.
The cost function of the Bayesian inversion contains a normalization term whose prefactor is a tuning parameter.
:code:`inversion.norm_prefactor` defines a list of these prefactors, and the inversion is repeated for each list entry.
Note that an inversion with prefactor zero is significantly faster than one with a non-zero prefactor.
See :ref:`bias_problem` for details.
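
Conceptually, :code:`inversion.norm_prefactor` turns one configuration into a sequence of inversion runs. The sketch below shows only this control flow; :code:`run_inversion` is a hypothetical placeholder for a complete inversion run and not a DUBFI function.

.. code-block:: python

   from __future__ import annotations

   from typing import Callable, TypeVar

   ResultT = TypeVar("ResultT")


   def run_for_each_prefactor(
       norm_prefactor: list[float], run_inversion: Callable[[float], ResultT]
   ) -> list[tuple[float, ResultT]]:
       """Repeat the inversion once per entry of inversion.norm_prefactor.

       run_inversion is a hypothetical stand-in for one full inversion run.
       Results are returned as (prefactor, result) pairs in list order.
       """
       return [(prefactor, run_inversion(prefactor)) for prefactor in norm_prefactor]

Returning pairs instead of a dictionary keeps one result per list entry, even if a prefactor appears twice.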

The parameters :code:`inversion.solver_tol` and :code:`inversion.solver_options` define the targeted solver precision. Because they determine the number of solver iterations required, these parameters strongly influence the runtime.


Example
-------

.. literalinclude:: ../../examples/config.yml
   :language: yaml