Quick start¶
0. Install and test DUBFI¶
Download the dubfi package, then run the command:
python3 -m pip install dubfi-0.1.3.dev1+g6835ce2c1-py3-none-any.whl
To test your installation, also install pytest, then run:
MPI_WORKERS=2 OMP_NUM_THREADS=2 NUMBA_NUM_THREADS=1 python3 \
-m dubfi.tests.test_integration
1. Get the input data¶
You need an ensemble of sector-resolved model predictions plus an additional deterministic (best-guess) run of the sector-resolved model. These data must be provided as one netCDF file per observation time series, with model and observation data given on the same coordinates. The required input data format, including examples, is described in Data Interfaces.
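As an illustration of this layout (one dataset per observation time series, ensemble, deterministic run, and observations sharing coordinates), here is a minimal sketch using xarray. The variable and sector names are hypothetical; the real names and conventions are defined in Data Interfaces.

```python
import numpy as np
import xarray as xr

# Hypothetical dimensions: 3 days of hourly data, 5 ensemble members, 3 sectors
time = np.arange("2021-06-01", "2021-06-04", dtype="datetime64[h]")
n_t, n_ens, n_sec = time.size, 5, 3

ds = xr.Dataset(
    {
        # Hypothetical variable names -- see Data Interfaces for the real ones
        "observed": ("time", np.random.rand(n_t)),
        "model_det": (("sector", "time"), np.random.rand(n_sec, n_t)),
        "model_ens": (("member", "sector", "time"),
                      np.random.rand(n_ens, n_sec, n_t)),
    },
    coords={
        "time": time,                                  # shared by model and observations
        "sector": ["traffic", "industry", "heating"],  # hypothetical sector names
        "member": np.arange(n_ens),
    },
)
# One such file per observation time series, e.g.:
# ds.to_netcdf("obs_series_example.nc")
```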
2. Configuration¶
Configure the inversion in a YAML file as described in Inversion configuration. You can start from the example configuration.
The following parameters must usually be adjusted to your input data:
- mec_dir, orig_mec_dir (optional)
- input.*
- uncertainty.natural_signal_names (optional)
- coordinate_filter.start and coordinate_filter.end
The coordinate and data filters should also be adjusted. To get started, you can omit the following parameters in data_filter that require additional input data:
min_wind, max_nordstream, nordstream_average, max_wildfires, wildfire_average.
Parameters in uncertainty and segment_buffer are closely connected and should be adjusted with some understanding of the method.
Hint
Adjust coordinate_filter.start and coordinate_filter.end to a short time interval (few days) for technical tests.
This can drastically reduce the required computational resources.
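A configuration fragment for such a technical test might look like the following. The parameter names are taken from this page; the nesting, paths, and dates are placeholders to adapt to your input data (see Inversion configuration and the example configuration for the authoritative format):

```yaml
# Placeholder values -- adapt paths and dates to your input data.
mec_dir: /path/to/ensemble_runs        # sector-resolved ensemble predictions
orig_mec_dir: /path/to/deterministic   # optional: deterministic best-guess run
coordinate_filter:
  start: 2021-06-01                    # short interval for a technical test
  end: 2021-06-04
```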
3. Job script¶
Write a job script for running DUBFI following the example in Running inversions. This must be adjusted to the size of the problem, the hardware, the job scheduler, installed libraries, available memory, and possibly other details of the computer.
To adjust the parameters, you should consider the following questions:
- Given one observation, how many other data points are in a time range that permits correlations? This determines the minimum required matrix size. The cutoff scale for correlations is configured in segment_buffer.t_cutoff.
- How many observations are there in one inversion time window? How many segments of correlated observations do you need? This determines the number of parallel MPI worker jobs. For a technical test with few observations, two worker processes should be sufficient.
- How many CPU cores are available per MPI worker job? Adjust the lower-level parallelization accordingly (OMP, Numba, your BLAS implementation).
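The considerations above might translate into a job script along these lines. This is a sketch only: scheduler directives, module loads, and the launch command are hypothetical and must be adapted following the example in Running inversions.

```shell
#!/bin/bash
# Sketch of a job script -- adapt to your scheduler, hardware, and libraries.
export MPI_WORKERS=8           # roughly one worker per segment of correlated observations
export OMP_NUM_THREADS=4       # CPU cores available per MPI worker
export NUMBA_NUM_THREADS=4
export OPENBLAS_NUM_THREADS=4  # or the equivalent variable for your BLAS implementation
python3 -m dubfi config.yaml   # hypothetical invocation; see Running inversions
```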
4. Interpret the results¶
If the inversion runs successfully, you will obtain an output file inversion_result.nc.
The most interesting numbers there are the coefficients or scaling factors encoded in the variables:
- s_prior: a priori scaling factors, \(s_0\) in Bayesian Inversion Problem
- s_post_kalman: posterior when using the approximation \(\tilde{R}(s) = \tilde{R}(s_0)\) in Bayesian Inversion Problem, equivalent to a Kalman filter
- s_post: posterior, solution of the optimization problem described in Bayesian Inversion Problem
The uncertainties are saved in the corresponding error covariance matrices b_prior, b_post_kalman, and b_post.
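As a sketch of how the scaling factors relate to their covariance matrices: the 1-sigma uncertainty of each scaling factor is the square root of the corresponding diagonal element of the covariance matrix. The numbers below are made up; in practice, read s_post and b_post from inversion_result.nc (e.g. with xarray).

```python
import numpy as np

# Made-up stand-ins for variables read from inversion_result.nc
s_post = np.array([1.10, 0.85])       # posterior scaling factors
b_post = np.array([[0.04, 0.01],
                   [0.01, 0.09]])     # posterior error covariance matrix

# 1-sigma uncertainty per scaling factor = sqrt of the diagonal
sigma_post = np.sqrt(np.diag(b_post)) # -> [0.2, 0.3]
for s, sig in zip(s_post, sigma_post):
    print(f"{s:.2f} +/- {sig:.2f}")
```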
For more details, see Scaling factors.