Data Interfaces
This page describes the input and output netCDF files.
General
DUBFI takes model equivalents and observation data as input. Currently, it is only designed to work with in-situ concentration measurements. The output consists of scaling factors for flux categories and the uncertainties of these scaling factors. DUBFI does not output fluxes or emission estimates.
All input and output data are stored in netCDF files.
Each numerical input and output field should have a units attribute.
All concentrations are given in moles of tracer per mole of dry air (mol/mol, written "mol mol-1" in the units attribute).
For other quantities, SI units are preferred.
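CH4 observations are often reported in ppb and must be converted to mol/mol before they are written to the input files. The following is a minimal sketch. The helper names are hypothetical and are not part of DUBFI.

```python
# Hypothetical helpers (not part of DUBFI) for converting CH4 mole fractions.
PPB_PER_MOL_MOL = 1e9  # 1 ppb = 1e-9 mol/mol


def ppb_to_mol_per_mol(value_ppb: float) -> float:
    """Convert a dry-air mole fraction from ppb to mol/mol."""
    return value_ppb / PPB_PER_MOL_MOL


def mol_per_mol_to_ppb(value: float) -> float:
    """Convert a dry-air mole fraction from mol/mol to ppb."""
    return value * PPB_PER_MOL_MOL


# An observed value of 1950 ppb is written to obs_CH4 as roughly 1.95e-6 mol/mol.
obs_ch4 = ppb_to_mol_per_mol(1950.0)
```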
Input
Data input consists of multiple netCDF files containing observed concentrations and model predictions (simulation results).
Additionally, when cycling the inversion, an output file of DUBFI will also serve as input.
For the following description, we will assume that the species (configuration entry species) is CH4.
Directory tree
Observation time series are distinguished by station and sampling height (ssh). For each ssh, DUBFI requires two input files: one for the deterministic run and one for the ensemble run.
Input files are provided in directory mec_dir or orig_mec_dir, both defined in the configuration.
For an ssh identifier “JFJ_13.9” (station Jungfraujoch, sampling height 13.9 m above ground level), the required input files are
${mec_dir}/JFJ_13.9_det.nc (deterministic run) and ${orig_mec_dir}/JFJ_13.9_ens.nc (ensemble run).
If orig_mec_dir is not provided, it defaults to mec_dir.
Because mec_dir can differ from orig_mec_dir, the user can provide far-field corrected versions of JFJ_13.9_det.nc in a separate directory while the ensemble files remain in orig_mec_dir.
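The file lookup described above can be sketched as follows, including the fallback of orig_mec_dir to mec_dir. The function name and the example directories are hypothetical and do not come from DUBFI.

```python
from pathlib import Path
from typing import Optional, Tuple


def input_paths(
    ssh: str, mec_dir: str, orig_mec_dir: Optional[str] = None
) -> Tuple[Path, Path]:
    """Return the (deterministic, ensemble) input files for one ssh identifier.

    Hypothetical helper following the naming convention described above;
    it is not part of the DUBFI API.
    """
    if orig_mec_dir is None:
        # orig_mec_dir defaults to mec_dir when it is not configured
        orig_mec_dir = mec_dir
    det_file = Path(mec_dir) / f"{ssh}_det.nc"
    ens_file = Path(orig_mec_dir) / f"{ssh}_ens.nc"
    return det_file, ens_file


# Hypothetical directories: far-field corrected det files kept apart from the originals
det_file, ens_file = input_paths("JFJ_13.9", "/data/mec_ffc", "/data/mec")
print(det_file.as_posix())  # /data/mec_ffc/JFJ_13.9_det.nc
print(ens_file.as_posix())  # /data/mec/JFJ_13.9_ens.nc
```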
Coordinates
The netCDF input files shall use the following coordinates:
time: observation time, must be a sorted array of times. All data (model and observation) must be provided on the same time coordinate.
bc_prior: multiple boundary conditions may be provided. The configuration entry input.bc_name defines the label selected from this dimension.
flux_cat: labels the flux categories; labels should be short but ideally human-readable. This must include the label “NordStream”.
flux_total: labels the components that are summed to obtain the total contribution of the fluxes.
ensmem: meteorological ensemble member, only present in the ensemble data. Coordinate values along this dimension are not used.
If the configuration entry input.use_bc_correction is true (the default):
  bc_ens_letkf: ensemble dimension of the posterior boundary condition ensemble
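The coordinate rules above can be checked before an inversion is started. The following sketch tests that time is sorted, that the deterministic and ensemble files share the time and flux_cat coordinates (as noted in the examples), and that “NordStream” is present. The function and the label "other" are hypothetical. DUBFI's own checks may differ.

```python
from datetime import datetime
from typing import Sequence


def check_input_coordinates(
    det_time: Sequence[datetime],
    ens_time: Sequence[datetime],
    det_flux_cat: Sequence[str],
    ens_flux_cat: Sequence[str],
) -> None:
    """Raise ValueError if the coordinate rules above are violated.

    Hypothetical helper; DUBFI's own consistency checks may differ.
    """
    times = list(det_time)
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("time coordinate must be sorted")
    if times != list(ens_time):
        raise ValueError("deterministic and ensemble files must share the time coordinate")
    if list(det_flux_cat) != list(ens_flux_cat):
        raise ValueError("flux_cat must be aligned between deterministic and ensemble files")
    if "NordStream" not in det_flux_cat:
        raise ValueError('flux_cat must include the label "NordStream"')


hours = [datetime(2020, 1, 1, h) for h in range(3)]
check_input_coordinates(hours, hours, ["NordStream", "other"], ["NordStream", "other"])
```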
Common variables
Dimensions in square brackets, such as [ensmem,], are only present in the ensemble run.
obs_CH4 (time): time series of observed CH4 concentration in mol/mol
obs_stdev_CH4 (time) (optional): standard deviation of obs_CH4 (alternative name: obs_stddev_CH4)
windspeed ([ensmem,] time): model wind speed in m/s, used for filtering
CH4_flux_total (flux_total, [ensmem,] time): additive components of the total contribution of all fluxes within the domain to the concentration
CH4_bc_prior (bc_prior, [ensmem,] time): contribution of boundary conditions to the observed CH4 concentration. Each entry along bc_prior must represent one alternative total boundary contribution.
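The sketch below shows how the flux_total components add up to the total flux contribution, and how input.bc_name picks one entry of CH4_bc_prior. Adding these two parts to get a simulated concentration is an assumption made for this illustration. The function names and the bc_prior labels "bc_a" and "bc_b" are hypothetical. Small integers stand in for mol/mol values to keep it readable.

```python
from typing import Dict, List, Sequence


def total_flux_contribution(flux_total: Sequence[Sequence[float]]) -> List[float]:
    """Sum the additive CH4_flux_total components (flux_total x time) per time step."""
    return [sum(components) for components in zip(*flux_total)]


def select_bc(bc_prior: Dict[str, Sequence[float]], bc_name: str) -> List[float]:
    """Return the CH4_bc_prior entry whose bc_prior label equals input.bc_name."""
    return list(bc_prior[bc_name])


# Two flux_total components and two time steps; labels "bc_a"/"bc_b" are made up.
fluxes = total_flux_contribution([[1, 2], [3, 4]])
background = select_bc({"bc_a": [100, 100], "bc_b": [90, 90]}, "bc_a")
# Assumption: the simulated concentration is the flux part plus the background.
simulated = [f + b for f, b in zip(fluxes, background)]
```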
Deterministic run
CH4_flux_cat (flux_cat, time): contribution of each flux category to the total CH4 concentration
CH4_bc_correction (time): correction that shall be added to CH4_bc_prior before the inversion
CH4_bc_ens_letkf (bc_ens_letkf, time): ensemble of boundary conditions, posterior of the far-field correction. Currently, the ensemble mean is ignored.
CH4_nordstream_ensmax (time): ensemble maximum of the CH4 contribution of the Nord Stream explosion. This is (ideally) based on ensemble data but is included in the file for the deterministic run to simplify data handling.
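In the deterministic example, the bc_name attribute of CH4_bc_correction names the bc_prior entry that the correction applies to. The first function below adds the correction to that entry. The phrase "the ensemble mean is ignored" is read here as meaning that only the deviations from the ensemble mean matter. The second function shows this reading, which is an assumption rather than documented DUBFI behaviour. Both function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def corrected_bc(
    bc_prior: Dict[str, Sequence[float]], bc_correction: Sequence[float], bc_name: str
) -> List[float]:
    """Add CH4_bc_correction to the bc_prior entry named by its bc_name attribute."""
    if bc_name not in bc_prior:
        raise KeyError(f"bc_name {bc_name!r} is not a bc_prior label")
    return [b + c for b, c in zip(bc_prior[bc_name], bc_correction)]


def ensemble_anomalies(bc_ens: Sequence[Sequence[float]]) -> List[List[float]]:
    """Remove the ensemble mean of CH4_bc_ens_letkf (bc_ens_letkf x time) per time step."""
    means = [mean(values) for values in zip(*bc_ens)]
    return [[v - m for v, m in zip(member, means)] for member in bc_ens]


prior = corrected_bc({"bc_a": [100, 100]}, [1, -2], "bc_a")  # -> [101, 98]
spread = ensemble_anomalies([[1, 2], [3, 6]])  # -> [[-1, -2], [1, 2]]
```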
Standard deviations estimated by the model equivalent calculator (optional). These estimate uncertainties in the interpolation to the observation coordinates:
CH4_flux_cat_stdev (flux_cat, time)
CH4_flux_total_stdev (flux_total, time)
CH4_bc_prior_stdev (bc_prior, time)
Ensemble run
The concentration of each flux category is estimated using the following fields:
group2cat (flux_cat, flux_group) [units=1]
CH4_flux_cat_weights (flux_cat, time) [units=1]
CH4_flux_group (flux_group, ensmem, time) [units=mol mol-1]
We define:
>>> CH4_flux_cat[i] = CH4_flux_cat_weights[i] * (group2cat[i] @ CH4_flux_group)
This reflects the approximations in the ensemble simulation, see Approximating ensemble members.
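The `>>>` line depends on `@` summing over the flux_group dimension. If the variables are xarray DataArrays, this works as written, because their `@` operator contracts over shared dimension names. Plain NumPy `@` behaves differently: it treats CH4_flux_group as a stack of (ensmem, time) matrices, so the sum over flux_group has to be written out. Below is a minimal NumPy sketch that computes all categories at once. The function name is hypothetical.

```python
import numpy as np


def flux_cat_from_groups(
    group2cat: np.ndarray, weights: np.ndarray, flux_group: np.ndarray
) -> np.ndarray:
    """Approximate CH4_flux_cat for all ensemble members.

    group2cat:  (flux_cat, flux_group), byte matrix mapping groups to categories
    weights:    (flux_cat, time), CH4_flux_cat_weights
    flux_group: (flux_group, ensmem, time), CH4_flux_group
    returns:    (flux_cat, ensmem, time)
    """
    # Contract over flux_group for every category
    grouped = np.einsum("cg,get->cet", group2cat, flux_group)
    # Insert an ensmem axis so the weights broadcast over all members
    return weights[:, np.newaxis, :] * grouped


g2c = np.array([[1, 0], [1, 1]], dtype=np.int8)  # 2 categories, 2 groups
w = np.array([[1.0, 2.0], [0.5, 0.5]])  # 2 categories, 2 time steps
fg = np.arange(8, dtype=float).reshape(2, 2, 2)  # 2 groups, 2 members, 2 times
cat = flux_cat_from_groups(g2c, w, fg)
```

With einsum the contracted dimension is stated explicitly, which avoids the silent axis mismatch that plain `@` would cause.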
Attributes
ssh: station and sampling height identifier of the form “ABC_123.4”, where “ABC” is the 3-letter station code and “123.4” is the sampling height in meters above ground level.
lon, lat: station coordinates in degrees east and degrees north
Vertical coordinate: the height in meters must be provided as the attribute named by the configuration entry uncertainty.vertical_coordinate. This coordinate is used for the localization of correlations.
ens_size: number of meteorological ensemble members
station_code (optional): 3-letter station code, should be the first three letters of ssh
is_auxiliary_to_data_in: path to a file from which missing fields are read. The typical use case is that multiple files with different far-field corrections exist, but not all data need to be stored redundantly.
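The ssh format and the station_code rule above can be checked with a regular expression. The sketch assumes the station code consists of exactly three letters and the height is a plain decimal number. The pattern and both function names are hypothetical.

```python
import re
from typing import Mapping, Tuple

# Assumed pattern: three-letter station code, underscore, height in m agl
SSH_PATTERN = re.compile(r"([A-Za-z]{3})_(\d+(?:\.\d+)?)")


def parse_ssh(ssh: str) -> Tuple[str, float]:
    """Split an ssh identifier such as "JFJ_13.9" into station code and height."""
    match = SSH_PATTERN.fullmatch(ssh)
    if match is None:
        raise ValueError(f"not an ssh identifier: {ssh!r}")
    return match.group(1), float(match.group(2))


def check_station_code(attrs: Mapping[str, str]) -> None:
    """If the optional station_code attribute is set, it should match the ssh prefix."""
    code, _height = parse_ssh(attrs["ssh"])
    station_code = attrs.get("station_code")
    if station_code is not None and station_code != code:
        raise ValueError(f"station_code {station_code!r} does not match ssh {attrs['ssh']!r}")


code, height = parse_ssh("JFJ_13.9")  # -> ("JFJ", 13.9)
check_station_code({"ssh": "JFJ_13.9", "station_code": "JFJ"})
```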
Examples
Deterministic:
// "CH4" can be replaced with any other species specified in the configuration.
//
netcdf JFJ_13.9_det {
dimensions:
time = 31909 ; // mandatory, aligned with JFJ_13.9_ens
bc_prior = 2 ; // mandatory
flux_cat = 142 ; // mandatory, aligned with JFJ_13.9_ens
flux_total = 3 ; // mandatory
bc_ens_letkf = 20 ; // optional
//
variables:
// Coordinates
int64 time(time) ; // mandatory
time:standard_name = "time" ;
time:time_zone = "UTC" ;
time:units = "hours since 2019-12-15T00:30:00" ;
time:calendar = "proleptic_gregorian" ;
string bc_prior(bc_prior) ; // mandatory
string flux_cat(flux_cat) ; // mandatory
string flux_total(flux_total) ; // mandatory
string bc_ens_letkf(bc_ens_letkf) ; // optional
//
// Observations
double obs_CH4(time) ; // mandatory
obs_CH4:units = "mol mol-1" ;
double obs_stdev_CH4(time) ; // optional
obs_stdev_CH4:units = "mol mol-1" ;
double obs_CH4_orig(time) ; // optional
obs_CH4_orig:units = "mol mol-1" ;
//
// Simulation results
double CH4_flux_cat(flux_cat, time) ; // mandatory
CH4_flux_cat:units = "mol mol-1" ;
double CH4_flux_total(flux_total, time) ; // mandatory
CH4_flux_total:units = "mol mol-1" ;
CH4_flux_total:long_name = "tracer concentration in mol CH4 per mol dry air" ;
double CH4_flux_total_stdev(flux_total, time) ; // optional
CH4_flux_total_stdev:units = "mol mol-1" ;
CH4_flux_total_stdev:long_name = "interpolation uncertainty of CH4 in mol CH4 per mol dry air" ;
double CH4_bc_prior(bc_prior, time) ; // mandatory
CH4_bc_prior:units = "mol mol-1" ;
CH4_bc_prior:long_name = "tracer concentration in mol CH4 per mol dry air" ;
double CH4_bc_prior_stdev(bc_prior, time) ; // optional
CH4_bc_prior_stdev:units = "mol mol-1" ;
CH4_bc_prior_stdev:long_name = "interpolation uncertainty of CH4 in mol CH4 per mol dry air" ;
//
// Far-field correction
double CH4_bc_correction(time) ; // mandatory if input.use_bc_correction == True in configuration
CH4_bc_correction:units = "mol mol-1" ;
CH4_bc_correction:bc_name = "CamsInvOpt_v24r1_sfcsat" ; // must be in bc_prior and equal to attribute of CH4_bc_ens_letkf
double CH4_bc_ens_letkf(bc_ens_letkf, time) ; // recommended when using far-field correction
CH4_bc_ens_letkf:units = "mol mol-1" ;
CH4_bc_ens_letkf:bc_name = "CamsInvOpt_v24r1_sfcsat" ; // must be in bc_prior and equal to attribute of CH4_bc_correction
//
// Auxiliary data
double CH4_events_ensmax(time) ; // optional
CH4_events_ensmax:units = "mol mol-1" ;
double windspeed(time) ; // mandatory if data_filter.min_wind > 0 in configuration
windspeed:units = "m s-1" ;
// global attributes:
:station_name = "Jungfraujoch" ;
:station_code = "JFJ" ; // mandatory
:lat = 46.5475 ; // mandatory: degrees north
:lon = 7.9851 ; // mandatory: degrees east
:ICOSCP_PID = "https://meta.icos-cp.eu/objects/s1H3W75AL-TzB7nRBjFX9TZ5" ;
//
// Heights are given in meters above ground level (agl) or above sea level (asl).
// Ground level differs between observation (real topography, station_elevation_asl)
// and model (topography on model resolution, model_ground_asl).
:sampling_height_agl = 13.9 ; // mandatory: sampling height above ground level
:station_elevation_asl = 3571.8 ; // station (ground) elevation above sea level
:obs_height_asl = 3585.7 ; // observation height above sea level
:mec_height_agl = 521.368011207554 ; // model sampling height above model ground
:mec_height_asl = 3078.23198879245 ; // model sampling height above sea level
:model_ground_asl = 2556.86397758489 ; // model topography
:ssh = "JFJ_13.9" ; // unique identifier: station code and sampling height
:ens_size = 20LL ; // mandatory: number of meteorological ensemble members
}
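In the deterministic example, the time coordinate holds integer offsets from the reference time given in its units attribute. Libraries such as xarray decode CF time coordinates automatically. The standard-library sketch below handles only the "hours since ..." form used here. It returns naive datetimes, which should be read as UTC (see the time_zone attribute). Python's datetime uses the proleptic Gregorian calendar, which matches the calendar attribute. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def decode_hours_since(values: Iterable[int], units: str) -> List[datetime]:
    """Decode an 'hours since <ISO time>' time coordinate into naive UTC datetimes."""
    prefix = "hours since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported time units: {units!r}")
    reference = datetime.fromisoformat(units[len(prefix):])
    return [reference + timedelta(hours=int(v)) for v in values]


times = decode_hours_since([0, 1, 24], "hours since 2019-12-15T00:30:00")
print(times[-1])  # 2019-12-16 00:30:00
```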
Ensemble:
// "CH4" can be replaced with any other species specified in the configuration.
//
netcdf JFJ_13.9_ens {
dimensions:
time = 31909 ; // mandatory, aligned with JFJ_13.9_det
bc_prior = 1 ; // mandatory
flux_cat = 142 ; // mandatory, aligned with JFJ_13.9_det
flux_total = 1 ; // mandatory
flux_group = 14 ; // mandatory
ensmem = 20 ; // mandatory
//
variables:
// Coordinates
int64 time(time) ; // mandatory
time:standard_name = "time" ;
time:time_zone = "UTC" ;
time:units = "hours since 2019-12-15T00:30:00" ;
time:calendar = "proleptic_gregorian" ;
string bc_prior(bc_prior) ; // mandatory
string flux_cat(flux_cat) ; // mandatory
string flux_total(flux_total) ; // mandatory
string flux_group(flux_group) ; // mandatory
string ensmem(ensmem) ; // optional
//
// Observations
double obs_CH4(time) ; // mandatory
obs_CH4:units = "mol mol-1" ;
double obs_stdev_CH4(time) ; // optional
obs_stdev_CH4:units = "mol mol-1" ;
double obs_CH4_orig(time) ; // optional
obs_CH4_orig:units = "mol mol-1" ;
//
// Simulation results
byte group2cat(flux_cat, flux_group) ; // mandatory
group2cat:units = "1" ;
double CH4_flux_cat_weights(flux_cat, time) ; // mandatory
CH4_flux_cat_weights:units = "1" ;
double CH4_flux_group(flux_group, ensmem, time) ; // mandatory
CH4_flux_group:units = "mol mol-1" ;
double CH4_flux_total(flux_total, ensmem, time) ; // mandatory
CH4_flux_total:units = "mol mol-1" ;
CH4_flux_total:long_name = "tracer concentration in mol CH4 per mol dry air" ;
double CH4_bc_prior(bc_prior, ensmem, time) ; // mandatory
CH4_bc_prior:units = "mol mol-1" ;
CH4_bc_prior:long_name = "tracer concentration in mol CH4 per mol dry air" ;
//
// Auxiliary data
double windspeed(ensmem, time) ; // optional
windspeed:units = "m s-1" ;
// global attributes:
:station_name = "Jungfraujoch" ;
:station_code = "JFJ" ; // mandatory
:lat = 46.5475 ; // mandatory: degrees north
:lon = 7.9851 ; // mandatory: degrees east
:ICOSCP_PID = "https://meta.icos-cp.eu/objects/s1H3W75AL-TzB7nRBjFX9TZ5" ;
//
// Heights are given in meters above ground level (agl) or above sea level (asl).
// Ground level differs between observation (real topography, station_elevation_asl)
// and model (topography on model resolution, model_ground_asl).
:sampling_height_agl = 13.9 ; // mandatory: sampling height above ground level
:station_elevation_asl = 3571.8 ; // station (ground) elevation above sea level
:obs_height_asl = 3585.7 ; // observation height above sea level
:mec_height_agl = 521.368011207554 ; // model sampling height above model ground
:mec_height_asl = 3078.23198879245 ; // model sampling height above sea level
:model_ground_asl = 2556.86397758489 ; // model topography
:ssh = "JFJ_13.9" ; // unique identifier: station code and sampling height
:how_to_CH4_flux_cat = "Use the approximation: CH4_flux_cat = (CH4_flux_group @ group2cat) * CH4_flux_cat_weights" ;
}
Output
The output of DUBFI mainly consists of scaling factors for flux categories and the error covariance matrices of these scaling factors. The vectors of scaling factors form the state space of the inversion.
Additionally, the output contains the sensitivity of scaling factors to observations as a connection between state space and observation space.
Output fields in observation space follow the internal structure of the observation space: a one-dimensional vector without a defined ordering.
For post-processing, it may be of interest to reproduce the observation filtering done in the inversion using dubfi.fluxes.readobs.data_from_config().
DUBFI does not know about the fluxes and does not provide flux estimates. The post-processing package used at DWD is not yet published.
Directory tree
The output directory specified for dubfi.fluxes will contain a file “inversion_result.nc”, a configuration file “config.yml”, and log files (when using the default configuration).
Coordinates
flux_cat: flux category (as in input)
flux_cat_dual: flux category, equivalent to flux_cat, used for square matrices
norm_prefactor: prefactor of the normalization term in the cost function of the Bayesian inversion (as in configuration entry inversion.norm_prefactor)
obs: observation dimension combining time and ssh. All observation time series are combined in one long vector. Sorting by time or station is not guaranteed.
ssh: station and sampling height identifiers (strings)
obs_time (obs): observation time
ssh_idx (obs): 0-based index of the ssh identifier along the observation dimension. The observation at index i has observation time obs_time[i] and ssh identifier ssh[ssh_idx[i]].
segment (optional): segment in MPI parallelization
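Because the obs dimension has no guaranteed order, per-station time series have to be rebuilt from ssh_idx and obs_time. The sketch below does this for any observation-space variable, such as mdm_post. The function name and the second identifier "ABC_10.0" are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


def split_by_ssh(
    values: Sequence[float],
    obs_time: Sequence[datetime],
    ssh_idx: Sequence[int],
    ssh: Sequence[str],
) -> Dict[str, List[Tuple[datetime, float]]]:
    """Group an obs-dimension variable into time-sorted series per ssh identifier."""
    series: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)
    for value, when, idx in zip(values, obs_time, ssh_idx):
        # Observation i belongs to ssh[ssh_idx[i]] at time obs_time[i]
        series[ssh[idx]].append((when, value))
    for points in series.values():
        # The obs dimension has no guaranteed ordering, so sort explicitly
        points.sort(key=lambda point: point[0])
    return dict(series)


t1, t2 = datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 1)
series = split_by_ssh([3.0, 1.0, 2.0], [t2, t1, t1], [0, 0, 1], ["JFJ_13.9", "ABC_10.0"])
```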
Data variables
Output data variables are described by dubfi.fluxes.core.OUTPUT_METADATA, which may be more up to date than this documentation.
raw_config_utf8: UTF-8 encoded YAML configuration defining the inversion parameters
cost_function_post: posterior value of the inversion cost function
s_prior_kalman: prior scaling factors for cycling with constant R. Only present when cycling. Values equal to zero indicate that s_prior describes the deviation from the prior emissions.
s_prior: prior scaling factors. When cycling, these depend on the norm prefactor. Values equal to zero indicate that s_post describes the deviation from the prior emissions.
s_post: posterior scaling factors
s_post_kalman: posterior scaling factors, assuming constant R, to be understood relative to s_prior. For these posterior scaling factors, the dependence of the uncertainty on the scaling factors is neglected.
b_prior: uncertainty (error covariance) matrix of s_prior
b_post: uncertainty (error covariance) matrix of s_post
b_post_kalman: uncertainty (error covariance) matrix of s_post_kalman
sensitivity: linearized sensitivity of the posterior scaling factors to the observations, i.e. the derivative of the posterior scaling factors with respect to the observations
sensitivity_kalman: linearized sensitivity of the posterior scaling factors to the observations, assuming constant R, i.e. the derivative of s_post_kalman with respect to the observations
averaging_kernel: averaging kernel estimate, i.e. the derivative of the posterior scaling factors (dimension flux_cat_dual) with respect to the true scaling factors (dimension flux_cat), estimated under the assumption of a perfect transport model
mdm_prior: observation minus prior model prediction (for scaling factors s_prior)
mdm_post: observation minus posterior model prediction (for scaling factors s_post)
mdm_post_kalman: observation minus model prediction for scaling factors s_post_kalman
mdm_stdev_prior: standard deviation assumed in the inversion for mdm_prior (also used for mdm_post_kalman), including uncertainty weighting and inflation
mdm_stdev_post: standard deviation assumed in the inversion for mdm_post, including uncertainty weighting and inflation
ssh: station code and sampling height
ssh_idx: 0-based index of the ssh coordinate for each observation
obs_time: observation time (UTC)
lon: station longitude (degrees east)
lat: station latitude (degrees north)
height: observation height coordinate as used for localization
flux_cat: flux category names
segment_size: unbuffered size of segments in MPI parallelization
buffered_segment_size: buffered size of segments in MPI parallelization
obs_count: number of observations per station and sampling height
norm_prefactor: prefactor of the normalization term in the cost function
solver_nit: number of solver iterations
solver_nfev: number of cost function calls in the solver
solver_njev: number of calls to the gradient of the cost function in the solver
solver_nhev: number of calls to the Hessian of the cost function in the solver
solver_status: solver status, 0 means success
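Two common uses of these variables are sketched below. Posterior standard deviations are the square roots of the diagonal of b_post. Because sensitivity is a derivative, it gives a first-order estimate of how s_post would change if the observations were perturbed. The sketch assumes a single norm_prefactor has already been selected and that sensitivity is laid out as (flux_cat, obs). Both the layout and the function names are assumptions.

```python
import math
from typing import List, Sequence


def posterior_stdev(b_post: Sequence[Sequence[float]]) -> List[float]:
    """Standard deviation of each posterior scaling factor from the diagonal of b_post."""
    return [math.sqrt(b_post[i][i]) for i in range(len(b_post))]


def linear_response(
    sensitivity: Sequence[Sequence[float]], delta_obs: Sequence[float]
) -> List[float]:
    """First-order change of s_post caused by an observation perturbation.

    Assumes sensitivity is laid out as (flux_cat, obs) for a single norm_prefactor.
    """
    return [sum(s * d for s, d in zip(row, delta_obs)) for row in sensitivity]


sigma = posterior_stdev([[4.0, 0.0], [0.0, 9.0]])  # -> [2.0, 3.0]
delta_s = linear_response([[1.0, 2.0], [0.0, 1.0]], [1.0, 1.0])  # -> [3.0, 1.0]
```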
Attributes
start_window: start of the inversion time window, ISO-formatted date and time string
end_window: end of the inversion time window, ISO-formatted date and time string
chi2: \(\chi^2\) value of the fit; combined with the attribute ddof, it can be used to estimate how well the assumed uncertainties agree with the true deviations
next_norm_prefactor_idx: integer reporting the progress in filling an existing file with data. In a complete file, this must be equal to the size of the dimension norm_prefactor.
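The two checks below follow from these attributes. A file is complete once next_norm_prefactor_idx equals the size of norm_prefactor. The ratio chi2 / ddof is a reduced \(\chi^2\), which assumes that ddof holds the degrees of freedom of the fit. Values well above 1 would suggest that the assumed uncertainties are too small, and values well below 1 that they are too large. The function names and attribute values are made up for illustration.

```python
from typing import Any, Mapping


def is_complete(attrs: Mapping[str, Any], n_norm_prefactor: int) -> bool:
    """True once results for every norm_prefactor entry have been written."""
    return int(attrs["next_norm_prefactor_idx"]) == n_norm_prefactor


def reduced_chi2(attrs: Mapping[str, Any]) -> float:
    """chi2 divided by ddof, assuming ddof holds the degrees of freedom of the fit."""
    return float(attrs["chi2"]) / float(attrs["ddof"])


# Made-up attribute values for illustration
attrs = {"next_norm_prefactor_idx": 3, "chi2": 1200.0, "ddof": 1000}
print(is_complete(attrs, n_norm_prefactor=3), reduced_chi2(attrs))  # True 1.2
```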