Data Interfaces

This page describes the input and output netCDF files.

General

DUBFI takes model equivalents and observation data as input. Currently, it is only designed to work with in-situ concentration measurements. The output consists of scaling factors for flux categories and the uncertainties of these scaling factors. DUBFI does not output fluxes or emission estimates.

All data input and output is in netCDF files. Each numerical input and output field should have a units attribute. All concentrations must be given in mole of tracer per mole of dry air (mol/mol, written "mol mol-1" in the units attribute). For other quantities, SI units are preferred.
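Observation data are often distributed in ppm or ppb, so they have to be rescaled before being written to the input files. The following is a minimal sketch of that conversion; the helper `to_mol_per_mol` and its unit table are hypothetical and not part of DUBFI.

```python
# Hypothetical helper (not part of DUBFI): factors from common
# mole-fraction units to mol/mol. Works for floats and NumPy arrays alike.
TO_MOL_PER_MOL = {
    "mol mol-1": 1.0,
    "umol mol-1": 1e-6,  # ppm
    "ppm": 1e-6,
    "nmol mol-1": 1e-9,  # ppb
    "ppb": 1e-9,
}


def to_mol_per_mol(value, units):
    """Convert a mole fraction given in `units` to mol/mol (KeyError if unknown)."""
    return value * TO_MOL_PER_MOL[units]


# Example: a CH4 mole fraction of 1950 ppb expressed in mol/mol.
print(to_mol_per_mol(1950.0, "ppb"))
```

After conversion, the corresponding netCDF variable should carry `units = "mol mol-1"`.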

Input

Data input consists of multiple netCDF files containing observed concentrations and model predictions (simulation results). Additionally, when cycling the inversion, an output file of DUBFI also serves as input. In the following description, we assume that the species (configuration entry species) is CH4.

Directory tree

Observation time series are distinguished by station and sampling height (ssh). For each ssh, DUBFI requires two input files: one for the deterministic run and one for the ensemble run.

Input files are read from the directories mec_dir and orig_mec_dir, both defined in the configuration. For the ssh identifier “JFJ_13.9” (station Jungfraujoch, sampling height 13.9 m above ground level), the required input files are ${mec_dir}/JFJ_13.9_det.nc (deterministic run) and ${orig_mec_dir}/JFJ_13.9_ens.nc (ensemble run).

If orig_mec_dir is not provided, it defaults to mec_dir. A separate mec_dir allows the user to provide far-field corrected data for JFJ_13.9_det.nc, while the ensemble file is still read from orig_mec_dir.
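The file naming rule can be summarized in a few lines of Python. This is an illustrative sketch, not DUBFI's own code: the function `input_paths` and the directory names in the usage line are hypothetical.

```python
from pathlib import Path


def input_paths(ssh, mec_dir, orig_mec_dir=None):
    """Return the (deterministic, ensemble) input paths for one ssh identifier.

    Hypothetical helper mirroring the convention described above:
    the deterministic file lives in mec_dir, the ensemble file in
    orig_mec_dir, which falls back to mec_dir when it is not given.
    """
    if orig_mec_dir is None:
        orig_mec_dir = mec_dir
    det = Path(mec_dir) / f"{ssh}_det.nc"
    ens = Path(orig_mec_dir) / f"{ssh}_ens.nc"
    return det, ens


# Far-field corrected deterministic data in a separate (hypothetical) directory:
det, ens = input_paths("JFJ_13.9", "/data/mec_ffc", "/data/mec")
print(det.as_posix(), ens.as_posix())
```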

Coordinates

The netCDF input files shall use the following coordinates:

  • time: observation time, must be a sorted array of times. All data (model and observation) must be provided on the same time coordinate.

  • bc_prior: labels of boundary conditions; multiple boundary conditions may be provided. The configuration entry input.bc_name selects the label used from this dimension.

  • flux_cat: labels of flux categories. These should be short and ideally human-readable. The labels must include “NordStream”.

  • flux_total: labels of components that are summed up to obtain the total contribution of the fluxes.

  • ensmem: meteorological ensemble member, only present in the ensemble data. Coordinates along this dimension are not used.

  • if configuration entry input.use_bc_correction is true (which is the default):

    • bc_ens_letkf: ensemble dimension of the posterior boundary condition ensemble
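The coordinate rules above can be checked before running an inversion. The following sketch assumes the coordinate values have already been read into arrays or lists (for example with a netCDF reader of your choice); the function `check_coordinates` is hypothetical and only covers the rules stated here: sorted time, identical time and flux_cat coordinates in the deterministic and ensemble files, and the mandatory “NordStream” label.

```python
import numpy as np


def check_coordinates(det_time, ens_time, det_flux_cat, ens_flux_cat):
    """Raise ValueError if the det/ens coordinates violate the input rules.

    Hypothetical pre-flight check, not part of DUBFI.
    """
    det_time = np.asarray(det_time)
    # Sorted (non-decreasing) time; this comparison also works for datetime64.
    if not np.all(det_time[:-1] <= det_time[1:]):
        raise ValueError("time coordinate is not sorted")
    # Model and observation data must share the same time coordinate.
    if not np.array_equal(det_time, np.asarray(ens_time)):
        raise ValueError("det and ens files use different time coordinates")
    # Flux categories must be aligned between the two files.
    if list(det_flux_cat) != list(ens_flux_cat):
        raise ValueError("flux_cat labels differ between det and ens files")
    if "NordStream" not in list(det_flux_cat):
        raise ValueError('flux_cat must include the label "NordStream"')
```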

Common variables

  • obs_CH4 (time): time series of observed CH4 concentration in mol/mol

  • obs_stdev_CH4 (time) (optional): standard deviation of obs_CH4 (alternative name: obs_stddev_CH4)

  • windspeed ([ensmem,] time): model wind speed in m/s, used for filtering

  • CH4_flux_total (flux_total, [ensmem,] time): additive components of the total contribution of all fluxes within the domain to the concentration.

  • CH4_bc_prior (bc_prior, [ensmem,] time): contribution of boundary conditions to the observed CH4 concentration. Each entry along bc_prior must represent one alternative for the total boundary contribution.

Deterministic run

  • CH4_flux_cat (flux_cat, time): contribution of flux categories to total CH4 concentration

  • CH4_bc_correction (time): correction that shall be added to CH4_bc_prior before the inversion.

  • CH4_bc_ens_letkf (bc_ens_letkf, time): ensemble of boundary conditions, posterior of the far-field correction. Currently, the ensemble mean is ignored.

  • CH4_nordstream_ensmax (time): ensemble maximum of the CH4 contribution of the Nord Stream explosion. This is (ideally) derived from ensemble data but included in the file for the deterministic run to simplify data handling.

Standard deviations estimated by the model equivalent calculator (optional). These estimate uncertainties in the interpolation to the observation coordinates:

  • CH4_flux_cat_stdev (flux_cat, time)

  • CH4_flux_total_stdev (flux_total, time)

  • CH4_bc_prior_stdev (bc_prior, time)
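To illustrate how the deterministic fields relate to each other, the sketch below adds up the flux components along flux_total, selects the boundary condition labelled by input.bc_name, and optionally adds CH4_bc_correction to it. This is an assumption about how the fields combine, derived from the descriptions above. It is not DUBFI's internal implementation, and the function name `model_total` is hypothetical.

```python
import numpy as np


def model_total(flux_total, bc_prior, bc_labels, bc_name, bc_correction=None):
    """Unscaled model concentration (mol/mol) for one ssh (hypothetical helper).

    flux_total:    array (flux_total, time), additive flux components
    bc_prior:      array (bc_prior, time), alternative boundary contributions
    bc_labels:     labels of the bc_prior coordinate
    bc_name:       label to select (configuration entry input.bc_name)
    bc_correction: optional array (time), added to the selected bc_prior
    """
    # Pick the one boundary condition selected in the configuration.
    bc = np.asarray(bc_prior)[list(bc_labels).index(bc_name)]
    if bc_correction is not None:
        # Far-field correction, applied before the inversion.
        bc = bc + np.asarray(bc_correction)
    # Flux components are additive along flux_total.
    return np.asarray(flux_total).sum(axis=0) + bc
```

Note that the bc_name attribute of CH4_bc_correction must name an entry of bc_prior, so `list(bc_labels).index(bc_name)` would fail loudly for an inconsistent file.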

Ensemble run

The concentration contribution of each flux category is estimated from the following fields:

  • group2cat (flux_cat, flux_group) [units=1]

  • CH4_flux_cat_weights (flux_cat, time) [units=1]

  • CH4_flux_group (flux_group, ensmem, time) [units=mol mol-1]

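We define (here @ sums over the dimension flux_group shared by both operands, as the @ operator does for xarray DataArray objects):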

>>> CH4_flux_cat[i] = CH4_flux_cat_weights[i] * (group2cat[i] @ CH4_flux_group)

This reflects the approximations made in the ensemble simulation; see Approximating ensemble members.
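With plain NumPy arrays, the same definition can be evaluated for all flux categories at once by contracting over flux_group with `numpy.einsum` and then scaling by the weights. This is a sketch assuming the arrays are ordered as in the dimension lists above; the function name `flux_cat_from_groups` is hypothetical.

```python
import numpy as np


def flux_cat_from_groups(group2cat, weights, flux_group):
    """Approximate CH4_flux_cat for all categories of the ensemble file.

    group2cat:  (flux_cat, flux_group), stored as bytes in the file
    weights:    (flux_cat, time), CH4_flux_cat_weights
    flux_group: (flux_group, ensmem, time), CH4_flux_group
    returns:    (flux_cat, ensmem, time)
    """
    g2c = np.asarray(group2cat, dtype=float)
    # Sum the group contributions belonging to each category (contract over g).
    summed = np.einsum("cg,get->cet", g2c, np.asarray(flux_group, dtype=float))
    # Weights have no ensmem dimension: broadcast them over the ensemble.
    return summed * np.asarray(weights, dtype=float)[:, np.newaxis, :]
```

The einsum signature `"cg,get->cet"` spells out the same contraction that `group2cat[i] @ CH4_flux_group` expresses for a single category i.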

Attributes

  • ssh: station and sampling height identifier of the form “ABC_123.4” where “ABC” is the 3-letter station code and “123.4” is the sampling height in meters above ground level.

  • lon, lat: station coordinates in degrees east and degrees north

  • a height in meters must be provided as the attribute named by configuration entry uncertainty.vertical_coordinate. This height is used for the localization of correlations.

  • ens_size: number of meteorological ensemble members

  • station_code (optional): 3-letter station code, should be first three letters of ssh

  • is_auxiliary_to_data_in: path to a file from which missing fields are read. The typical use case is a set of files with different far-field corrections that share all other data, which then need not be stored redundantly.
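The ssh attribute has a fixed structure, so its consistency with station_code can be checked mechanically. The following is a minimal sketch. The regular expression is an assumption based on the “ABC_123.4” form described above (three letters, an underscore, a height in meters). The functions `parse_ssh` and `check_station_code` are hypothetical, and `attrs` is any dict of global attributes.

```python
import re

# Assumed pattern: 3-letter station code, underscore, sampling height in m agl.
SSH_PATTERN = re.compile(r"^([A-Za-z]{3})_(\d+(?:\.\d+)?)$")


def parse_ssh(ssh):
    """Split an ssh identifier such as "JFJ_13.9" into (station_code, height_m)."""
    match = SSH_PATTERN.match(ssh)
    if match is None:
        raise ValueError(f"invalid ssh identifier: {ssh!r}")
    return match.group(1), float(match.group(2))


def check_station_code(attrs):
    """Check the optional station_code attribute against the ssh attribute."""
    code, _ = parse_ssh(attrs["ssh"])
    if "station_code" in attrs and attrs["station_code"] != code:
        raise ValueError("station_code does not match the first three letters of ssh")


check_station_code({"ssh": "JFJ_13.9", "station_code": "JFJ"})  # passes silently
```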

Examples

Deterministic:

//  "CH4" can be replaced with any other species specified in the configuration.
//
netcdf JFJ_13.9_det {
dimensions:
        time = 31909 ;       // mandatory, aligned with JFJ_13.9_ens
        bc_prior = 2 ;       // mandatory
        flux_cat = 142 ;     // mandatory, aligned with JFJ_13.9_ens
        flux_total = 3 ;     // mandatory
        bc_ens_letkf = 20 ;  // optional
        //
variables:
        // Coordinates
        int64 time(time) ;  // mandatory
                time:standard_name = "time" ;
                time:time_zone = "UTC" ;
                time:units = "hours since 2019-12-15T00:30:00" ;
                time:calendar = "proleptic_gregorian" ;
        string bc_prior(bc_prior) ;  // mandatory
        string flux_cat(flux_cat) ;  // mandatory
        string flux_total(flux_total) ;  // mandatory
        string bc_ens_letkf(bc_ens_letkf) ;  // optional
        //
        // Observations
        double obs_CH4(time) ;  // mandatory
                obs_CH4:units = "mol mol-1" ;
        double obs_stdev_CH4(time) ;  // optional
                obs_stdev_CH4:units = "mol mol-1" ;
        double obs_CH4_orig(time) ;  // optional
                obs_CH4_orig:units = "mol mol-1" ;
        //
        // Simulation results
        double CH4_flux_cat(flux_cat, time) ;  // mandatory
                CH4_flux_cat:units = "mol mol-1" ;
        double CH4_flux_total(flux_total, time) ;  // mandatory
                CH4_flux_total:units = "mol mol-1" ;
                CH4_flux_total:long_name = "tracer concentration in mol CH4 per mol dry air" ;
        double CH4_flux_total_stdev(flux_total, time) ;  // optional
                CH4_flux_total_stdev:units = "mol mol-1" ;
                CH4_flux_total_stdev:long_name = "interpolation uncertainty of CH4 in mol CH4 per mol dry air" ;
        double CH4_bc_prior(bc_prior, time) ;  // mandatory
                CH4_bc_prior:units = "mol mol-1" ;
                CH4_bc_prior:long_name = "tracer concentration in mol CH4 per mol dry air" ;
        double CH4_bc_prior_stdev(bc_prior, time) ;  // optional
                CH4_bc_prior_stdev:units = "mol mol-1" ;
                CH4_bc_prior_stdev:long_name = "interpolation uncertainty of CH4 in mol CH4 per mol dry air" ;
        //
        // Far-field correction
        double CH4_bc_correction(time) ;  // mandatory if input.use_bc_correction == True in configuration
                CH4_bc_correction:units = "mol mol-1" ;
                CH4_bc_correction:bc_name = "CamsInvOpt_v24r1_sfcsat" ;  // must be in bc_prior and equal to attribute of CH4_bc_ens_letkf
        double CH4_bc_ens_letkf(bc_ens_letkf, time) ;  // recommended when using far-field correction
                CH4_bc_ens_letkf:units = "mol mol-1" ;
                CH4_bc_ens_letkf:bc_name = "CamsInvOpt_v24r1_sfcsat" ;  // must be in bc_prior and equal to attribute of CH4_bc_correction
        //
        // Auxiliary data
        double CH4_events_ensmax(time) ;  // optional
                CH4_events_ensmax:units = "mol mol-1" ;
        double windspeed(time) ;  // mandatory if data_filter.min_wind > 0 in configuration
                windspeed:units = "m s-1" ;

// global attributes:
                :station_name = "Jungfraujoch" ;
                :station_code = "JFJ" ;  // mandatory
                :lat = 46.5475 ;  // mandatory: degrees north
                :lon = 7.9851 ;  // mandatory: degrees east
                :ICOSCP_PID = "https://meta.icos-cp.eu/objects/s1H3W75AL-TzB7nRBjFX9TZ5" ;
                //
                // Heights are given in meters above ground level (agl) or above sea level (asl).
                // Ground level differs between observation (real topography, station_elevation_asl)
                // and model (topography on model resolution, model_ground_asl).
                :sampling_height_agl = 13.9 ;  // mandatory: sampling height above ground level
                :station_elevation_asl = 3571.8 ;  // station (ground) elevation above sea level
                :obs_height_asl = 3585.7 ;  // observation height above sea level
                :mec_height_agl = 521.368011207554 ;  // model sampling height above model ground
                :mec_height_asl = 3078.23198879245 ;  // model sampling height above sea level
                :model_ground_asl = 2556.86397758489 ;  // model topography
                :ssh = "JFJ_13.9" ;  // unique identifier: station code and sampling height
                :ens_size = 20LL ;  // mandatory: number of meteorological ensemble members

}

Ensemble:

//  "CH4" can be replaced with any other species specified in the configuration.
//
netcdf JFJ_13.9_ens {
dimensions:
        time = 31909 ;     // mandatory, aligned with JFJ_13.9_det
        bc_prior = 1 ;     // mandatory
        flux_cat = 142 ;   // mandatory, aligned with JFJ_13.9_det
        flux_total = 1 ;   // mandatory
        flux_group = 14 ;  // mandatory
        ensmem = 20 ;      // mandatory
        //
variables:
        // Coordinates
        int64 time(time) ;  // mandatory
                time:standard_name = "time" ;
                time:time_zone = "UTC" ;
                time:units = "hours since 2019-12-15T00:30:00" ;
                time:calendar = "proleptic_gregorian" ;
        string bc_prior(bc_prior) ;  // mandatory
        string flux_cat(flux_cat) ;  // mandatory
        string flux_total(flux_total) ;  // mandatory
        string flux_group(flux_group) ;  // mandatory
        string ensmem(ensmem) ;  // optional
        //
        // Observations
        double obs_CH4(time) ;  // mandatory
                obs_CH4:units = "mol mol-1" ;
        double obs_stdev_CH4(time) ;  // optional
                obs_stdev_CH4:units = "mol mol-1" ;
        double obs_CH4_orig(time) ;  // optional
                obs_CH4_orig:units = "mol mol-1" ;
        //
        // Simulation results
        byte group2cat(flux_cat, flux_group) ;  // mandatory
                group2cat:units = "1" ;
        double CH4_flux_cat_weights(flux_cat, time) ;  // mandatory
                CH4_flux_cat_weights:units = "1" ;
        double CH4_flux_group(flux_group, ensmem, time) ;  // mandatory
                CH4_flux_group:units = "mol mol-1" ;
        double CH4_flux_total(flux_total, ensmem, time) ;  // mandatory
                CH4_flux_total:units = "mol mol-1" ;
                CH4_flux_total:long_name = "tracer concentration in mol CH4 per mol dry air" ;
        double CH4_bc_prior(bc_prior, ensmem, time) ;  // mandatory
                CH4_bc_prior:units = "mol mol-1" ;
                CH4_bc_prior:long_name = "tracer concentration in mol CH4 per mol dry air" ;
        //
        // Auxiliary data
        double windspeed(ensmem, time) ;  // optional
                windspeed:units = "m s-1" ;

// global attributes:
                :station_name = "Jungfraujoch" ;
                :station_code = "JFJ" ;  // mandatory
                :lat = 46.5475 ;  // mandatory: degrees north
                :lon = 7.9851 ;  // mandatory: degrees east
                :ICOSCP_PID = "https://meta.icos-cp.eu/objects/s1H3W75AL-TzB7nRBjFX9TZ5" ;
                //
                // Heights are given in meters above ground level (agl) or above sea level (asl).
                // Ground level differs between observation (real topography, station_elevation_asl)
                // and model (topography on model resolution, model_ground_asl).
                :sampling_height_agl = 13.9 ;  // mandatory: sampling height above ground level
                :station_elevation_asl = 3571.8 ;  // station (ground) elevation above sea level
                :obs_height_asl = 3585.7 ;  // observation height above sea level
                :mec_height_agl = 521.368011207554 ;  // model sampling height above model ground
                :mec_height_asl = 3078.23198879245 ;  // model sampling height above sea level
                :model_ground_asl = 2556.86397758489 ;  // model topography
                :ssh = "JFJ_13.9" ;  // unique identifier: station code and sampling height
                :how_to_CH4_flux_cat = "Use the approximation: CH4_flux_cat = (CH4_flux_group @ group2cat) * CH4_flux_cat_weights" ;

}

Output

The output of DUBFI mainly consists of scaling factors for flux categories, and the error covariance matrices for these scaling factors. The vectors of scaling factors form the state space of the inversion.

Additionally, the output contains the sensitivity of the scaling factors to the observations, which connects state space and observation space. The output follows the internal structure of the observation space, a one-dimensional vector without a definite ordering. For post-processing, it may be useful to reproduce the observation filtering done in the inversion using dubfi.fluxes.readobs.data_from_config().

DUBFI does not know about the fluxes and does not provide flux estimates. The post-processing package used at DWD is not yet published.

Directory tree

The output directory specified for dubfi.fluxes will contain a file “inversion_result.nc”, a configuration file “config.yml”, and logfiles (when using the default configuration).

Coordinates

  • flux_cat: flux category (as in input)

  • flux_cat_dual: flux category, equivalent to flux_cat, used for square matrices

  • norm_prefactor: prefactor of normalization term in cost function in Bayesian inversion (as in configuration entry inversion.norm_prefactor)

  • obs: observation dimension combining time and ssh. All observation time series are concatenated into one long vector. Sorting by time or by station is not guaranteed.

  • ssh: station and sampling height identifiers (strings)

  • obs_time (obs): observation time

  • ssh_idx (obs): 0-based index of ssh identifier along observation dimension. Observation at index i has observation time obs_time[i] and ssh identifier ssh[ssh_idx[i]].

  • segment (optional): segment in MPI parallelization
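Because the obs dimension has no guaranteed ordering, extracting the time series of a single station requires ssh_idx and obs_time. The sketch below shows the indexing rule ssh[ssh_idx[i]] with NumPy. It assumes `values` is a one-dimensional array along obs (for example one norm_prefactor slice of mdm_post). The function names are hypothetical.

```python
import numpy as np


def per_observation_ssh(ssh, ssh_idx):
    """ssh label of every observation: ssh[ssh_idx[i]] for observation i."""
    return np.asarray(ssh)[np.asarray(ssh_idx)]


def select_ssh(ssh, ssh_idx, obs_time, values, wanted):
    """Extract the time series of one ssh from an obs-dimension variable.

    The result is sorted by observation time, since the obs dimension
    itself has no guaranteed ordering.
    """
    mask = per_observation_ssh(ssh, ssh_idx) == wanted
    times = np.asarray(obs_time)[mask]
    vals = np.asarray(values)[mask]
    order = np.argsort(times, kind="stable")
    return times[order], vals[order]
```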

Data variables

Output data variables are described by dubfi.fluxes.core.OUTPUT_METADATA, which may be more up to date than this documentation.

  • raw_config_utf8: UTF-8 encoded YAML configuration defining inversion parameters

  • cost_function_post: posterior inversion cost function

  • s_prior_kalman: Prior scaling factors for cycling with constant R. This is only present when cycling. Values equal to zero indicate that s_prior describes the deviation from the prior emissions.

  • s_prior: Prior scaling factors. When cycling, this depends on the norm prefactor. Values equal to zero indicate that s_post describes the deviation from the prior emissions.

  • s_post: posterior scaling factors

  • s_post_kalman: Posterior scaling factors, assuming constant R. To be understood relative to s_prior. In these posterior scaling factors, the dependence of the uncertainty on the scaling factors was neglected.

  • b_prior: uncertainty (error covariance) matrix of s_prior

  • b_post: uncertainty (error covariance) matrix of s_post

  • b_post_kalman: uncertainty (error covariance) matrix of s_post_kalman

  • sensitivity: Linearized sensitivity of posterior scaling factors to observations. This is the derivative of posterior scaling factors w.r.t. observations.

  • sensitivity_kalman: Linearized sensitivity of posterior scaling factors to observations, assuming constant R. This is the derivative of s_post_kalman w.r.t. observations.

  • averaging_kernel: Averaging kernel estimate. This is the derivative of posterior scaling factors (dimension flux_cat_dual) w.r.t. true scaling factors (dimension flux_cat), estimated under assumption of a perfect transport model.

  • mdm_prior: observation minus prior model prediction (for scaling factors s_prior)

  • mdm_post: observation minus posterior model prediction (for scaling factors s_post)

  • mdm_post_kalman: observation minus model prediction for scaling factors s_post_kalman

  • mdm_stdev_prior: standard deviation assumed in inversion for mdm_prior (same for mdm_post_kalman), including uncertainty weighting and inflation

  • mdm_stdev_post: standard deviation assumed in inversion for mdm_post, including uncertainty weighting and inflation

  • ssh: station code and sampling height

  • ssh_idx: 0-based index of ssh coordinate for observation data points

  • obs_time: observation time (UTC)

  • lon: station longitude (degrees east)

  • lat: station latitude (degrees north)

  • height: observation height coordinate as used for localization

  • flux_cat: flux category name

  • segment_size: unbuffered size of segments in MPI parallelization

  • buffered_segment_size: buffered size of segments in MPI parallelization

  • obs_count: number of observations per station and sampling height

  • norm_prefactor: prefactor of normalization term in cost function

  • solver_nit: number of solver iterations

  • solver_nfev: number of cost function calls in solver

  • solver_njev: number of calls to gradient of cost function in solver

  • solver_nhev: number of calls to Hessian of cost function in solver

  • solver_status: solver status, 0 means success
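The covariance matrices b_prior, b_post and b_post_kalman are usually summarized by per-category standard deviations and a correlation matrix. The sketch below assumes a single square matrix along (flux_cat, flux_cat_dual) has already been extracted, for instance for one value of norm_prefactor; the exact dimension order in the file should be checked against OUTPUT_METADATA. The function name is hypothetical.

```python
import numpy as np


def summarize_covariance(b):
    """Standard deviations and correlation matrix of an error covariance matrix.

    b: square array (flux_cat, flux_cat_dual), e.g. b_post for one norm_prefactor.
    """
    b = np.asarray(b, dtype=float)
    # Standard deviations of the scaling factors are the roots of the diagonal.
    stdev = np.sqrt(np.diag(b))
    # Normalize each covariance by the product of the two standard deviations.
    corr = b / np.outer(stdev, stdev)
    return stdev, corr
```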

Attributes

  • start_window: start of inversion time window, ISO-formatted date and time string

  • end_window: end of inversion time window, ISO-formatted date and time string

  • chi2: \(\chi^2\) value of the fit. In combination with the attribute ddof, it can be used to assess whether the assumed uncertainties agree with the actual model-data deviations.

  • next_norm_prefactor_idx: integer reporting the progress in filling an existing file with data. In a complete file, this must be equal to the size of the dimension norm_prefactor.
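Both attributes lend themselves to quick checks on a result file. The sketch below works on a plain dict of global attributes, such as the `attrs` of an xarray Dataset opened from inversion_result.nc. It assumes that ddof is the number of degrees of freedom, so that chi2/ddof close to 1 suggests consistent uncertainties: values well above 1 point to underestimated uncertainties and values well below 1 to overestimated ones. It also assumes both attributes are scalars. The function names are hypothetical.

```python
def reduced_chi2(attrs):
    """chi2 divided by ddof (assumed to be the number of degrees of freedom)."""
    return attrs["chi2"] / attrs["ddof"]


def is_complete(attrs, n_norm_prefactor):
    """True if every entry along the norm_prefactor dimension has been written."""
    return attrs["next_norm_prefactor_idx"] == n_norm_prefactor


# Hypothetical attribute values for illustration:
attrs = {"chi2": 120.0, "ddof": 100, "next_norm_prefactor_idx": 3}
print(reduced_chi2(attrs), is_complete(attrs, 3))
```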