dubfi.fluxes.readobs¶
Read and filter observations as defined in configuration.
Added in version 0.1.0: initial release
Changed in version 0.1.1.
Attributes¶
INVERSION_FLAGS – List of flags indicating filtering of observations for the inversion.
Classes¶
ReadObs – Object that parses configuration and reads data.
Functions¶
get_flag – Get integer representation of inversion flag name, see INVERSION_FLAGS.
coordinates_from_config – Gather coordinates from input data, applying filtering as defined in config.
average_window – Average data array over rolling time window.
data_from_config – Provide filtered data as defined in configuration.
Module Contents¶
- dubfi.fluxes.readobs.INVERSION_FLAGS = ['used', 'ignored: time range', 'ignored: season', 'ignored: time of day', 'ignored: wind...¶
List of flags indicating filtering of observations for the inversion.
- dubfi.fluxes.readobs.get_flag(key)¶
Get integer representation of inversion flag name, see INVERSION_FLAGS.
Added in version 0.1.1.
- Parameters:
key (str)
- Return type:
int
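The lookup can be sketched as follows. This is a minimal illustration, assuming the integer representation is simply the position of the name in INVERSION_FLAGS, which the documentation above does not state explicitly. The list contains only the four entries that are fully visible above, and `flag_index` is a hypothetical stand-in, not the library's get_flag.

```python
# Only the entries fully visible in the documentation above; the real list is longer.
KNOWN_FLAGS = ["used", "ignored: time range", "ignored: season", "ignored: time of day"]


def flag_index(key: str) -> int:
    """Return the position of a flag name (assumed here to be its integer code)."""
    try:
        return KNOWN_FLAGS.index(key)
    except ValueError:
        # Hypothetical error handling; the library may behave differently.
        raise KeyError(f"unknown inversion flag: {key!r}") from None


used = flag_index("used")
season = flag_index("ignored: season")
```

Under this assumption, "used" maps to 0 because it is the first entry of the list.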
- dubfi.fluxes.readobs.coordinates_from_config(config, *, nprocesses=None, **kwargs)¶
Gather coordinates from input data, applying filtering as defined in config.
- Parameters:
config (str | dict) – configuration (dict) or path to configuration file (YAML)
nprocesses (int, optional) – number of parallel processes, default taken from configuration.
**kwargs (any) – passed on to
data_from_config()
- Returns:
coordinates – dictionary with entries:
ssh: array of station and sampling height identifiers
time: list of time arrays, aligned with ssh
lon: array of station longitudes (degrees), aligned with ssh
lat: array of station latitudes (degrees), aligned with ssh
height: array of station heights (meters), aligned with ssh
flux_cat: array of flux category names
ens_size: number of meteorological ensemble members (integer, only returned if included in data)
- Return type:
dict
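To make the alignment of the returned entries concrete, the sketch below walks through a dictionary shaped like the one described above. All values, including the station identifiers and flux category names, are invented placeholders, and the identifier format is an assumption.

```python
import numpy as np

# Placeholder values, invented for illustration only.
coordinates = {
    "ssh": np.array(["AAA_010", "BBB_100"]),
    "time": [
        np.array(["2021-01-01T12", "2021-01-01T13"], dtype="datetime64[h]"),
        np.array(["2021-01-01T12"], dtype="datetime64[h]"),
    ],
    "lon": np.array([8.0, 11.5]),      # degrees
    "lat": np.array([50.0, 48.0]),     # degrees
    "height": np.array([10.0, 100.0]), # meters
    "flux_cat": np.array(["cat_a", "cat_b"]),
}

# "time" is a list with one array per entry of "ssh", so the two can be zipped.
n_obs = {}
for ssh, time in zip(coordinates["ssh"], coordinates["time"]):
    n_obs[str(ssh)] = len(time)
total = sum(n_obs.values())

# "ens_size" is only present if the data include it, so look it up defensively.
ens_size = coordinates.get("ens_size")
```

Note that flux_cat is not aligned with ssh: it lists the flux categories, independent of the stations.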
- dubfi.fluxes.readobs.average_window(da, window, min_diff=np.timedelta64(1, 'h'))¶
Average data array over rolling time window.
- Parameters:
da (xr.DataArray) – data array that shall be averaged. This must have a dimension and coordinate “time” of dtype np.datetime64. This time coordinate must be sorted and the minimum distance between the coordinate values must be at least min_diff.
window (np.timedelta64) – time window for averaging. result[i] is the mean of da[j] for all j such that abs(da.time[i] - da.time[j]) < window.
min_diff (np.timedelta64, default=1h) – minimum distance between coordinate values in da.time. If the value is too large, results will be wrong. If the provided value is too small, performance will be worse.
- Returns:
avg_da – da averaged over rolling time window. All dimensions and coordinates will be the same as in da. Internal order of data in memory may differ from da.
- Return type:
xr.DataArray
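The definition of the averaging window can be checked against a naive reference computation. The sketch below is not the library's implementation: it uses plain NumPy arrays instead of an xr.DataArray, ignores min_diff, and builds an n-by-n mask, which is only practical for short time series. `rolling_time_mean` is a hypothetical name.

```python
import numpy as np


def rolling_time_mean(time, values, window):
    """result[i] is the mean of values[j] for all j with |time[i] - time[j]| < window."""
    time = np.asarray(time, dtype="datetime64[ns]")
    values = np.asarray(values, dtype=float)
    # Pairwise absolute time differences, shape (n, n).
    diff = np.abs(time[:, None] - time[None, :])
    mask = diff < window
    # Row i of the mask selects the time steps that contribute to result[i].
    return (mask * values[None, :]).sum(axis=1) / mask.sum(axis=1)


time = np.array(
    ["2020-01-01T00", "2020-01-01T01", "2020-01-01T02", "2020-01-01T05"],
    dtype="datetime64[h]",
)
values = np.array([1.0, 2.0, 3.0, 10.0])
result = rolling_time_mean(time, values, np.timedelta64(2, "h"))
```

Because the comparison is strict, a neighbour exactly one window away is excluded: with a two-hour window, the value at 00:00 is averaged with the value at 01:00 but not with the one at 02:00.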
- class dubfi.fluxes.readobs.ReadObs(config, suffix_rx)¶
Object that parses configuration and reads data.
Added in version 0.1.1.
Provide filtered data as defined in configuration.
- Parameters:
config (str | dict) – configuration (dict) or path to configuration file (YAML)
suffix_rx (str, default=r"_det.nc$") – regular expression for the input file suffix. Use this to select the deterministic run without (_det.nc) or with (_det_letkf.nc) far-field correction, or the ensemble data (_ens.nc).
- get_data(coordinates_only=False, return_flags=False)¶
Provide filtered data as defined in configuration.
- Parameters:
coordinates_only (bool, default=False) – if true, return only the coordinates and drop all other data
return_flags (bool, default=False) – additionally return a time series of flags indicating, for each data point, whether it was used in the inversion or why it was not. In this case, results will not be filtered.
- Yields:
xr.Dataset – datasets for each matching station and sampling height. Files are sorted alphabetically. Data are filtered unless return_flags is true.
xr.DataArray – only if return_flags: flag for each observation data point
- Return type:
Generator[xarray.Dataset | tuple[xarray.Dataset, xarray.DataArray], None, None]
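A flag series returned with return_flags can be summarized per flag. The sketch below assumes that each flag value is an integer index into INVERSION_FLAGS, which the documentation above suggests but does not state. It uses only the four flag names visible above and an invented flag array in place of a real xr.DataArray.

```python
import numpy as np

# Only the entries fully visible in the documentation above; the real list is longer.
KNOWN_FLAGS = ["used", "ignored: time range", "ignored: season", "ignored: time of day"]

# Invented flag values for seven observations (assumed to index KNOWN_FLAGS).
flags = np.array([0, 0, 1, 3, 0, 2, 3])

# Count how often each flag occurs, including flags that never occur.
counts = np.bincount(flags, minlength=len(KNOWN_FLAGS))
summary = {name: int(n) for name, n in zip(KNOWN_FLAGS, counts)}

# Boolean mask of the observations that would enter the inversion.
used_mask = flags == KNOWN_FLAGS.index("used")
```

Since the datasets are not filtered when return_flags is true, a mask like `used_mask` is what a caller would need to reproduce the filtering afterwards.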
- process_file(file, coordinates_only, return_flags)¶
Load and filter data from file, ready for dask delayed.
- Parameters:
file (str)
coordinates_only (bool)
return_flags (bool)
- get_data_parallel(coordinates_only=False, return_flags=False, nprocesses=1)¶
Provide filtered data as defined in configuration.
- Parameters:
coordinates_only (bool, default=False) – if true, return only the coordinates and drop all other data
return_flags (bool, default=False) – additionally return a time series of flags indicating, for each data point, whether it was used in the inversion or why it was not. In this case, results will not be filtered.
nprocesses (int, default=1) – number of parallel worker processes
- Returns:
list of xr.Dataset, or list of (xr.Dataset, xr.DataArray) tuples if return_flags is true:
xr.Dataset – datasets for each matching station and sampling height. Files are sorted alphabetically. Data are filtered unless return_flags is true.
xr.DataArray – only if return_flags: flag for each observation data point
- Return type:
list[xarray.Dataset | tuple[xarray.Dataset, xarray.DataArray]]
- filter_ds(ds, coordinates_only=False, return_flags=False)¶
Filter data in dataset, see get_data().
- Parameters:
ds (xarray.Dataset)
coordinates_only (bool)
return_flags (bool)
- Return type:
xarray.Dataset | tuple[xarray.Dataset, xarray.DataArray] | None
- dubfi.fluxes.readobs.data_from_config(config, coordinates_only=False, suffix_rx='_det\\.nc$', return_flags=False, nprocesses=None)¶
Provide filtered data as defined in configuration.
- Parameters:
config (str | dict) – configuration (dict) or path to configuration file (YAML)
coordinates_only (bool, default=False) – if true, return only the coordinates and drop all other data
suffix_rx (str, default="_det\.nc$") – regular expression for the input file suffix. Use this to select the deterministic run without (_det.nc) or with (_det_letkf.nc) far-field correction, or the ensemble data (_ens.nc).
return_flags (bool, default=False) – additionally return a time series of flags indicating, for each data point, whether it was used in the inversion or why it was not
nprocesses (int, optional) –
number of parallel processes, default taken from configuration. Note that when using parallel processes (nprocesses > 1), all data will be read and returned as a list. Otherwise, this creates a generator that yields the datasets.
Warning
Parallel reading of data may require significantly more memory.
- Yields:
xr.Dataset – filtered datasets for each matching station and sampling height. Files are sorted alphabetically.
xr.DataArray – only if return_flags: flag for each observation data point
- Return type:
Iterable[xarray.Dataset | tuple[xarray.Dataset, xarray.DataArray]]
Changed in version 0.1.1.
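The difference between the two return modes described for nprocesses can be illustrated with a small stand-alone sketch. It does not use dubfi: `load`, `read_lazy` and `read_all` are hypothetical names, strings stand in for datasets, and a ThreadPoolExecutor from the standard library replaces whatever parallel backend the library actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def load(file):
    """Stand-in for reading and filtering one file."""
    return file.upper()


def read_lazy(files):
    """Yield one result at a time; only one dataset is held in memory."""
    for file in sorted(files):
        yield load(file)


def read_all(files, nprocesses=1):
    """Return a list when running in parallel, otherwise a generator."""
    if nprocesses > 1:
        with ThreadPoolExecutor(max_workers=nprocesses) as pool:
            # pool.map preserves input order, so results stay sorted by file name.
            return list(pool.map(load, sorted(files)))
    return read_lazy(files)


files = ["b_det.nc", "a_det.nc"]
lazy = read_all(files)
eager = read_all(files, nprocesses=2)
lazy_results = list(lazy)
```

Either result can be consumed with a plain for loop, which matches the Iterable return type. The list variant holds every dataset at once, which is consistent with the memory warning above.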