miranda.convert package#

Data Conversion module.

miranda.convert.aggregate(ds: Dataset, freq: str = 'day') → dict[str, xarray.core.dataset.Dataset][source]#

Parameters:

ds (xarray.Dataset)
freq (str)

Returns:

dict[str, xarray.Dataset]

miranda.convert.aggregations_possible(ds: Dataset, freq: str = 'day') → dict[str, set[str]][source]#

Determine which aggregations are possible based on variables within a dataset.

Parameters:

ds (xarray.Dataset)
freq (str)

Returns:

dict[str, set[str]]
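
As a rough illustration of what a `dict[str, set[str]]` result could look like, the sketch below maps each recognised variable to the aggregation operations it supports. The rule table, the function name `sketch_aggregations_possible`, and the choice to key the result by variable name are assumptions made for this example; they are not taken from miranda's implementation.

```python
from __future__ import annotations

# Hypothetical rule table: which daily statistics make sense for each variable.
# The variable names follow common CMIP usage; the rules are illustrative only.
AGGREGATION_RULES: dict[str, set[str]] = {
    "tas": {"mean", "max", "min"},
    "pr": {"mean"},
    "snd": {"mean"},
}


def sketch_aggregations_possible(
    variables: set[str], freq: str = "day"
) -> dict[str, set[str]]:
    """Return the aggregations available for each recognised variable."""
    if freq != "day":
        # Only the default frequency is handled in this sketch.
        raise NotImplementedError(f"Unsupported frequency: {freq!r}")
    # Variables without a rule are skipped rather than raising an error.
    return {
        name: set(AGGREGATION_RULES[name])
        for name in sorted(variables)
        if name in AGGREGATION_RULES
    }


result = sketch_aggregations_possible({"tas", "pr", "unknown_var"})
```

With an actual dataset, the set of variable names would typically come from `ds.data_vars`.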

Convert an existing Xarray-compatible dataset to another format with variable corrections applied.

Parameters:

input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.
mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray or single data_variable dataset containing mask.
mask_cutoff (float or bool) – If mask is supplied, the threshold above which to apply the mask. Default: False.
regrid (bool) – Perform regridding with xesmf. Default: False.
add_version_hashes (bool) – If True, version name and sha256sum of source file(s) will be added as a field among the global attributes.
preprocess (callable or str, optional) – Preprocessing function to apply to each Dataset. If “auto” (default), run preprocessing fixes based on the fields supplied in the metadata definition. If a callable, run it over the Dataset (single file) or pass it as the preprocess argument (multifile dataset).
**xr_kwargs – Arguments passed directly to xarray.

Returns:

xr.Dataset or xr.DataArray

miranda.convert.dataset_corrections(ds: Dataset, project: str) → Dataset[source]#: Convert variables to CF-compliant format.

miranda.convert.dims_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Rename dimensions to their CF equivalents.

Parameters:

d (xarray.Dataset) – Dataset with dimensions to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert.gather_agcfsr(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather agCFSR source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.gather_agmerra(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather agMERRA source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → dict[str, list[pathlib.Path]][source]#

Parameters:

project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)

Returns:

dict[str, list[pathlib.Path]]

miranda.convert.gather_emdna(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw EMDNA files for preprocessing.

Put all files with the same member together.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]
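
The "same member together" grouping can be sketched with the standard library alone. The file naming scheme (`EMDNA_<member>_<year>.nc`), the regular expression, and the helper name `group_by_member` are hypothetical and chosen only to make the example self-contained; the actual EMDNA layout and the library's matching logic may differ.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: "EMDNA_<member>_<year>.nc".
MEMBER_PATTERN = re.compile(r"_(?P<member>\d{3})_")


def group_by_member(path):
    """Collect NetCDF files under ``path`` into one list per ensemble member."""
    groups = defaultdict(list)
    # Sorting keeps the files of each member in a stable, chronological order.
    for file in sorted(Path(path).glob("*.nc")):
        match = MEMBER_PATTERN.search(file.name)
        if match:
            groups[match.group("member")].append(file)
    return dict(groups)


# Build a throwaway folder with empty placeholder files to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("EMDNA_001_1979.nc", "EMDNA_001_1980.nc", "EMDNA_002_1979.nc"):
        (Path(tmp) / name).touch()
    grouped = group_by_member(tmp)
    summary = {member: [f.name for f in files] for member, files in grouped.items()}
```

Each dictionary entry then holds the files that belong in a single dataset, which is the shape the other `gather_*` functions in this package also return.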

miranda.convert.gather_grnch(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw ETS-GRNCH files for preprocessing.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, dict[str, list[pathlib.Path]]] or None

miranda.convert.gather_nex(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw NEX files for preprocessing.

Put all files that should be contained in one dataset in one entry of the dictionary.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.gather_nrcan_gridded_obs(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather NRCan Gridded Observations source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.gather_raw_rdrs_by_years(path: str | PathLike) → dict[str, dict[str, list[pathlib.Path]]][source]#

Gather raw RDRS files for preprocessing.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, dict[str, list[pathlib.Path]]]

miranda.convert.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → dict[str, dict[str, list[pathlib.Path]]][source]#

Gather RDRS processed source data.

Parameters:

name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Indicating which variable name dictionary to search for.

Returns:

dict[str, list[pathlib.Path]]

miranda.convert.gather_sc_earth(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather SC-Earth source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.gather_wfdei_gem_capa(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather WFDEI-GEM-CaPa source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert.load_json_data_mappings(project: str) → dict[str, Any][source]#

Load JSON mappings for supported dataset conversions.

Parameters:: project (str)
Returns:: dict[str, Any]

miranda.convert.metadata_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Update xarray dataset and data_vars with project-specific metadata fields.

Parameters:

d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray[source]#

Land-Sea mask operations.

Parameters:

ds (xr.Dataset or xr.DataArray)
mask (xr.Dataset or xr.DataArray)
mask_cutoff (float or bool)

Returns:

xr.Dataset or xr.DataArray

miranda.convert.variable_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Add variable metadata and remove nonstandard entries.

Parameters:

d (xarray.Dataset) – Dataset with variable(s) to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

Submodules#

miranda.convert._aggregation module#

Aggregation module.

miranda.convert._aggregation.aggregate(ds: Dataset, freq: str = 'day') → dict[str, xarray.core.dataset.Dataset][source]#

Parameters:

ds (xarray.Dataset)
freq (str)

Returns:

dict[str, xarray.Dataset]

miranda.convert._aggregation.aggregations_possible(ds: Dataset, freq: str = 'day') → dict[str, set[str]][source]#

Determine which aggregations are possible based on variables within a dataset.

Parameters:

ds (xarray.Dataset)
freq (str)

Returns:

dict[str, set[str]]

miranda.convert._data_corrections module#

Convert an existing Xarray-compatible dataset to another format with variable corrections applied.

Parameters:

input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.
mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray or single data_variable dataset containing mask.
mask_cutoff (float or bool) – If mask is supplied, the threshold above which to apply the mask. Default: False.
regrid (bool) – Perform regridding with xesmf. Default: False.
add_version_hashes (bool) – If True, version name and sha256sum of source file(s) will be added as a field among the global attributes.
preprocess (callable or str, optional) – Preprocessing function to apply to each Dataset. If “auto” (default), run preprocessing fixes based on the fields supplied in the metadata definition. If a callable, run it over the Dataset (single file) or pass it as the preprocess argument (multifile dataset).
**xr_kwargs – Arguments passed directly to xarray.

Returns:

xr.Dataset or xr.DataArray

miranda.convert._data_corrections.dataset_corrections(ds: Dataset, project: str) → Dataset[source]#: Convert variables to CF-compliant format.

miranda.convert._data_corrections.dims_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Rename dimensions to their CF equivalents.

Parameters:

d (xarray.Dataset) – Dataset with dimensions to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert._data_corrections.load_json_data_mappings(project: str) → dict[str, Any][source]#

Load JSON mappings for supported dataset conversions.

Parameters:: project (str)
Returns:: dict[str, Any]

miranda.convert._data_corrections.metadata_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Update xarray dataset and data_vars with project-specific metadata fields.

Parameters:

d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert._data_corrections.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray[source]#

Land-Sea mask operations.

Parameters:

ds (xr.Dataset or xr.DataArray)
mask (xr.Dataset or xr.DataArray)
mask_cutoff (float or bool)

Returns:

xr.Dataset or xr.DataArray

miranda.convert._data_corrections.variable_conversion(d: Dataset, p: str, m: dict) → Dataset[source]#

Add variable metadata and remove nonstandard entries.

Parameters:

d (xarray.Dataset) – Dataset with variable(s) to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert._data_definitions module#

miranda.convert._data_definitions.gather_agcfsr(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather agCFSR source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_agmerra(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather agMERRA source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → dict[str, list[pathlib.Path]][source]#

Parameters:

project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)

Returns:

dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_emdna(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw EMDNA files for preprocessing.

Put all files with the same member together.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_grnch(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw ETS-GRNCH files for preprocessing.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, dict[str, list[pathlib.Path]]] or None

miranda.convert._data_definitions.gather_nex(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather raw NEX files for preprocessing.

Put all files that should be contained in one dataset in one entry of the dictionary.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_nrcan_gridded_obs(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather NRCan Gridded Observations source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_raw_rdrs_by_years(path: str | PathLike) → dict[str, dict[str, list[pathlib.Path]]][source]#

Gather raw RDRS files for preprocessing.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, dict[str, list[pathlib.Path]]]

miranda.convert._data_definitions.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → dict[str, dict[str, list[pathlib.Path]]][source]#

Gather RDRS processed source data.

Parameters:

name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Indicating which variable name dictionary to search for.

Returns:

dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_sc_earth(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather SC-Earth source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._data_definitions.gather_wfdei_gem_capa(path: str | PathLike) → dict[str, list[pathlib.Path]][source]#

Gather WFDEI-GEM-CaPa source data.

Parameters:: path (str or os.PathLike)
Returns:: dict[str, list[pathlib.Path]]

miranda.convert._reconstruction module#

miranda.convert._reconstruction.reanalysis_processing(data: dict[str, list[str | os.PathLike]], output_folder: str | PathLike, variables: Sequence[str], aggregate: str | bool = False, domains: str | list[str] = '_DEFAULT', start: str | None = None, end: str | None = None, target_chunks: dict | None = None, output_format: str = 'netcdf', overwrite: bool = False, engine: str = 'h5netcdf', n_workers: int = 4, **dask_kwargs) → None[source]#

Parameters:

data (dict[str, list[str]])
output_folder (str or os.PathLike)
variables (Sequence[str])
aggregate ({“day”, None})
domains ({“QC”, “CAN”, “AMNO”, “NAM”, “GLOBAL”})
start (str, optional)
end (str, optional)
target_chunks (dict, optional)
output_format ({“netcdf”, “zarr”})
overwrite (bool)
engine ({“netcdf4”, “h5netcdf”})
n_workers (int)

Returns:

None

miranda.convert.deh module#

DEH Hydrograph Conversion module.

miranda.convert.deh.open_txt(path: str | Path, cf_table: dict | None = {'flag': {'comment': 'See DEH technical information for details.', 'long_name': 'data flag'}, 'q': {'long_name': 'River discharge', 'units': 'm3 s-1'}}) → Dataset[source]#: Extract daily HQ meteorological data and convert to xr.DataArray with CF-Convention attributes.

miranda.convert.eccc module#

Environment and Climate Change Canada Data Conversion module.

miranda.convert.eccc.convert_canswe(file: str | Path, output: str | Path)[source]#: Convert the CanSWE netCDF files to production-ready CF-compliant netCDFs.

miranda.convert.eccc_rdrs module#

Environment and Climate Change Canada RDRS conversion tools.

miranda.convert.eccc_rdrs.convert_rdrs(project: str, input_folder: str | PathLike, output_folder: str | PathLike, output_format: str = 'zarr', working_folder: str | PathLike | None = None, overwrite: bool = False, **dask_kwargs) → None[source]#

Parameters:

project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
output_format ({“netcdf”, “zarr”})
working_folder (str or os.PathLike, optional)
overwrite (bool)
**dask_kwargs

Returns:

None

miranda.convert.eccc_rdrs.rdrs_to_daily(project: str, input_folder: str | PathLike, output_folder: str | PathLike, working_folder: str | PathLike | None = None, overwrite: bool = False, output_format: str = 'zarr', year_start: int | None = None, year_end: int | None = None, process_variables: list[str] | None = None, **dask_kwargs) → None[source]#

Write out RDRS files to daily-timestep files.

Parameters:

project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
working_folder (str or os.PathLike)
overwrite (bool)
output_format ({“netcdf”, “zarr”})
year_start (int, optional)
year_end (int, optional)
process_variables (list of str, optional)
**dask_kwargs

Returns:

None

miranda.convert.ecmwf module#

ECMWF TIGGE Conversion module.

miranda.convert.ecmwf.tigge_convert(source: PathLike | None = None, target: PathLike | None = None, processes: int = 8) → None[source]#

Convert grib2 file to netCDF format.

Parameters:

source (os.PathLike, optional)
target (os.PathLike, optional)
processes (int)

Returns:

None

miranda.convert.hq module#

Hydro Quebec Weather Station Data Conversion module.

miranda.convert.hq.open_csv(path: str | Path, cf_table: dict | None = {'hurs': {'cell_methods': 'time: point', 'comment': 'The relative humidity with respect to liquid water for T> 0 C, and with respect to ice for T<0 C.', 'frequency': '1h', 'long_name': 'Near-Surface Relative Humidity', 'out_name': 'hurs', 'standard_name': 'relative_humidity', 'type': 'real', 'units': '%'}, 'prlp': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the liquid phase.', 'frequency': 'day', 'long_name': 'Rainfall Flux', 'out_name': 'prlp', 'standard_name': 'rainfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'prsn': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the solid phase.', 'frequency': 'day', 'long_name': 'Snowfall Flux', 'out_name': 'prsn', 'standard_name': 'snowfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'sfcWind': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) wind speed.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Speed', 'out_name': 'sfcWind', 'standard_name': 'wind_speed', 'type': 'real', 'units': 'm s-1'}, 'sfcWindAz': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) direction from which wind originates.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Direction', 'out_name': 'sfcWindAz', 'standard_name': 'wind_direction', 'type': 'real', 'units': 'degree'}, 'snd': {'cell_methods': 'time: point', 'comment': 'The thickness of snow.', 'frequency': '1h', 'long_name': 'Snow Depth', 'out_name': 'snd', 'standard_name': 'surface_snow_thickness', 'type': 'real', 'units': 'm'}, 'tasmax_1h': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmax_day': {'cell_methods': 
'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_1h': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_day': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}}) → DataArray[source]#: Extract daily HQ meteo data and convert to xr.DataArray with CF-Convention attributes.

miranda.convert.melcc module#

MELCC (Québec) Weather Stations data conversion module.

miranda.convert.melcc.concat(files: Sequence[str | Path], output_folder: str | Path, overwrite: bool = True) → Path[source]#

Concatenate converted weather station files.

Parameters:

files (sequence of str or Path)
output_folder (str or Path)
overwrite (bool)

Returns:

Path

miranda.convert.melcc.convert_mdb(database: str | Path, stations: Dataset, definitions: Dataset, output: str | Path, overwrite: bool = True) → dict[tuple[str, str], pathlib.Path][source]#

Convert Microsoft databases (MDB) of MELCC observation data to xarray objects.

Parameters:

database (str or Path)
stations (xarray.Dataset)
definitions (xarray.Dataset)
output (str or Path)
overwrite (bool)

Returns:

dict[tuple[str, str], Path]

miranda.convert.melcc.convert_melcc_obs(metafile: str | Path, folder: str | Path, output: Path | str | None = None, overwrite: bool = True) → dict[tuple[str, str], pathlib.Path][source]#

Convert MELCC observation data to xarray data objects, returning paths.

Parameters:

metafile (str or Path)
folder (str or Path)
output (str or Path, optional)
overwrite (bool)

Returns:

dict[tuple[str, str], Path]

miranda.convert.melcc.convert_snow_table(file: str | Path, output: str | Path)[source]#

Convert snow data given through an Excel file.

This private data is not included in the MDB files.

Parameters:

file (path) – The Excel file with the sheets “Stations”, “Périodes standards” and “Données”.
output (path) – Folder where to put the netCDF files (one for each of snd, sd and snw).

miranda.convert.melcc.list_tables(db_file)[source]#: List the tables of an MDB file.

miranda.convert.melcc.parse_var_code(vcode: str) → dict[str, Any][source]#

Parse variable code to generate metadata.

Parameters:: vcode (str)
Returns:: dict[str, Any]

miranda.convert.melcc.read_definitions(dbfile: str)[source]#

Read variable definition file using mdbtools.

Parameters:: dbfile (str)
Returns:: pandas.DataFrame

miranda.convert.melcc.read_stations(db_file: str | PathLike) → DataFrame[source]#

Read station file using mdbtools.

Parameters:: db_file (str or os.PathLike)
Returns:: pandas.DataFrame

miranda.convert.melcc.read_table(db_file: str | PathLike, tab: str | PathLike) → Dataset[source]#

Read an MDB table into an xarray object.

Parameters:

db_file (str or os.PathLike)
tab (str or os.PathLike)

Returns:

xarray.Dataset

miranda.convert.utils module#

Conversion Utilities submodule.

miranda.convert.utils.date_parser(date: str, *, end_of_period: bool = False, output_type: str = 'str', strftime_format: str = '%Y-%m-%d') → str | Timestamp | NaTType[source]#

Parses datetime objects from a string representation of a date or both a start and end date.

Parameters:

date (str) – Date to be converted.
end_of_period (bool) – If True, the date will be the end of month or year depending on what’s most appropriate.
output_type ({“datetime”, “str”}) – Desired returned object type.
strftime_format (str) – If output_type is “str”, this sets the strftime format.

Returns:

pd.Timestamp or str or pd.NaT – Parsed date.

Notes

Adapted from code written by Gabriel Rondeau-Genesse (@RondeauG)
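
The effect of `end_of_period` can be illustrated with a standard-library sketch. The helper below, `sketch_parse_date`, is hypothetical: it accepts only `YYYY`, `YYYY-MM` and `YYYY-MM-DD` strings and returns `datetime.date` objects, whereas `date_parser` itself is documented as returning a string, a `pd.Timestamp` or `pd.NaT` and may accept other input formats.

```python
import calendar
from datetime import date


def sketch_parse_date(text: str, end_of_period: bool = False) -> date:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a date.

    When the string omits the month or the day, ``end_of_period`` selects the
    last day of the year or month instead of the first.
    """
    parts = [int(piece) for piece in text.split("-")]
    if len(parts) == 1:  # year only
        year = parts[0]
        return date(year, 12, 31) if end_of_period else date(year, 1, 1)
    if len(parts) == 2:  # year and month
        year, month = parts
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day if end_of_period else 1)
    if len(parts) == 3:  # full date: nothing left to resolve
        return date(*parts)
    raise ValueError(f"Unrecognised date string: {text!r}")


end_of_year = sketch_parse_date("2020", end_of_period=True)
end_of_february = sketch_parse_date("2020-02", end_of_period=True)
start_of_february = sketch_parse_date("2021-02")
formatted = end_of_february.strftime("%Y-%m-%d")
```

The final line mirrors the role of `strftime_format` when a string output is requested.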

miranda.convert.utils.find_version_hash(file: PathLike | str) → dict[source]#

Check for an existing version hash file and, if one cannot be found, generate one from file.

Parameters:: file (str or os.PathLike)
Returns:: dict
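
A minimal sketch of the "look for an existing hash file, otherwise compute one" pattern, using only the standard library. The sidecar naming (`<file>.sha256`), the helper name `sketch_find_version_hash`, and the single-key dictionary layout are assumptions for illustration; the source does not specify the hash file's name or the structure of the returned dictionary.

```python
import hashlib
import tempfile
from pathlib import Path


def sketch_find_version_hash(file) -> dict:
    """Return a checksum record, reusing a sidecar hash file when one exists."""
    file = Path(file)
    # Hypothetical convention: the hash lives next to the file as "<name>.sha256".
    sidecar = file.with_name(file.name + ".sha256")
    if sidecar.exists():
        # sha256sum-style files start with the hex digest, then the file name.
        checksum = sidecar.read_text().split()[0]
    else:
        digest = hashlib.sha256()
        with file.open("rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        checksum = digest.hexdigest()
    return {"sha256sum": checksum}


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.nc"
    sample.write_bytes(b"example data")
    computed = sketch_find_version_hash(sample)

    # Writing a sidecar file makes the second call skip the hashing step.
    (Path(tmp) / "sample.nc.sha256").write_text("abc123  sample.nc\n")
    reused = sketch_find_version_hash(sample)
```

Reading the file in fixed-size chunks keeps memory use flat, which matters for multi-gigabyte climate data files.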