miranda.convert package#

miranda.convert.aggregate(ds, freq: str = 'day') Dict[str, Dataset][source]#
miranda.convert.aggregations_possible(ds: Dataset, freq: str = 'day') Dict[str, Set[str]][source]#
miranda.convert.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float = 0.5, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) Dataset | DataArray[source]#

Convert an existing Xarray-compatible dataset to another format with variable corrections applied.

Parameters:
  • input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.

  • project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.

  • domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.

  • mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray or single data_variable dataset containing mask.

  • mask_cutoff (float) – If a land-sea mask is supplied, the threshold above which to apply the mask. Default: 0.5.

  • add_version_hashes (bool) – If True, version name and sha256sum of source file(s) will be added as a field among the global attributes.

  • preprocess (callable or str, optional) – Preprocessing function to apply to each Dataset. Default: “auto”, which runs preprocessing fixes based on the fields supplied in the metadata definition. If a callable is given, it is applied to the Dataset (single-file case) or passed as the preprocess argument to xarray.open_mfdataset() (multi-file case).

  • **xr_kwargs – Arguments passed directly to xarray.

Returns:

xr.Dataset or xr.DataArray
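
A minimal usage sketch; the file paths are placeholders, while the “cmip6” project and “can” domain values are taken from the accepted sets above:

    from pathlib import Path

    from miranda.convert import dataset_conversion

    # Hypothetical source files; any Xarray-compatible inputs are accepted.
    files = sorted(Path("/data/cmip6/raw").glob("*.nc"))

    ds = dataset_conversion(
        files,
        project="cmip6",
        domain="can",             # subset to the Canadian domain
        add_version_hashes=True,  # record the sha256sum of each source file
    )
    ds.to_netcdf("cmip6_converted.nc")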

miranda.convert.dataset_corrections(ds: Dataset, project: str) Dataset[source]#

Convert variables to a CF-compliant format.

miranda.convert.dims_conversion(d: Dataset, p: str, m: dict) Dataset[source]#
miranda.convert.gather_agcfsr(path: str | PathLike) Dict[str, List[Path]][source]#

Gather agCFSR source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert.gather_agmerra(path: str | PathLike) Dict[str, List[Path]][source]#

Gather agMERRA source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) Dict[str, List[Path]][source]#
Parameters:
  • project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})

  • path (str or os.PathLike)

  • back_extension (bool)

  • monthly_means (bool)

Returns:

dict(str, list[pathlib.Path])
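
A usage sketch with a placeholder search path:

    from miranda.convert import gather_ecmwf

    # Keys of the returned dict map to lists of the files found under `path`.
    found = gather_ecmwf("era5-land", "/data/ecmwf/era5-land")
    for key, paths in found.items():
        print(key, len(paths))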

miranda.convert.gather_grnch(path: str | PathLike) Dict[str, List[Path]][source]#
miranda.convert.gather_nrcan_gridded_obs(path: str | PathLike) Dict[str, List[Path]][source]#

Gather NRCan Gridded Observations source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert.gather_raw_rdrs_by_years(path: str | PathLike) Dict[str, Dict[str, List[Path]]][source]#

Gather raw RDRS files for preprocessing.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, dict(str, list[Path])) or None

miranda.convert.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) Dict[str, Dict[str, List[Path]]][source]#

Gather RDRS processed source data.

Parameters:
  • name (str)

  • path (str or os.PathLike)

  • suffix (str)

  • key ({“raw”, “cf”}) – Which variable-name dictionary to search.

Returns:

dict(str, list[pathlib.Path])
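
A usage sketch; the name, path, and suffix values are assumptions:

    from miranda.convert import gather_rdrs

    # key="raw" searches the raw variable-name dictionary,
    # key="cf" the CF variable names.
    rdrs_files = gather_rdrs("rdrs-v21", "/data/rdrs", suffix="nc", key="raw")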

miranda.convert.gather_sc_earth(path: str | PathLike) Dict[str, List[Path]][source]#

Gather SC-Earth source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert.gather_wfdei_gem_capa(path: str | PathLike) Dict[str, List[Path]][source]#

Gather WFDEI-GEM-CaPa source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert.load_json_data_mappings(project: str) dict[source]#
miranda.convert.metadata_conversion(d: Dataset, p: str, m: Dict) Dataset[source]#

Update xarray dataset and data_vars with project-specific metadata fields.

Parameters:
  • d (xarray.Dataset) – Dataset with metadata to be updated.

  • p (str) – Dataset project name.

  • m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset
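
One way to call it, pairing it with load_json_data_mappings() documented above (the input file name is a placeholder):

    import xarray as xr

    from miranda.convert import load_json_data_mappings, metadata_conversion

    ds = xr.open_dataset("example.nc")           # placeholder input file
    mappings = load_json_data_mappings("cmip6")  # project metadata definition
    ds = metadata_conversion(ds, "cmip6", mappings)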

miranda.convert.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) Dataset | DataArray[source]#

Land-Sea mask operations.

Parameters:
  • ds (Union[xr.Dataset, xr.DataArray])

  • mask (Union[xr.Dataset, xr.DataArray])

  • mask_cutoff (float or bool)

Returns:

Union[xr.Dataset, xr.DataArray]
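
A brief sketch with placeholder file names:

    import xarray as xr

    from miranda.convert import threshold_mask

    ds = xr.open_dataset("tas_converted.nc")        # placeholder data file
    land_sea = xr.open_dataset("land_sea_mask.nc")  # placeholder mask file

    masked = threshold_mask(ds, mask=land_sea, mask_cutoff=0.5)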

miranda.convert.variable_conversion(d: Dataset, p: str, m: dict) Dataset[source]#

Submodules#

miranda.convert._aggregation module#

miranda.convert._aggregation.aggregate(ds, freq: str = 'day') Dict[str, Dataset][source]#
miranda.convert._aggregation.aggregations_possible(ds: Dataset, freq: str = 'day') Dict[str, Set[str]][source]#

miranda.convert._data_corrections module#

miranda.convert._data_corrections.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float = 0.5, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) Dataset | DataArray[source]#

Convert an existing Xarray-compatible dataset to another format with variable corrections applied.

Parameters:
  • input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.

  • project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.

  • domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.

  • mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray or single data_variable dataset containing mask.

  • mask_cutoff (float) – If a land-sea mask is supplied, the threshold above which to apply the mask. Default: 0.5.

  • add_version_hashes (bool) – If True, version name and sha256sum of source file(s) will be added as a field among the global attributes.

  • preprocess (callable or str, optional) – Preprocessing function to apply to each Dataset. Default: “auto”, which runs preprocessing fixes based on the fields supplied in the metadata definition. If a callable is given, it is applied to the Dataset (single-file case) or passed as the preprocess argument to xarray.open_mfdataset() (multi-file case).

  • **xr_kwargs – Arguments passed directly to xarray.

Returns:

xr.Dataset or xr.DataArray

miranda.convert._data_corrections.dataset_corrections(ds: Dataset, project: str) Dataset[source]#

Convert variables to a CF-compliant format.

miranda.convert._data_corrections.dims_conversion(d: Dataset, p: str, m: dict) Dataset[source]#
miranda.convert._data_corrections.load_json_data_mappings(project: str) dict[source]#
miranda.convert._data_corrections.metadata_conversion(d: Dataset, p: str, m: Dict) Dataset[source]#

Update xarray dataset and data_vars with project-specific metadata fields.

Parameters:
  • d (xarray.Dataset) – Dataset with metadata to be updated.

  • p (str) – Dataset project name.

  • m (dict) – Metadata definition dictionary for project and variable(s).

Returns:

xarray.Dataset

miranda.convert._data_corrections.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) Dataset | DataArray[source]#

Land-Sea mask operations.

Parameters:
  • ds (Union[xr.Dataset, xr.DataArray])

  • mask (Union[xr.Dataset, xr.DataArray])

  • mask_cutoff (float or bool)

Returns:

Union[xr.Dataset, xr.DataArray]

miranda.convert._data_corrections.variable_conversion(d: Dataset, p: str, m: dict) Dataset[source]#

miranda.convert._data_definitions module#

miranda.convert._data_definitions.gather_agcfsr(path: str | PathLike) Dict[str, List[Path]][source]#

Gather agCFSR source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_agmerra(path: str | PathLike) Dict[str, List[Path]][source]#

Gather agMERRA source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) Dict[str, List[Path]][source]#
Parameters:
  • project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})

  • path (str or os.PathLike)

  • back_extension (bool)

  • monthly_means (bool)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_grnch(path: str | PathLike) Dict[str, List[Path]][source]#
miranda.convert._data_definitions.gather_nrcan_gridded_obs(path: str | PathLike) Dict[str, List[Path]][source]#

Gather NRCan Gridded Observations source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_raw_rdrs_by_years(path: str | PathLike) Dict[str, Dict[str, List[Path]]][source]#

Gather raw RDRS files for preprocessing.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, dict(str, list[Path])) or None

miranda.convert._data_definitions.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) Dict[str, Dict[str, List[Path]]][source]#

Gather RDRS processed source data.

Parameters:
  • name (str)

  • path (str or os.PathLike)

  • suffix (str)

  • key ({“raw”, “cf”}) – Which variable-name dictionary to search.

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_sc_earth(path: str | PathLike) Dict[str, List[Path]][source]#

Gather SC-Earth source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._data_definitions.gather_wfdei_gem_capa(path: str | PathLike) Dict[str, List[Path]][source]#

Gather WFDEI-GEM-CaPa source data.

Parameters:

path (str or os.PathLike)

Returns:

dict(str, list[pathlib.Path])

miranda.convert._reconstruction module#

miranda.convert._reconstruction.reanalysis_processing(data: Dict[str, List[str | PathLike]], output_folder: str | PathLike, variables: Sequence[str], aggregate: str | bool = False, domains: str | List[str] = '_DEFAULT', start: str | None = None, end: str | None = None, target_chunks: dict | None = None, output_format: str = 'netcdf', overwrite: bool = False, engine: str = 'h5netcdf', n_workers: int = 4, **dask_kwargs) None[source]#
Parameters:
  • data (Dict[str, List[str]])

  • output_folder (Union[str, os.PathLike])

  • variables (Sequence[str])

  • aggregate ({“day”, None})

  • domains ({“QC”, “CAN”, “AMNO”, “NAM”, “GLOBAL”})

  • start (str, optional)

  • end (str, optional)

  • target_chunks (dict, optional)

  • output_format ({“netcdf”, “zarr”})

  • overwrite (bool)

  • engine ({“netcdf4”, “h5netcdf”})

  • n_workers (int)

Returns:

None
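
A usage sketch feeding it the output of gather_ecmwf(); the paths and variable names are placeholders, and the keyword values follow the parameter list above:

    from miranda.convert import gather_ecmwf
    from miranda.convert._reconstruction import reanalysis_processing

    files = gather_ecmwf("era5-land", "/data/ecmwf/era5-land")
    reanalysis_processing(
        files,
        output_folder="/data/converted",
        variables=["tas", "pr"],
        aggregate="day",
        domains="CAN",
        output_format="zarr",
    )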

miranda.convert.deh module#

miranda.convert.deh.open_txt(path: str | Path, cf_table: dict | None = {'flag': {'comment': 'See DEH technical information for details.', 'long_name': 'data flag'}, 'q': {'long_name': 'River discharge', 'units': 'm3 s-1'}}) Dataset[source]#

Extract daily DEH hydrometric data and convert to an xr.Dataset with CF-Convention attributes.
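
A minimal sketch, assuming a DEH text export at a placeholder path:

    from miranda.convert.deh import open_txt

    # The default cf_table maps the "q" variable to river discharge in m3 s-1.
    ds = open_txt("/data/deh/station_q.txt")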

miranda.convert.eccc module#

miranda.convert.eccc.convert_canswe(file: str | Path, output: str | Path)[source]#

Convert the CanSWE netCDF files to production-ready CF-compliant netCDFs.
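
A brief sketch with placeholder paths for the CanSWE source netCDF and the output destination:

    from miranda.convert.eccc import convert_canswe

    convert_canswe("/data/canswe/canswe_source.nc", "/data/converted/canswe")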

miranda.convert.eccc_rdrs module#

miranda.convert.eccc_rdrs.convert_rdrs(project: str, input_folder: str | PathLike, output_folder: str | PathLike, output_format: str = 'zarr', working_folder: str | PathLike | None = None, overwrite: bool = False, **dask_kwargs)[source]#
Parameters:
  • project

  • input_folder

  • output_folder

  • output_format

  • working_folder

  • overwrite

  • dask_kwargs
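
A usage sketch; the “rdrs-v21” project identifier and the folders are assumptions:

    from miranda.convert.eccc_rdrs import convert_rdrs

    convert_rdrs(
        project="rdrs-v21",              # assumed project identifier
        input_folder="/data/rdrs/raw",
        output_folder="/data/rdrs/converted",
        output_format="zarr",
    )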

miranda.convert.eccc_rdrs.rdrs_to_daily(project: str, input_folder: str | PathLike, output_folder: str | PathLike, working_folder: str | PathLike | None = None, overwrite: bool = False, output_format: str = 'zarr', year_start: int | None = None, year_end: int | None = None, process_variables: list | None = None, **dask_kwargs)[source]#
Parameters:
  • project

  • input_folder

  • output_folder

  • working_folder

  • overwrite

  • output_format

  • year_start

  • year_end

  • process_variables

  • dask_kwargs
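
A follow-on sketch, presumably aggregating the converted RDRS outputs to a daily timestep; the project, folders, years, and variable names are assumptions:

    from miranda.convert.eccc_rdrs import rdrs_to_daily

    rdrs_to_daily(
        project="rdrs-v21",
        input_folder="/data/rdrs/converted",
        output_folder="/data/rdrs/daily",
        year_start=1980,
        year_end=2010,
        process_variables=["tas", "pr"],
    )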

miranda.convert.ecmwf module#

miranda.convert.ecmwf.tigge_convert(source: PathLike | None = None, target: PathLike | None = None, processes: int = 8) None[source]#

Convert grib2 file to netCDF format.

Parameters:
  • source (os.PathLike, optional)

  • target (os.PathLike, optional)

  • processes (int)

Returns:

None
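
A brief sketch with placeholder folders for the grib2 sources and the netCDF outputs:

    from pathlib import Path

    from miranda.convert.ecmwf import tigge_convert

    tigge_convert(
        source=Path("/data/tigge/grib2"),
        target=Path("/data/tigge/netcdf"),
        processes=4,
    )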

miranda.convert.hq module#

miranda.convert.hq.open_csv(path: str | Path, cf_table: dict | None = {'hurs': {'cell_methods': 'time: point', 'comment': 'The relative humidity with respect to liquid water for T> 0 C, and with respect to ice for T<0 C.', 'frequency': '1h', 'long_name': 'Near-Surface Relative Humidity', 'out_name': 'hurs', 'standard_name': 'relative_humidity', 'type': 'real', 'units': '%'}, 'prlp': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the liquid phase.', 'frequency': 'day', 'long_name': 'Rainfall Flux', 'out_name': 'prlp', 'standard_name': 'rainfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'prsn': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the solid phase.', 'frequency': 'day', 'long_name': 'Snowfall Flux', 'out_name': 'prsn', 'standard_name': 'snowfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'sfcWind': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) wind speed.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Speed', 'out_name': 'sfcWind', 'standard_name': 'wind_speed', 'type': 'real', 'units': 'm s-1'}, 'sfcWindAz': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) direction from which wind originates.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Direction', 'out_name': 'sfcWindAz', 'standard_name': 'wind_direction', 'type': 'real', 'units': 'degree'}, 'snd': {'cell_methods': 'time: point', 'comment': 'The thickness of snow.', 'frequency': '1h', 'long_name': 'Snow Depth', 'out_name': 'snd', 'standard_name': 'surface_snow_thickness', 'type': 'real', 'units': 'm'}, 'tasmax_1h': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmax_day': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_1h': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_day': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}}) DataArray[source]#

Extract daily HQ meteorological data and convert to an xr.DataArray with CF-Convention attributes.
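
A minimal sketch, assuming an HQ station CSV export at a placeholder path:

    from miranda.convert.hq import open_csv

    # The default cf_table supplies the CF attributes (units, standard_name, etc.).
    da = open_csv("/data/hq/station_export.csv")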

miranda.convert.melcc module#

miranda.convert.melcc.concat(files: Sequence[str | Path], output_folder: str | Path, overwrite: bool = True)[source]#
miranda.convert.melcc.convert_mdb(database: str | Path, stations: Dataset, definitions: Dataset, output: str | Path, overwrite: bool = True)[source]#
miranda.convert.melcc.convert_melcc_obs(metafile: str | Path, folder: str | Path, output: str | Path | None = None, overwrite: bool = True)[source]#
miranda.convert.melcc.convert_snow_table(file: str | Path, output: str | Path)[source]#

Convert snow data given through an Excel file.

This private data is not included in the MDB files.

Parameters:
  • file (path) – The Excel file with the sheets “Stations”, “Périodes standards” and “Données”.

  • output (path) – Folder in which to write the netCDF files (one for each of snd, sd and snw).
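
A brief sketch with placeholder paths:

    from miranda.convert.melcc import convert_snow_table

    # The workbook must contain the "Stations", "Périodes standards"
    # and "Données" sheets.
    convert_snow_table("/data/melcc/snow_survey.xlsx", output="/data/melcc/netcdf")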

miranda.convert.melcc.list_tables(dbfile)[source]#

List the tables of an MDB file.

miranda.convert.melcc.parse_var_code(vcode)[source]#
miranda.convert.melcc.read_definitions(dbfile)[source]#
miranda.convert.melcc.read_stations(dbfile)[source]#
miranda.convert.melcc.read_table(dbfile, table)[source]#

miranda.convert.utils module#

miranda.convert.utils.date_parser(date: str, *, end_of_period: bool = False, output_type: str = 'str', strftime_format: str = '%Y-%m-%d') str | Timestamp | NaTType[source]#

Parse a datetime object from a string representation of a single date, or of a start and end date.

Parameters:
  • date (str) – Date to be converted.

  • end_of_period (bool) – If True, the date will be the end of month or year depending on what’s most appropriate.

  • output_type ({“datetime”, “str”}) – Desired returned object type.

  • strftime_format (str) – If output_type==’str’, this sets the strftime format.

Returns:

pd.Timestamp or str or pd.NaT – Parsed date.

Notes

Adapted from code written by Gabriel Rondeau-Genesse (@RondeauG)
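
Illustrative calls; the expected results follow the documented behaviour but are not verified here:

    from miranda.convert.utils import date_parser

    date_parser("2017-05")                          # expected: "2017-05-01"
    date_parser("2017-05", end_of_period=True)      # expected: "2017-05-31"
    date_parser("2017-05", output_type="datetime")  # expected: a pandas Timestamp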

miranda.convert.utils.find_version_hash(file: PathLike | str) Dict[source]#
Parameters:

file (Path or str)

Returns:

dict
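
A brief sketch with a placeholder path; the returned dict is expected to hold the version name and sha256sum that dataset_conversion() records when add_version_hashes is True:

    from miranda.convert.utils import find_version_hash

    info = find_version_hash("/data/cmip6/raw/tas_day_example.nc")
    print(info)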