miranda.convert package
- miranda.convert.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float = 0.5, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) → Dataset | DataArray
Convert an existing Xarray-compatible dataset to another format with variable corrections applied.
- Parameters:
input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to subset the data to. Default: None.
mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray, or Dataset with a single data variable, containing the mask.
mask_cutoff (float) – If a land-sea mask is supplied, the threshold above which to mask with it. Default: 0.5.
add_version_hashes (bool) – If True, the version name and SHA-256 checksum of the source file(s) are added to the global attributes.
preprocess (callable or str, optional) – Preprocessing to apply to each Dataset. Default: “auto”, which runs preprocessing fixes based on the fields supplied in the metadata definition. A callable is run over the Dataset (single file) or passed as preprocess to xarray.open_mfdataset() (multi-file dataset).
**xr_kwargs – Keyword arguments passed directly to xarray.
- Returns:
xr.Dataset or xr.DataArray
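With add_version_hashes=True, a SHA-256 checksum of each source file is recorded in the global attributes. The snippet below is a standard-library sketch of computing such a checksum; the helper name file_sha256 and the 1 MiB chunk size are illustrative assumptions, not miranda's own implementation.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from pathlib import Path


def file_sha256(path: str | os.PathLike, chunk_size: int = 2**20) -> str:
    """Return the hexadecimal SHA-256 digest of a file (hypothetical helper)."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large source files never sit fully in memory;
        # iter() stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.nc"
        sample.write_bytes(b"placeholder bytes, not real netCDF")
        print(sample.name, file_sha256(sample))
```

Reading in chunks gives the same digest as hashing the whole file at once while keeping memory use bounded.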
- miranda.convert.dataset_corrections(ds: Dataset, project: str) → Dataset
Convert variables to CF-compliant format.
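One common kind of CF-compliance fix is expressing precipitation as a flux in kg m-2 s-1. The arithmetic is sketched below purely as an illustration; which corrections dataset_corrections actually applies depends on the project's metadata definitions, and the helper mm_per_day_to_flux is hypothetical.

```python
SECONDS_PER_DAY = 86_400
WATER_DENSITY = 1000.0  # kg m-3, assumed density of liquid water


def mm_per_day_to_flux(pr_mm_day: float) -> float:
    """Convert a precipitation rate from mm/day to kg m-2 s-1 (hypothetical helper).

    A 1 mm water layer over 1 m2 is 1e-3 m3, i.e. 1 kg at 1000 kg m-3,
    so the conversion reduces to dividing by the number of seconds in a day.
    """
    kg_per_m2_per_day = pr_mm_day * 1e-3 * WATER_DENSITY
    return kg_per_m2_per_day / SECONDS_PER_DAY
```

For example, 86.4 mm/day corresponds to 1e-3 kg m-2 s-1.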
- miranda.convert.gather_agcfsr(path: str | PathLike) → Dict[str, List[Path]]
Gather agCFSR source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
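Each gather_* function returns a dictionary that maps a key to a list of pathlib.Path objects. The sketch below reproduces that return shape for a directory tree; the helper gather_by_variable, the `<variable>_*.nc` naming convention and the variable names are hypothetical, since each source dataset has its own layout that miranda handles internally.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def gather_by_variable(
    path: str | os.PathLike, variables: list[str], suffix: str = "nc"
) -> dict[str, list[Path]]:
    """Map each variable to the sorted files under ``path`` named ``<variable>_*.<suffix>``."""
    root = Path(path).expanduser()
    # rglob searches subdirectories too; variables with no match get an empty list.
    return {var: sorted(root.rglob(f"{var}_*.{suffix}")) for var in variables}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["tasmax_1980.nc", "pr_1980.nc"]:
            (Path(tmp) / name).touch()
        print(gather_by_variable(tmp, ["tasmax", "pr"]))
```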
- miranda.convert.gather_agmerra(path: str | PathLike) → Dict[str, List[Path]]
Gather agMERRA source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → Dict[str, List[Path]]
Gather ECMWF ERA5 source data.
- Parameters:
project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert.gather_nrcan_gridded_obs(path: str | PathLike) → Dict[str, List[Path]]
Gather NRCan Gridded Observations source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert.gather_raw_rdrs_by_years(path: str | PathLike) → Dict[str, Dict[str, List[Path]]]
Gather raw RDRS files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, dict(str, list[Path])) or None
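gather_raw_rdrs_by_years returns a nested mapping. A sketch of building a structure of that shape follows, assuming (hypothetically) that raw files start with a YYYYMMDD date and that the outer key is a single dataset label; the keys miranda actually uses may differ, and the helper group_files_by_year is illustrative only.

```python
from __future__ import annotations

import os
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention: files start with an 8-digit YYYYMMDD date,
# e.g. "1980010112.nc".
DATE_PREFIX = re.compile(r"^(\d{4})\d{4}")


def group_files_by_year(
    path: str | os.PathLike, label: str = "rdrs", suffix: str = "nc"
) -> dict[str, dict[str, list[Path]]]:
    """Return {label: {year: [files, ...]}} for the files directly under ``path``."""
    by_year: dict[str, list[Path]] = defaultdict(list)
    for file in sorted(Path(path).glob(f"*.{suffix}")):
        match = DATE_PREFIX.match(file.name)
        if match:  # ignore files that do not follow the convention
            by_year[match.group(1)].append(file)
    return {label: dict(by_year)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "1980010112.nc").touch()
        print(group_files_by_year(tmp))
```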
- miranda.convert.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → Dict[str, Dict[str, List[Path]]]
Gather RDRS processed source data.
- Parameters:
name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Which variable name dictionary to search.
- Returns:
dict(str, dict(str, list[pathlib.Path]))
- miranda.convert.gather_sc_earth(path: str | PathLike) → Dict[str, List[Path]]
Gather SC-Earth source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert.gather_wfdei_gem_capa(path: str | PathLike) → Dict[str, List[Path]]
Gather WFDEI-GEM-CaPa source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert.metadata_conversion(d: Dataset, p: str, m: Dict) → Dataset
Update xarray dataset and data_vars with project-specific metadata fields.
- Parameters:
d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
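metadata_conversion merges a metadata definition into the dataset and its data variables. The sketch below mimics that merge with plain dictionaries standing in for xarray's Dataset.attrs and DataArray.attrs; the helper name apply_metadata and the "Header"/"variables" layout of the definition are assumptions for illustration, and project-specific selection (the p argument) is left out.

```python
from __future__ import annotations

import copy


def apply_metadata(
    global_attrs: dict, var_attrs: dict[str, dict], definition: dict
) -> tuple[dict, dict[str, dict]]:
    """Return updated copies of global and per-variable attributes.

    ``definition`` is assumed to look like
    {"Header": {...global attrs...}, "variables": {name: {...var attrs...}}}.
    """
    new_global = {**global_attrs, **definition.get("Header", {})}
    new_vars = copy.deepcopy(var_attrs)  # leave the caller's dictionaries untouched
    for name, attrs in definition.get("variables", {}).items():
        if name in new_vars:  # skip definitions for variables not in the dataset
            new_vars[name].update(attrs)
    return new_global, new_vars
```

Definition entries override existing attributes with the same name, while attributes absent from the definition are kept.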
- miranda.convert.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray
Apply land-sea mask operations to a Dataset or DataArray.
- Parameters:
ds (Union[xr.Dataset, xr.DataArray])
mask (Union[xr.Dataset, xr.DataArray])
mask_cutoff (float or bool)
- Returns:
Union[xr.Dataset, xr.DataArray]
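A minimal NumPy sketch of threshold masking follows. It assumes that grid cells whose mask value exceeds the cutoff are kept and all others are set to NaN; that convention is an assumption, not documented miranda behaviour. With xarray objects the same idea is usually written as ds.where(mask > mask_cutoff). The helper apply_threshold_mask is hypothetical.

```python
import numpy as np


def apply_threshold_mask(values, mask, cutoff: float = 0.5) -> np.ndarray:
    """Keep values where mask > cutoff and set every other cell to NaN (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mask = np.asarray(mask, dtype=float)
    # np.where broadcasts, so a (lat, lon) mask also applies to (time, lat, lon) data.
    return np.where(mask > cutoff, values, np.nan)


land_fraction = np.array([[0.9, 0.1], [0.5, 1.0]])
temperature = np.array([[271.0, 275.0], [280.0, 269.5]])
print(apply_threshold_mask(temperature, land_fraction))
```

A cell exactly at the cutoff (0.5 here) is masked, because the comparison is strict.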
Submodules
miranda.convert._aggregation module
miranda.convert._data_corrections module
- miranda.convert._data_corrections.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float = 0.5, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) → Dataset | DataArray
Convert an existing Xarray-compatible dataset to another format with variable corrections applied.
- Parameters:
input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to subset the data to. Default: None.
mask (Optional[Union[xr.Dataset, xr.DataArray]]) – DataArray, or Dataset with a single data variable, containing the mask.
mask_cutoff (float) – If a land-sea mask is supplied, the threshold above which to mask with it. Default: 0.5.
add_version_hashes (bool) – If True, the version name and SHA-256 checksum of the source file(s) are added to the global attributes.
preprocess (callable or str, optional) – Preprocessing to apply to each Dataset. Default: “auto”, which runs preprocessing fixes based on the fields supplied in the metadata definition. A callable is run over the Dataset (single file) or passed as preprocess to xarray.open_mfdataset() (multi-file dataset).
**xr_kwargs – Keyword arguments passed directly to xarray.
- Returns:
xr.Dataset or xr.DataArray
- miranda.convert._data_corrections.dataset_corrections(ds: Dataset, project: str) → Dataset
Convert variables to CF-compliant format.
- miranda.convert._data_corrections.metadata_conversion(d: Dataset, p: str, m: Dict) → Dataset
Update xarray dataset and data_vars with project-specific metadata fields.
- Parameters:
d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
- miranda.convert._data_corrections.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray
Apply land-sea mask operations to a Dataset or DataArray.
- Parameters:
ds (Union[xr.Dataset, xr.DataArray])
mask (Union[xr.Dataset, xr.DataArray])
mask_cutoff (float or bool)
- Returns:
Union[xr.Dataset, xr.DataArray]
miranda.convert._data_definitions module
- miranda.convert._data_definitions.gather_agcfsr(path: str | PathLike) → Dict[str, List[Path]]
Gather agCFSR source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert._data_definitions.gather_agmerra(path: str | PathLike) → Dict[str, List[Path]]
Gather agMERRA source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert._data_definitions.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → Dict[str, List[Path]]
Gather ECMWF ERA5 source data.
- Parameters:
project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert._data_definitions.gather_grnch(path: str | PathLike) → Dict[str, List[Path]]
Gather GRNCH source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert._data_definitions.gather_nrcan_gridded_obs(path: str | PathLike) → Dict[str, List[Path]]
Gather NRCan Gridded Observations source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, list[pathlib.Path])
- miranda.convert._data_definitions.gather_raw_rdrs_by_years(path: str | PathLike) → Dict[str, Dict[str, List[Path]]]
Gather raw RDRS files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict(str, dict(str, list[Path])) or None
- miranda.convert._data_definitions.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → Dict[str, Dict[str, List[Path]]]
Gather RDRS processed source data.
- Parameters:
name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Which variable name dictionary to search.
- Returns:
dict(str, dict(str, list[pathlib.Path]))
miranda.convert._reconstruction module
- miranda.convert._reconstruction.reanalysis_processing(data: Dict[str, List[str | PathLike]], output_folder: str | PathLike, variables: Sequence[str], aggregate: str | bool = False, domains: str | List[str] = '_DEFAULT', start: str | None = None, end: str | None = None, target_chunks: dict | None = None, output_format: str = 'netcdf', overwrite: bool = False, engine: str = 'h5netcdf', n_workers: int = 4, **dask_kwargs) → None
- Parameters:
data (Dict[str, List[Union[str, os.PathLike]]])
output_folder (Union[str, os.PathLike])
variables (Sequence[str])
aggregate ({“day”, None})
domains ({“QC”, “CAN”, “AMNO”, “NAM”, “GLOBAL”})
start (str, optional)
end (str, optional)
target_chunks (dict, optional)
output_format ({“netcdf”, “zarr”})
overwrite (bool)
engine ({“netcdf4”, “h5netcdf”})
n_workers (int)
- Returns:
None
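The aggregate="day" option reduces sub-daily data to daily values. The standard-library sketch below shows the grouping step on (timestamp, value) pairs; the helper hourly_to_daily and its default mean reducer are assumptions (a maximum, minimum or sum suits some variables better). With xarray, the equivalent step is typically ds.resample(time="1D").mean().

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable, Iterable
from datetime import date, datetime
from statistics import fmean


def hourly_to_daily(
    records: Iterable[tuple[datetime, float]],
    reducer: Callable[[list[float]], float] = fmean,
) -> dict[date, float]:
    """Group (timestamp, value) pairs by calendar day and reduce each day's values."""
    by_day: dict[date, list[float]] = defaultdict(list)
    for stamp, value in records:
        by_day[stamp.date()].append(value)
    # Sorted so days come out in chronological order.
    return {day: reducer(values) for day, values in sorted(by_day.items())}


hourly = [(datetime(2000, 1, 1, hour), float(hour)) for hour in range(24)]
print(hourly_to_daily(hourly))
print(hourly_to_daily(hourly, reducer=max))
```

Days are split on the calendar date of each timestamp as given, so the time zone of the input defines where one day ends.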
miranda.convert.deh module
- miranda.convert.deh.open_txt(path: str | Path, cf_table: dict | None = {'flag': {'comment': 'See DEH technical information for details.', 'long_name': 'data flag'}, 'q': {'long_name': 'River discharge', 'units': 'm3 s-1'}}) → Dataset
Extract daily DEH hydrometric data (river discharge and data flags) and convert to an xr.Dataset with CF-Convention attributes.
miranda.convert.eccc module
miranda.convert.eccc_rdrs module
- miranda.convert.eccc_rdrs.convert_rdrs(project: str, input_folder: str | PathLike, output_folder: str | PathLike, output_format: str = 'zarr', working_folder: str | PathLike | None = None, overwrite: bool = False, **dask_kwargs)
- Parameters:
project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
output_format (str)
working_folder (str or os.PathLike, optional)
overwrite (bool)
**dask_kwargs
- miranda.convert.eccc_rdrs.rdrs_to_daily(project: str, input_folder: str | PathLike, output_folder: str | PathLike, working_folder: str | PathLike | None = None, overwrite: bool = False, output_format: str = 'zarr', year_start: int | None = None, year_end: int | None = None, process_variables: list | None = None, **dask_kwargs)
- Parameters:
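project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
working_folder (str or os.PathLike, optional)
overwrite (bool)
output_format (str)
year_start (int, optional)
year_end (int, optional)
process_variables (list, optional)
**dask_kwargs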
miranda.convert.ecmwf module
miranda.convert.hq module
- miranda.convert.hq.open_csv(path: str | Path, cf_table: dict | None = {'hurs': {'cell_methods': 'time: point', 'comment': 'The relative humidity with respect to liquid water for T> 0 C, and with respect to ice for T<0 C.', 'frequency': '1h', 'long_name': 'Near-Surface Relative Humidity', 'out_name': 'hurs', 'standard_name': 'relative_humidity', 'type': 'real', 'units': '%'}, 'prlp': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the liquid phase.', 'frequency': 'day', 'long_name': 'Rainfall Flux', 'out_name': 'prlp', 'standard_name': 'rainfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'prsn': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the solid phase.', 'frequency': 'day', 'long_name': 'Snowfall Flux', 'out_name': 'prsn', 'standard_name': 'snowfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'sfcWind': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) wind speed.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Speed', 'out_name': 'sfcWind', 'standard_name': 'wind_speed', 'type': 'real', 'units': 'm s-1'}, 'sfcWindAz': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) direction from which wind originates.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Direction', 'out_name': 'sfcWindAz', 'standard_name': 'wind_direction', 'type': 'real', 'units': 'degree'}, 'snd': {'cell_methods': 'time: point', 'comment': 'The thickness of snow.', 'frequency': '1h', 'long_name': 'Snow Depth', 'out_name': 'snd', 'standard_name': 'surface_snow_thickness', 'type': 'real', 'units': 'm'}, 'tasmax_1h': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmax_day': 
{'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_1h': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_day': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}}) → DataArray
Extract daily HQ meteorological data and convert to xr.DataArray with CF-Convention attributes.
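In the default cf_table, the hourly and daily entries for tasmax (and likewise tasmin) share the same out_name and differ only in frequency, so a lookup needs both fields. The sketch below copies a trimmed subset of the default attributes and uses a hypothetical helper, cf_attrs_for, to pick the matching entry; it is not a reproduction of how open_csv itself consults the table.

```python
# Subset of the default cf_table from the open_csv signature (attributes trimmed).
CF_TABLE_EXCERPT = {
    "tasmax_1h": {
        "frequency": "1h",
        "long_name": "Hourly Maximum Near-Surface Air Temperature",
        "out_name": "tasmax",
        "units": "K",
    },
    "tasmax_day": {
        "frequency": "day",
        "long_name": "Daily Maximum Near-Surface Air Temperature",
        "out_name": "tasmax",
        "units": "K",
    },
    "snd": {"frequency": "1h", "long_name": "Snow Depth", "out_name": "snd", "units": "m"},
}


def cf_attrs_for(cf_table: dict, out_name: str, frequency: str) -> dict:
    """Return the first cf_table entry whose out_name and frequency both match."""
    for attrs in cf_table.values():
        if attrs.get("out_name") == out_name and attrs.get("frequency") == frequency:
            return attrs
    raise KeyError(f"no cf_table entry for {out_name!r} at frequency {frequency!r}")


print(cf_attrs_for(CF_TABLE_EXCERPT, "tasmax", "day")["long_name"])
```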
miranda.convert.melcc module
- miranda.convert.melcc.concat(files: Sequence[str | Path], output_folder: str | Path, overwrite: bool = True)
- miranda.convert.melcc.convert_mdb(database: str | Path, stations: Dataset, definitions: Dataset, output: str | Path, overwrite: bool = True)
- miranda.convert.melcc.convert_melcc_obs(metafile: str | Path, folder: str | Path, output: str | Path | None = None, overwrite: bool = True)
- miranda.convert.melcc.convert_snow_table(file: str | Path, output: str | Path)
Convert snow data provided in an Excel file.
These private data are not included in the MDB files.
- Parameters:
file (path) – The Excel file, with sheets “Stations”, “Périodes standards” and “Données”.
output (path) – Folder in which to write the netCDF files (one for each of snd, sd and snw).
miranda.convert.utils module
- miranda.convert.utils.date_parser(date: str, *, end_of_period: bool = False, output_type: str = 'str', strftime_format: str = '%Y-%m-%d') → str | Timestamp | NaTType
Parse a string representation of a date, or of both a start and an end date.
- Parameters:
date (str) – Date to be converted.
end_of_period (bool) – If True, the date is moved to the end of the month or year, whichever is most appropriate for the input.
output_type ({“datetime”, “str”}) – Desired returned object type.
strftime_format (str) – If output_type is “str”, the strftime format of the returned date.
- Returns:
pd.Timestamp or str or pd.NaT – Parsed date.
Notes
Adapted from code written by Gabriel Rondeau-Genesse (@RondeauG)
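The end_of_period behaviour can be illustrated with the standard library alone. The hypothetical helper parse_partial_date below accepts only dash-separated “YYYY”, “YYYY-MM” and “YYYY-MM-DD” strings and always returns a formatted string; unlike date_parser, it does not handle start/end date pairs, return pd.Timestamp objects or produce pd.NaT.

```python
from __future__ import annotations

import calendar
from datetime import date


def parse_partial_date(
    text: str, *, end_of_period: bool = False, strftime_format: str = "%Y-%m-%d"
) -> str:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' and format the result.

    With end_of_period=True, a bare year resolves to 31 December and a
    year-month to the last day of that month (leap years included).
    """
    parts = [int(p) for p in text.strip().split("-")]
    if len(parts) == 1:
        year = parts[0]
        result = date(year, 12, 31) if end_of_period else date(year, 1, 1)
    elif len(parts) == 2:
        year, month = parts
        # monthrange returns (weekday of day 1, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        result = date(year, month, last_day if end_of_period else 1)
    elif len(parts) == 3:
        result = date(*parts)  # already a full date; end_of_period has no effect
    else:
        raise ValueError(f"unsupported date string: {text!r}")
    return result.strftime(strftime_format)


print(parse_partial_date("2000-02", end_of_period=True))
```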