miranda.preprocess package

Preprocessing tools for Miranda.

Submodules

miranda.preprocess._eccc_ahccd module

Adjusted and Homogenized Canadian Climate Data module.

miranda.preprocess._eccc_ahccd.convert_ahccd(data_source: str | Path, output_dir: str | Path, variable: str, *, generation: int, merge: bool = False, overwrite: bool = False) -> None

Convert Adjusted and Homogenized Canadian Climate Dataset files.

Parameters:
  • data_source (str or Path)

  • output_dir (str or Path)

  • variable (str)

  • generation (int)

  • merge (bool)

  • overwrite (bool)

Returns:

None
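The bare `*` in the signature above makes `generation`, `merge`, and `overwrite` keyword-only. The sketch below illustrates that calling convention with a stand-in function that copies the documented parameter layout; it is not Miranda's implementation, and the paths and the variable name `"tas"` are hypothetical placeholders.

```python
from pathlib import Path
from typing import Union


# Stand-in with the same parameter layout as the documented signature.
# It is NOT the Miranda implementation; it only echoes its arguments.
def convert_ahccd_stub(
    data_source: Union[str, Path],
    output_dir: Union[str, Path],
    variable: str,
    *,
    generation: int,
    merge: bool = False,
    overwrite: bool = False,
) -> dict:
    return {
        "data_source": Path(data_source),
        "output_dir": Path(output_dir),
        "variable": variable,
        "generation": generation,
        "merge": merge,
        "overwrite": overwrite,
    }


# Accepted: generation is passed by keyword.
ok = convert_ahccd_stub("raw/ahccd", "converted/ahccd", "tas", generation=3)

# Rejected: passing generation as a fourth positional argument raises TypeError.
try:
    convert_ahccd_stub("raw/ahccd", "converted/ahccd", "tas", 3)
    positional_accepted = True
except TypeError:
    positional_accepted = False
```

The same rule applies to any signature on this page that contains a bare `*`, such as `convert_ahccd_fwf_file` and `merge_stations`.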

miranda.preprocess._eccc_ahccd.convert_ahccd_fwf_file(ff: Path | str, metadata: DataFrame, variable: str, *, generation: int) -> Dataset

Convert AHCCD fixed-width files.

Parameters:
  • ff (str or Path)

  • metadata (pandas.DataFrame)

  • variable (str)

  • generation (int)

Returns:

xarray.Dataset

miranda.preprocess._eccc_ahccd.merge_ahccd(data_source: str | Path, output_dir: str | Path | None = None, variable: str | None = None, overwrite: bool = False) -> None

Merge Adjusted and Homogenized Canadian Climate Dataset files.

miranda.preprocess._eccc_obs module

Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.

miranda.preprocess._eccc_obs.convert_station(data: str | PathLike, variable: str, mode: str, using_dask_array: bool = False, *, client: Any, **kwargs)

Convert a single station’s data from the fixed-width format to a netCDF file.

miranda.preprocess._eccc_obs.merge_converted_variables(source_files: str | PathLike, output_folder: str | PathLike, variables: str | int | list[str | int] | None = None, overwrite: bool = False, n_workers: int = 1) -> None

Merge converted variables into a single file per variable.

Parameters:
  • source_files (str or Path)

  • output_folder (str or Path)

  • variables (str or int or list of str or int, optional)

  • overwrite (bool)

  • n_workers (int)

Returns:

None

miranda.preprocess._eccc_obs.merge_stations(source_files: str | PathLike | None = None, output_folder: str | PathLike | None = None, *, time_step: str, variables: str | int | list[str | int] | None = None, include_flags: bool = True, groupings: int | None = None, mf_dataset_freq: str | None = None, temp_directory: str | PathLike | None = None, n_workers: int = 1) -> None

Merge stations.

Parameters:
  • source_files (str or Path) – Source files to be aggregated.

  • output_folder (str or Path) – Output folder for the aggregated files.

  • variables (str or int or list of str or int, optional) – The variable codes to be aggregated.

  • time_step ({“hourly”, “daily”}) – The time step to be used for aggregation.

  • include_flags (bool) – Include flags in the output files.

  • groupings (int) – The number of files in each group used for converting to multi-file Datasets.

  • mf_dataset_freq (str, optional) – Resampling frequency for creating output multi-file Datasets. E.g. ‘YS’: 1 year per file, ‘5YS’: 5 years per file.

  • temp_directory (str or Path, optional) – Alternative temporary directory, for use when the default location does not have enough space.

  • n_workers (int) – The number of workers to use.

Returns:

None
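To make the `groupings` parameter concrete, the sketch below splits a list of file names into groups of a fixed size. It only illustrates the idea of "the number of files in each group"; the helper `chunk_files` and the file names are hypothetical and are not taken from Miranda.

```python
from typing import List, Sequence


def chunk_files(files: Sequence[str], groupings: int) -> List[List[str]]:
    """Split files into consecutive groups of at most `groupings` items."""
    if groupings < 1:
        raise ValueError("groupings must be a positive integer")
    return [list(files[i : i + groupings]) for i in range(0, len(files), groupings)]


# Seven hypothetical station files, grouped three at a time.
station_files = [f"station_{n:03d}.nc" for n in range(7)]
groups = chunk_files(station_files, groupings=3)
group_sizes = [len(g) for g in groups]
```

With seven files and `groupings=3`, the last group holds the single remaining file.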

miranda.preprocess._eccc_summaries module

miranda.preprocess._eccc_summaries.daily_summaries_to_netcdf(station: dict, path_output: Path | str) -> None

Convert daily climate summaries to NetCDF files.

Uses xarray to transform the ‘station’ dict produced by find_and_extract_dly into a CF-convention netCDF file.

Parameters:
  • station (dict) – Dict created by find_and_extract_dly.

  • path_output (str or Path) – Output path.

Returns:

None

miranda.preprocess._eccc_summaries.extract_daily_summaries(path_station: Path | str, rm_flags: bool = False, file_suffix: str = '.csv') -> dict

Extract daily climate summaries from ECCC CSV files.

Parameters:
  • path_station (str or Path) – Path to the station’s folder containing the CSV files.

  • rm_flags (bool) – If True, remove the ‘Flag’ and ‘Quality’ columns of the ECCC files.

  • file_suffix (str) – File suffix used by the tabular data files. Default: “.csv”.

Returns:

dict – Dict containing the station metadata, as well as the data stored within a pandas DataFrame.
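The effect described for `rm_flags` can be sketched with the standard library alone. The example below drops every column whose header contains "Flag" or "Quality" from a small in-memory CSV. The column headers, the sample rows, and the helper `drop_flag_columns` are assumptions made for illustration; the matching rule Miranda actually applies may differ.

```python
import csv
import io

# Hypothetical excerpt of a daily summary file; real ECCC headers may differ.
SAMPLE_CSV = """Date/Time,Max Temp (C),Max Temp Flag,Data Quality
2020-01-01,-3.5,,
2020-01-02,-1.0,E,
"""


def drop_flag_columns(text):
    """Return the CSV rows as dicts, without 'Flag' and 'Quality' columns."""
    reader = csv.DictReader(io.StringIO(text))
    keep = [
        name
        for name in reader.fieldnames
        if "Flag" not in name and "Quality" not in name
    ]
    return [{name: row[name] for name in keep} for row in reader]


rows = drop_flag_columns(SAMPLE_CSV)
```

Each remaining row keeps only the date and the measured value, still as strings.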

miranda.preprocess._metadata module

miranda.preprocess._metadata.eccc_variable_metadata(variable_code: str | int, project: str, generation: int | None = None, metadata: dict | None = None) -> dict[str, Any]

Return the metadata for a given variable code and project.

Parameters:
  • variable_code (str or int)

  • project ({“eccc-ahccd”, “eccc-obs”, “eccc-obs-summary”})

  • generation ({1, 2, 3}, optional)

  • metadata (dict, optional)

Returns:

dict

miranda.preprocess._metadata.homogenized_column_definitions(variable_code: str) -> tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int]

Return the column names, widths, and data types for the AHCCD fixed-width format data.

Parameters:

variable_code (str)

Returns:

tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int]

miranda.preprocess._metadata.obs_column_definitions(time_frequency: str) -> tuple[list[str], list[int], list[type[str | int]], int]

Return the column names, widths, and data types for the fixed-width format.
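As an illustration of how column names, widths, and data types work together for a fixed-width record, the sketch below slices one line by cumulative widths and casts each field. The column layout, the sample line, and the helper `parse_fixed_width` are hypothetical, not the definitions returned by this function, and the trailing `int` in the return tuple is not used here because its meaning is not documented on this page.

```python
from itertools import accumulate

# Hypothetical layout: these are NOT the real ECCC column definitions.
names = ["station_id", "year", "month", "value"]
widths = [7, 4, 2, 6]
dtypes = [str, int, int, int]


def parse_fixed_width(line, names, widths, dtypes):
    """Slice a fixed-width line by cumulative widths and cast each field."""
    ends = list(accumulate(widths))  # exclusive end offset of each column
    starts = [0] + ends[:-1]         # start offset of each column
    record = {}
    for name, start, end, dtype in zip(names, starts, ends, dtypes):
        record[name] = dtype(line[start:end].strip())
    return record


# A made-up record: 7-char station id, 4-char year, 2-char month, 6-char value.
line = "7025250" + "1998" + "07" + "   215"
record = parse_fixed_width(line, names, widths, dtypes)
```

For whole files, `pandas.read_fwf` accepts a `widths` list and column `names` and performs the same slicing.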

miranda.preprocess.eccc module

Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.

miranda.preprocess.ecmwf_tigge module

ECMWF TIGGE Conversion module.

miranda.preprocess.ecmwf_tigge.tigge_convert(source: PathLike | None = None, target: PathLike | None = None, processes: int = 8) -> None

Convert TIGGE GRIB2 files to netCDF format.

Parameters:
  • source (os.PathLike, optional) – The source directory containing the TIGGE files.

  • target (os.PathLike, optional) – The target directory to save the converted files.

  • processes (int) – The number of processes to use for the conversion.