miranda.preprocess package¶

Preprocessing tools for Miranda.

Submodules¶

miranda.preprocess._eccc_ahccd module¶

Adjusted and Homogenized Canadian Climate Data module.

miranda.preprocess._eccc_ahccd.convert_ahccd(data_source: str | Path, output_dir: str | Path, variable: str, *, generation: int, merge: bool = False, overwrite: bool = False) → None[source]¶

Convert Adjusted and Homogenized Canadian Climate Dataset files.

Parameters:

data_source (str or Path)
output_dir (str or Path)
variable (str)
generation (int)
merge (bool)
overwrite (bool)

Returns:

None

miranda.preprocess._eccc_ahccd.convert_ahccd_fwf_file(ff: Path | str, metadata: DataFrame, variable: str, *, generation: int) → Dataset[source]¶

Convert AHCCD fixed-width files.

Parameters:

ff (str or Path)
metadata (pandas.DataFrame)
variable (str)
generation (int)

Returns:

xarray.Dataset

miranda.preprocess._eccc_ahccd.merge_ahccd(data_source: str | Path, output_dir: str | Path | None = None, variable: str | None = None, overwrite: bool = False) → None[source]¶

Merge Adjusted and Homogenized Canadian Climate Dataset files.

miranda.preprocess._eccc_obs module¶

Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.

miranda.preprocess._eccc_obs.convert_station(data: str | PathLike, variable: str, mode: str, using_dask_array: bool = False, *, client: Any, **kwargs)[source]¶

Convert a single station’s data from the fixed-width format to a netCDF file.

Merge converted variables into a single file per variable.

Parameters:

source_files (str, Path)
output_folder (str, Path)
variables (str or int or list of str or int, optional)
overwrite (bool)
n_workers (int)

Returns:

None

Merge stations.

Parameters:

source_files (str or Path) – Source files to be aggregated.
output_folder (str or Path) – Output folder for the aggregated files.
variables (str or int or list of str or int, optional) – The variable codes to be aggregated.
time_step ({“hourly”, “daily”}) – The time step to be used for aggregation.
include_flags (bool) – Include flags in the output files.
groupings (int) – The number of files in each group used for converting to multi-file Datasets.
mf_dataset_freq (str, optional) – Resampling frequency for creating output multi-file Datasets. E.g. ‘YS’: 1 year per file, ‘5YS’: 5 years per file.
temp_directory (str or Path, optional) – Use another temporary directory location in case default location is not spacious enough.
n_workers (int) – The number of workers to use.

Returns:

None
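
The effect of a frequency such as ‘YS’ or ‘5YS’ on how records are split across output files can be illustrated with a small standard-library sketch. It is not the package's implementation: the helper name `group_by_year_block` is hypothetical, and anchoring the first block at the earliest year present in the data is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import date


def group_by_year_block(dates, years_per_file=1):
    """Group dates into consecutive blocks of `years_per_file` calendar years.

    Hypothetical helper: blocks are anchored at the earliest year in `dates`,
    which is an assumption of this sketch.
    """
    if not dates:
        return {}
    first_year = min(d.year for d in dates)
    groups = defaultdict(list)
    for d in sorted(dates):
        # Integer division picks the block this year belongs to.
        offset = (d.year - first_year) // years_per_file
        block_start = first_year + offset * years_per_file
        groups[block_start].append(d)
    return dict(groups)


sample_dates = [
    date(2000, 1, 1),
    date(2003, 6, 1),
    date(2005, 1, 1),
    date(2011, 12, 31),
]

# One group per 5-year block, keyed by the first year of the block.
five_year_groups = group_by_year_block(sample_dates, years_per_file=5)
# One group per calendar year that contains data.
one_year_groups = group_by_year_block(sample_dates, years_per_file=1)
```

With `years_per_file=5` each key would correspond to one output file covering five calendar years; with `years_per_file=1` each file would cover a single year.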

miranda.preprocess._eccc_summaries module¶

miranda.preprocess._eccc_summaries.daily_summaries_to_netcdf(station: dict, path_output: Path | str) → None[source]¶

Convert daily climate summaries to netCDF files.

Uses xarray to transform the ‘station’ dict from find_and_extract_dly into a CF-Conventions netCDF file.

Parameters:

station (dict) – Dictionary created by find_and_extract_dly.
path_output (str or Path) – Output path.

Returns:

None

miranda.preprocess._eccc_summaries.extract_daily_summaries(path_station: Path | str, rm_flags: bool = False, file_suffix: str = '.csv') → dict[source]¶

Extract daily climate summaries from ECCC CSV files.

Parameters:

path_station (str or Path) – Path to the station’s folder containing the CSV files.
rm_flags (bool) – Removes the ‘Flag’ and ‘Quality’ columns of the ECCC files.
file_suffix (str) – File suffixes used by the tabular data. Default: “.csv”.

Returns:

dict – Dictionary containing the station metadata, as well as the data stored in a pandas DataFrame.
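
As a rough illustration of what the rm_flags option describes, the sketch below drops every column whose header contains ‘Flag’ or ‘Quality’ from a small in-memory CSV. It uses only the standard library rather than the package's own code; the helper `drop_flag_columns` and the sample column names are hypothetical.

```python
import csv
import io


def drop_flag_columns(rows, markers=("Flag", "Quality")):
    """Return copies of `rows` without columns whose name contains a marker.

    Hypothetical helper that mimics the documented effect of rm_flags.
    """
    return [
        {key: value for key, value in row.items()
         if not any(marker in key for marker in markers)}
        for row in rows
    ]


# Hypothetical sample resembling a daily summary table.
sample = io.StringIO(
    "Date/Time,Max Temp,Max Temp Flag,Data Quality\n"
    "2020-01-01,-3.5,,\n"
    "2020-01-02,-1.0,E,\n"
)

rows = list(csv.DictReader(sample))
cleaned = drop_flag_columns(rows)
```

Matching on substrings rather than exact names is a choice made for this sketch, so that per-variable flag columns such as ‘Max Temp Flag’ are removed along with any general quality column.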

miranda.preprocess._metadata module¶

miranda.preprocess._metadata.eccc_variable_metadata(variable_code: str | int, project: str, generation: int | None = None, metadata: dict | None = None) → dict[str, Any][source]¶

Return the metadata for a given variable code and project.

Parameters:

variable_code (str or int)
project ({“eccc-ahccd”, “eccc-obs”, “eccc-obs-summary”})
generation ({1, 2, 3}, optional)
metadata (dict, optional)

Returns:

dict

miranda.preprocess._metadata.homogenized_column_definitions(variable_code: str) → tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int][source]¶

Return the column names, widths, and data types for the AHCCD fixed-width format data.

Parameters:

variable_code (str)

Returns:

tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int]
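
To show how column specifications of type list[tuple[int, int]] and a name-to-type mapping can drive fixed-width parsing, here is a minimal standard-library sketch. The three-column layout, the sample line, and the helper `parse_fixed_width` are hypothetical and do not reproduce the actual AHCCD format. The sketch also assumes the offsets are half-open (start, stop) character positions, the convention used by `colspecs` in `pandas.read_fwf`.

```python
def parse_fixed_width(line, names, colspecs, dtypes):
    """Slice one fixed-width line into a dict of typed values.

    Hypothetical helper: `colspecs` holds half-open (start, stop) offsets,
    and blank fields are returned as None.
    """
    record = {}
    for name, (start, stop) in zip(names, colspecs):
        raw = line[start:stop].strip()
        caster = dtypes.get(name, str)
        record[name] = caster(raw) if raw else None
    return record


# Hypothetical layout, not the real AHCCD column definitions.
names = ["year", "month", "value"]
colspecs = [(0, 4), (4, 7), (7, 15)]
dtypes = {"year": int, "month": int, "value": float}

row = parse_fixed_width("1999  3   -12.5", names, colspecs, dtypes)
blank = parse_fixed_width("1999  3        ", names, colspecs, dtypes)
```

Keeping the names, offsets, and types in separate structures means the same parsing loop can serve several variables by swapping in a different set of definitions.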

miranda.preprocess._metadata.obs_column_definitions(time_frequency: str) → tuple[list[str], list[int], list[type[str | int]], int][source]¶

Return the column names, widths, and data types for the fixed-width format.

miranda.preprocess.eccc module¶

Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.

miranda.preprocess.ecmwf_tigge module¶

ECMWF TIGGE Conversion module.

miranda.preprocess.ecmwf_tigge.tigge_convert(source: PathLike | None = None, target: PathLike | None = None, processes: int = 8) → None[source]¶

Convert TIGGE GRIB2 files to netCDF format.

Parameters:

source (os.PathLike, optional) – The source directory containing the TIGGE files.
target (os.PathLike, optional) – The target directory to save the converted files.
processes (int) – The number of processes to use for the conversion.