miranda.preprocess package
Preprocessing tools for Miranda.
Submodules
miranda.preprocess._eccc_ahccd module
Adjusted and Homogenized Canadian Climate Data module.
- miranda.preprocess._eccc_ahccd.convert_ahccd(data_source: str | Path, output_dir: str | Path, variable: str, *, generation: int, merge: bool = False, overwrite: bool = False) -> None
Convert Adjusted and Homogenized Canadian Climate Dataset files.
- Parameters:
data_source (str or Path)
output_dir (str or Path)
variable (str)
generation (int)
merge (bool)
overwrite (bool)
- Returns:
None
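A minimal usage sketch for the signature above. The wrapper name `run_ahccd_conversion` and its pre-flight checks are illustrative assumptions, not part of miranda; only the call to `convert_ahccd` follows the documented signature, and valid values for `variable` are not listed here, so the caller supplies them.

```python
from pathlib import Path


def run_ahccd_conversion(data_source, output_dir, variable, generation):
    """Check the inputs, then hand off to miranda's convert_ahccd.

    Hypothetical helper: the checks below are an assumption about what is
    useful to verify up front, not behaviour documented by miranda.
    """
    data_source = Path(data_source)
    output_dir = Path(output_dir)

    # Fail early with a clear message instead of partway through a conversion.
    if not data_source.exists():
        raise FileNotFoundError(f"AHCCD source not found: {data_source}")
    output_dir.mkdir(parents=True, exist_ok=True)

    # Imported lazily so this helper can be defined without miranda installed.
    from miranda.preprocess._eccc_ahccd import convert_ahccd

    # `generation`, `merge` and `overwrite` are keyword-only in the signature.
    convert_ahccd(
        data_source,
        output_dir,
        variable,
        generation=generation,
        merge=False,
        overwrite=False,
    )
```

Passing `generation` positionally would raise a `TypeError`, because the signature places it after the bare `*`.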
miranda.preprocess._eccc_obs module
Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.
- miranda.preprocess._eccc_obs.convert_station(data: str | PathLike, variable: str, mode: str, using_dask_array: bool = False, *, client: Any, **kwargs)
Convert a single station’s data from the fixed-width format to a netCDF file.
- miranda.preprocess._eccc_obs.merge_converted_variables(source_files: str | PathLike, output_folder: str | PathLike, variables: str | int | list[str | int] | None = None, overwrite: bool = False, n_workers: int = 1) -> None
Merge converted variables into a single file per variable.
- Parameters:
source_files (str or Path)
output_folder (str or Path)
variables (str or int or list of str or int, optional)
overwrite (bool)
n_workers (int)
- Returns:
None
- miranda.preprocess._eccc_obs.merge_stations(source_files: str | PathLike | None = None, output_folder: str | PathLike | None = None, *, time_step: str, variables: str | int | list[str | int] | None = None, include_flags: bool = True, groupings: int | None = None, mf_dataset_freq: str | None = None, temp_directory: str | PathLike | None = None, n_workers: int = 1) -> None
Merge station files into aggregated output files.
- Parameters:
source_files (str or Path) – Source files to be aggregated.
output_folder (str or Path) – Output folder for the aggregated files.
variables (str or int or list of str or int, optional) – The variable codes to be aggregated.
time_step ({“hourly”, “daily”}) – The time step to be used for aggregation.
include_flags (bool) – Include flags in the output files.
groupings (int, optional) – The number of files in each group used for converting to multi-file Datasets.
mf_dataset_freq (str, optional) – Resampling frequency for creating output multi-file Datasets. E.g. ‘YS’: 1 year per file, ‘5YS’: 5 years per file.
temp_directory (str or Path, optional) – Alternative temporary directory to use when the default location does not have enough free space.
n_workers (int) – The number of workers to use.
- Returns:
None
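A hedged sketch of how the parameters above fit together in a call. The helper `merge_station_files` and the `VALID_TIME_STEPS` constant are hypothetical names introduced for illustration; the accepted `time_step` values and the `'YS'` frequency are taken from the parameter descriptions.

```python
# Values listed for `time_step` in the parameter documentation.
VALID_TIME_STEPS = {"hourly", "daily"}


def merge_station_files(source_files, output_folder, time_step, n_workers=1):
    """Validate `time_step`, then call miranda's merge_stations.

    Hypothetical wrapper; the validation is an assumption added for the
    example and may duplicate checks that miranda already performs.
    """
    if time_step not in VALID_TIME_STEPS:
        raise ValueError(
            f"time_step must be one of {sorted(VALID_TIME_STEPS)}, got {time_step!r}"
        )

    # Imported lazily so this helper can be defined without miranda installed.
    from miranda.preprocess._eccc_obs import merge_stations

    # Everything after the two paths is keyword-only in the signature.
    merge_stations(
        source_files,
        output_folder,
        time_step=time_step,
        include_flags=True,
        mf_dataset_freq="YS",  # one year per output file, per the docs
        n_workers=n_workers,
    )
```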
miranda.preprocess._eccc_summaries module
- miranda.preprocess._eccc_summaries.daily_summaries_to_netcdf(station: dict, path_output: Path | str) -> None
Convert daily climate summaries to NetCDF files.
Uses xarray to transform the ‘station’ from find_and_extract_dly into a CF-Convention netCDF file.
- Parameters:
station (dict) – dict created by using find_and_extract_dly
path_output (str or Path) – Output path.
- Returns:
None
- miranda.preprocess._eccc_summaries.extract_daily_summaries(path_station: Path | str, rm_flags: bool = False, file_suffix: str = '.csv') -> dict
Extract daily climate summaries from ECCC CSV files.
- Parameters:
path_station (str or Path) – Path to the station’s folder containing the CSV files.
rm_flags (bool) – Remove the ‘Flag’ and ‘Quality’ columns of the ECCC files.
file_suffix (str) – File suffix used by the tabular data files. Default: “.csv”.
- Returns:
dict – dict containing the station metadata, as well as the data stored within a pandas DataFrame.
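The effect of `rm_flags` can be illustrated with a small standard-library sketch. This is not miranda's implementation: the sample CSV, its column names, and the helper `drop_flag_columns` are all assumptions made for the example, and the name-matching rule is only one plausible reading of "the ‘Flag’ and ‘Quality’ columns".

```python
import csv
import io

# Hypothetical sample resembling a daily summary file; real column names may differ.
SAMPLE = (
    "Date/Time,Max Temp,Max Temp Flag,Data Quality\n"
    "2020-01-01,-3.5,,\n"
    "2020-01-02,-1.0,E,\n"
)


def drop_flag_columns(rows):
    """Return copies of the rows without columns whose name mentions Flag or Quality."""
    return [
        {
            name: value
            for name, value in row.items()
            if "Flag" not in name and "Quality" not in name
        }
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = drop_flag_columns(rows)
print(list(cleaned[0]))  # remaining column names
```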
miranda.preprocess._metadata module
- miranda.preprocess._metadata.eccc_variable_metadata(variable_code: str | int, project: str, generation: int | None = None, metadata: dict | None = None) -> dict[str, Any]
Return the metadata for a given variable code and project.
- Parameters:
variable_code (str or int)
project ({“eccc-ahccd”, “eccc-obs”, “eccc-obs-summary”})
generation ({1, 2, 3}, optional)
metadata (dict, optional)
- Returns:
dict
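A minimal sketch of a lookup that checks arguments against the value sets listed above before calling the function. The wrapper `lookup_variable_metadata` and the two constants are hypothetical; valid variable codes are not listed in this reference, so none is assumed here.

```python
# Value sets copied from the parameter documentation above.
VALID_PROJECTS = {"eccc-ahccd", "eccc-obs", "eccc-obs-summary"}
VALID_GENERATIONS = {1, 2, 3}


def lookup_variable_metadata(variable_code, project, generation=None):
    """Validate `project` and `generation`, then query miranda for the metadata.

    Hypothetical wrapper; miranda may perform its own validation as well.
    """
    if project not in VALID_PROJECTS:
        raise ValueError(f"Unknown project: {project!r}")
    if generation is not None and generation not in VALID_GENERATIONS:
        raise ValueError(f"Unknown generation: {generation!r}")

    # Imported lazily so this helper can be defined without miranda installed.
    from miranda.preprocess._metadata import eccc_variable_metadata

    return eccc_variable_metadata(variable_code, project, generation=generation)
```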
- miranda.preprocess._metadata.homogenized_column_definitions(variable_code: str) -> tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int]
Return the column names, widths, and data types for the AHCCD fixed-width format data.
- Parameters:
variable_code (str)
- Returns:
tuple[dict, list[tuple[int, int]], dict[str, type[str | int | float] | Any], int]
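To show how definitions of this shape are typically applied to fixed-width text, here is a standard-library sketch. It assumes the `list[tuple[int, int]]` element holds `(start, end)` character offsets, in the style of the `colspecs` argument of `pandas.read_fwf`; the source does not state this. The column names, offsets, and sample line are invented for the example and are not AHCCD definitions.

```python
def parse_fixed_width(line, names, colspecs, dtypes):
    """Slice one fixed-width line into a typed record.

    `colspecs` is a list of (start, end) offsets, end-exclusive.
    Blank fields become None.
    """
    record = {}
    for name, (start, end) in zip(names, colspecs):
        raw = line[start:end].strip()
        record[name] = dtypes[name](raw) if raw else None
    return record


# Hypothetical layout: 7-character station id, 5-character year, 8-character value.
names = ["station", "year", "value"]
colspecs = [(0, 7), (7, 12), (12, 20)]
dtypes = {"station": str, "year": int, "value": float}

line = "1012010 1995   -3.20"
record = parse_fixed_width(line, names, colspecs, dtypes)
print(record)
```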
miranda.preprocess.eccc module
Specialized conversion tools for Environment and Climate Change Canada / Meteorological Service of Canada data.
miranda.preprocess.ecmwf_tigge module
ECMWF TIGGE Conversion module.
- miranda.preprocess.ecmwf_tigge.tigge_convert(source: PathLike | None = None, target: PathLike | None = None, processes: int = 8) -> None
Convert TIGGE GRIB2 files to netCDF format.
- Parameters:
source (os.PathLike, optional) – The source directory containing the TIGGE files.
target (os.PathLike, optional) – The target directory to save the converted files.
processes (int) – The number of processes to use for the conversion.