miranda.ghcn package
Global Historical Climate Network specialized conversion module.
Submodules
miranda.ghcn._ghcn module
- miranda.ghcn._ghcn.create_ghcn_xarray(in_files: list, variable_meta: dict, station_meta: DataFrame, project: str, start_date: str | Timestamp, end_date: str | Timestamp, varlist: list | None = None, n_workers: int | None = None) → Dataset | None
Create an xarray Dataset from GHCN data files.
- Parameters:
in_files (list) – A list of input files.
variable_meta (dict) – Variable metadata.
station_meta (pd.DataFrame) – Station metadata.
project (str) – Project name.
start_date (str or pd.Timestamp) – Start date of the data to be processed.
end_date (str or pd.Timestamp) – End date of the data to be processed.
varlist (list, optional) – List of variables to keep. If None, all variables are kept.
n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.
- Returns:
xr.Dataset, optional – The resulting dataset, or None if no dataset is produced.
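The n_workers convention described above (None or 1 means serial execution) can be illustrated with a minimal sketch. This is not miranda's implementation, which may use a different parallel backend; `map_with_workers` and `describe` are hypothetical names introduced only for illustration, and a thread pool is assumed purely for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def map_with_workers(
    func: Callable, items: Iterable, n_workers: Optional[int] = None
) -> list:
    """Apply func to each item, serially unless n_workers is greater than 1."""
    items = list(items)
    if n_workers is None or n_workers <= 1:
        # Serial path: no executor is created.
        return [func(item) for item in items]
    # Parallel path: pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(func, items))


def describe(path: str) -> str:
    # Hypothetical stand-in for per-file processing.
    return path.upper()


files = ["a.csv", "b.csv", "c.csv"]
serial = map_with_workers(describe, files, n_workers=None)
parallel = map_with_workers(describe, files, n_workers=2)
```

Both calls return the same list in the same order; only the execution strategy differs.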
- miranda.ghcn._ghcn.download_ghcn(project: str, working_folder: str | PathLike[str] | None = None, lon_bnds: list[float] | None = None, lat_bnds: list[float] | None = None, update_raw: bool = False, timeout: int | None = None, retry: int = 5, n_workers: int | None = None) → None
Download GHCN data.
- Parameters:
project (str) – Project name.
working_folder (str or os.PathLike[str], optional) – Temporary files folder.
lon_bnds (list of float, optional) – Longitude boundaries.
lat_bnds (list of float, optional) – Latitude boundaries.
update_raw (bool) – Whether to update the raw files or not.
timeout (int, optional) – Request timeout in seconds.
retry (int) – Number of retries.
n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.
- Raises:
ValueError – If the project name is unknown.
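How timeout and retry typically interact can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `fetch_with_retry`, `flaky_fetch`, and the `wait` parameter are hypothetical, and the sketch assumes that `retry` counts the extra attempts made after the first one fails, which the documentation above does not specify.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[Optional[int]], bytes],
    retry: int = 5,
    timeout: Optional[int] = None,
    wait: float = 0.0,
) -> bytes:
    """Call fetch(timeout), retrying up to `retry` times on OSError."""
    last_error = None
    # Assumption: one initial attempt plus `retry` further attempts.
    for _attempt in range(retry + 1):
        try:
            return fetch(timeout)
        except OSError as err:  # TimeoutError is a subclass of OSError
            last_error = err
            time.sleep(wait)
    raise RuntimeError(f"giving up after {retry} retries") from last_error


calls = {"count": 0}


def flaky_fetch(timeout: Optional[int]) -> bytes:
    # Hypothetical request that times out twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return b"payload"


result = fetch_with_retry(flaky_fetch, retry=5, timeout=10)
```

In this sketch the third attempt succeeds, so the remaining retries are never used.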
- miranda.ghcn._ghcn.get_ghcn_raw(station_ids: list, station_type: str, out_folder: Path, timeout: int = 10, update_raw: bool = False, n_workers: int | None = None) → list[str]
Download raw GHCN data.
- Parameters:
station_ids (list[str]) – List of station IDs.
station_type (str) – Station type.
out_folder (Path) – Output folder.
timeout (int) – Request timeout in seconds. Default is 10.
update_raw (bool) – Whether to update raw data.
n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.
- Returns:
list of str – List of station IDs that failed to download.
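Returning the IDs that failed, rather than raising on the first error, lets a caller retry only those stations. A minimal sketch of that pattern follows. `download_stations` and `fake_fetch` are hypothetical names, the `.csv` file naming and the station IDs are placeholders, and the handling of `update_raw` (skip files that already exist unless it is True) is an assumption about the intended behaviour.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def download_stations(
    station_ids: Iterable[str],
    out_folder: Path,
    fetch: Callable[[str], bytes],
    update_raw: bool = False,
) -> list[str]:
    """Save one file per station and return the IDs that could not be fetched."""
    failed = []
    out_folder.mkdir(parents=True, exist_ok=True)
    for station_id in station_ids:
        target = out_folder / f"{station_id}.csv"  # placeholder naming scheme
        if target.exists() and not update_raw:
            continue  # assumption: keep the existing raw file
        try:
            target.write_bytes(fetch(station_id))
        except OSError:
            failed.append(station_id)
    return failed


def fake_fetch(station_id: str) -> bytes:
    # Hypothetical downloader that fails for one station.
    if station_id == "STATION_B":
        raise OSError("simulated download failure")
    return b"data for " + station_id.encode()


with tempfile.TemporaryDirectory() as tmp:
    failed = download_stations(["STATION_A", "STATION_B"], Path(tmp), fake_fetch)
    saved = sorted(p.name for p in Path(tmp).iterdir())
```

A caller could pass the returned list back into the same function to retry only the failed stations.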