miranda.ghcn package

Global Historical Climate Network specialized conversion module.

Submodules

miranda.ghcn._ghcn module

miranda.ghcn._ghcn.create_ghcn_xarray(in_files: list, variable_meta: dict, station_meta: DataFrame, project: str, start_date: str | Timestamp, end_date: str | Timestamp, varlist: list | None = None, n_workers: int | None = None) → Dataset | None

Create an xarray Dataset from GHCN station data.

Parameters:
  • in_files (list) – A list of input files.

  • variable_meta (dict) – Variable metadata.

  • station_meta (pd.DataFrame) – Station metadata.

  • project (str) – Project name.

  • start_date (str or pd.Timestamp) – Start date of the data to be processed.

  • end_date (str or pd.Timestamp) – End date of the data to be processed.

  • varlist (list, optional) – List of variables to keep. If None, all variables are kept.

  • n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.

Returns:

xr.Dataset, optional – The resulting dataset, or None if no dataset was created.
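The varlist behaviour described above (keep the listed variables, or keep everything when None) can be illustrated with a minimal standard-library sketch. This is not miranda's implementation: the helper select_variables, the plain-dict data layout, and the variable names are all hypothetical and used only for illustration.

```python
from __future__ import annotations


def select_variables(
    data: dict[str, list[float]],
    varlist: list[str] | None = None,
) -> dict[str, list[float]]:
    """Keep only the variables named in ``varlist``; keep all if it is None.

    Hypothetical helper: it mirrors the documented ``varlist`` semantics,
    not the library's actual code.
    """
    if varlist is None:
        # No filter requested: return a shallow copy with every variable.
        return dict(data)
    # Preserve the original ordering of the variables that are kept.
    return {name: values for name, values in data.items() if name in varlist}


# Illustrative records; the variable names are assumptions.
records = {"tasmax": [1.5, 2.0], "tasmin": [-3.0, -1.5], "pr": [0.0, 4.2]}

kept_all = select_variables(records)
kept_some = select_variables(records, varlist=["tasmax", "pr"])
```

With varlist left as None every variable survives; passing a list narrows the result to the named entries only.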

miranda.ghcn._ghcn.download_ghcn(project: str, working_folder: str | PathLike[str] | None = None, lon_bnds: list[float] | None = None, lat_bnds: list[float] | None = None, update_raw: bool = False, timeout: int | None = None, retry: int = 5, n_workers: int | None = None) → None

Download GHCN data.

Parameters:
  • project (str) – Project name.

  • working_folder (str or os.PathLike[str], optional) – Temporary files folder.

  • lon_bnds (list of float, optional) – Longitude boundaries.

  • lat_bnds (list of float, optional) – Latitude boundaries.

  • update_raw (bool) – Whether to update the raw files. Default is False.

  • timeout (int, optional) – Request timeout in seconds.

  • retry (int) – Number of retries. Default is 5.

  • n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.

Raises:

ValueError – If the project name is unknown.
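As a rough illustration of how optional lon_bnds and lat_bnds could restrict a station list, the sketch below filters station records by a bounding box. It assumes each bound is a two-element list of [minimum, maximum] and that a value of None means "no restriction"; the source does not state either point. The helpers in_bounds and filter_stations, and the station records, are hypothetical and not part of miranda.

```python
from __future__ import annotations


def in_bounds(value: float, bnds: list[float] | None) -> bool:
    """Return True when no bounds are given or value lies within them."""
    if bnds is None:
        return True
    # Assumption: bounds are a [min, max] pair; tolerate either ordering.
    lower, upper = min(bnds), max(bnds)
    return lower <= value <= upper


def filter_stations(
    stations: list[dict],
    lon_bnds: list[float] | None = None,
    lat_bnds: list[float] | None = None,
) -> list[dict]:
    """Keep the stations whose coordinates fall inside both bounds."""
    return [
        s
        for s in stations
        if in_bounds(s["lon"], lon_bnds) and in_bounds(s["lat"], lat_bnds)
    ]


# Made-up station records for illustration only.
stations = [
    {"id": "STATION_A", "lon": -73.6, "lat": 45.5},
    {"id": "STATION_B", "lon": -123.1, "lat": 49.3},
    {"id": "STATION_C", "lon": 2.35, "lat": 48.9},
]

selected = filter_stations(stations, lon_bnds=[-80.0, -60.0], lat_bnds=[40.0, 50.0])
unfiltered = filter_stations(stations)
```

Only STATION_A falls inside the requested box; calling the helper without bounds returns every station.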

miranda.ghcn._ghcn.get_ghcn_raw(station_ids: list, station_type: str, out_folder: Path, timeout: int = 10, update_raw: bool = False, n_workers: int | None = None) → list[str]

Download raw GHCN data.

Parameters:
  • station_ids (list[str]) – List of station IDs.

  • station_type (str) – Station type.

  • out_folder (Path) – Output folder.

  • timeout (int) – Request timeout in seconds. Default is 10.

  • update_raw (bool) – Whether to update raw data. Default is False.

  • n_workers (int, optional) – Number of parallel workers to use. If None or 1, no parallelism is used.

Returns:

list of str – List of station IDs that failed to download.
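The n_workers convention (serial when None or 1, parallel otherwise) together with the "return the IDs that failed" contract can be sketched with the standard library. This is only an illustration of the pattern, not the library's code: fetch_all and fake_fetch are hypothetical, the use of a thread pool is an assumption, and no network request is made.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(
    station_ids: list[str],
    fetch: Callable[[str], bool],
    n_workers: int | None = None,
) -> list[str]:
    """Run ``fetch`` on every ID and return the IDs for which it failed."""
    if n_workers is None or n_workers <= 1:
        # Serial path: no parallelism is used.
        results = [fetch(sid) for sid in station_ids]
    else:
        # Parallel path: Executor.map yields results in input order.
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            results = list(pool.map(fetch, station_ids))
    return [sid for sid, ok in zip(station_ids, results) if not ok]


def fake_fetch(station_id: str) -> bool:
    """Stand-in for a download: IDs ending in 'X' are treated as failures."""
    return not station_id.endswith("X")


ids = ["ID001", "ID002X", "ID003"]
failed_serial = fetch_all(ids, fake_fetch)
failed_parallel = fetch_all(ids, fake_fetch, n_workers=2)
```

Both paths report the same failed IDs, so a caller can pass the returned list back in for another attempt regardless of the worker count.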