miranda.structure package

Data Structure module.

miranda.structure.build_path_from_schema(facets: dict, output_folder: str | PathLike, schema: str | PathLike | dict | None = None, top_folder: str = 'datasets', validate: bool = True) -> Path | None

Build a filepath based on a valid data schema.

Parameters:
  • facets (dict) – Facets for a given dataset.

  • output_folder (str or os.PathLike) – Parent folder on which to extend the filetree structure.

  • schema (str or os.PathLike or dict, optional) – Path to a YAML schema describing the database structure; the signature also accepts a dict. If None, the Ouranos schema is used.

  • top_folder (str) – Top-level of supplied schema, used for validation purposes. Default: “datasets”.

  • validate (bool) – Run facets-validation checks over given file. Default: True.

Returns:

Path or None
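
The mapping from facets to a folder path can be illustrated with a minimal standard-library sketch. It is not miranda's implementation: the function name build_path_sketch and the levels argument (a plain ordered list of facet names standing in for the YAML schema) are hypothetical, and no validation beyond a missing-facet check is attempted.

```python
from os import PathLike
from pathlib import Path
from typing import List, Optional, Union


def build_path_sketch(
    facets: dict,
    output_folder: Union[str, PathLike],
    levels: List[str],
) -> Optional[Path]:
    """Join facet values under ``output_folder`` in the order given by ``levels``.

    ``levels`` is a hypothetical stand-in for a schema: one facet name per
    folder level. Returns None when any required facet is missing.
    """
    missing = [name for name in levels if name not in facets]
    if missing:
        return None
    return Path(output_folder).joinpath(*(str(facets[name]) for name in levels))


# Illustrative facets only; real facet names depend on the schema in use.
example = build_path_sketch(
    {"project": "cmip6", "frequency": "day", "variable": "tas"},
    "datasets",
    levels=["project", "frequency", "variable"],
)
print(example)
```

Returning None rather than raising mirrors the documented return type (Path or None), although the conditions under which the real function returns None are not stated here.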

miranda.structure.create_version_hash_files(input_files: str | PathLike | list[str | PathLike] | generator | None = None, facet_dict: dict | None = None, verify_hash: bool = False) -> None

Create version hashes based on files or a facets dictionary.

Parameters:
  • input_files (str, os.PathLike, list of str or os.PathLike, or GeneratorType, optional)

  • facet_dict (dict, optional)

  • verify_hash (bool) – Default: False.

Returns:

None
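
As an illustration of what a version hash file can look like, the sketch below writes a sidecar file whose name carries a version string and whose contents are the SHA-256 checksum of the data file. It uses only the standard library and is an assumption-laden sketch, not miranda's code: the helper names, the "<stem>.<version>" naming pattern, and the choice of ValueError on mismatch are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_file(data_file: Path, version: str, verify_hash: bool = False) -> Path:
    """Create (or optionally verify) a sidecar checksum file for ``data_file``."""
    # Hypothetical naming convention: "<stem>.<version>" beside the data file.
    hash_file = data_file.with_name(f"{data_file.stem}.{version}")
    if hash_file.exists():
        # An existing hash file is only compared when verification is requested.
        if verify_hash and hash_file.read_text().strip() != sha256_of(data_file):
            raise ValueError(f"Checksum mismatch for {data_file.name}")
        return hash_file
    hash_file.write_text(sha256_of(data_file))
    return hash_file


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "tas_day.nc"
    sample.write_bytes(b"not a real NetCDF file")
    created = write_hash_file(sample, "v20240101")
    print(created.name, created.read_text())
```

Reading the file in chunks keeps memory use flat for large datasets, which matters for multi-gigabyte climate files.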

miranda.structure.structure_datasets(input_files: str | PathLike | list[str | PathLike] | generator, output_folder: str | PathLike, *, project: str | None = None, guess: bool = True, dry_run: bool = False, method: str = 'copy', make_dirs: bool = False, set_version_hashes: bool = False, verify_hashes: bool = False, suffix: str = 'nc') -> dict[Path, Path]

Parameters:
  • input_files (str, Path, list of str or Path, or GeneratorType) – Files to be sorted.

  • output_folder (str or Path) – The desired location for the folder-tree.

  • project ({“cordex”, “cmip5”, “cmip6”, “isimip-ft”, “pcic-candcs-u6”, “converted”}, optional) – Project used to parse the facets of all supplied datasets. If not supplied, will attempt parsing with all available data categories for each file (slow) unless guess is True.

  • guess (bool) – If project not supplied, suggest to decoder that activity is the same for all input_files. Default: True.

  • dry_run (bool) – Prints changes that would have been made without performing them. Default: False.

  • method ({“move”, “copy”}) – Method to transfer files to intended location. Default: “copy”.

  • make_dirs (bool) – Make folder tree if it does not already exist. Default: False.

  • set_version_hashes (bool) – Make an accompanying file with version in filename and sha256sum in contents. Default: False.

  • verify_hashes (bool) – Ensure that any existing sha256sum files correspond with their companion files. Raise on error. Default: False.

  • suffix ({“nc”, “zarr”}) – If “zarr”, will perform a ‘glob’ with provided pattern. Otherwise, will perform an ‘rglob’ (recursive) operation. Default: “nc”.

Returns:

dict[Path, Path]
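
The interplay of the dry_run, method, and make_dirs options can be sketched with the standard library. This is a simplified illustration under stated assumptions, not miranda's implementation: transfer_files is a hypothetical helper that takes an already-computed source-to-target mapping, whereas the real function derives targets from decoded facets.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def transfer_files(
    mapping: Dict[Path, Path],
    method: str = "copy",
    dry_run: bool = False,
    make_dirs: bool = False,
) -> Dict[Path, Path]:
    """Copy or move each source to its target and return the mapping."""
    if method not in {"move", "copy"}:
        raise ValueError(f"Unsupported method: {method!r}")
    for source, target in mapping.items():
        if dry_run:
            # Report the planned action without touching the filesystem.
            print(f"Would {method} {source} -> {target}")
            continue
        if make_dirs:
            target.parent.mkdir(parents=True, exist_ok=True)
        if method == "copy":
            shutil.copy2(source, target)
        else:
            shutil.move(str(source), str(target))
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "tas_day.nc"
    source.write_text("placeholder")
    target = Path(tmp) / "datasets" / "cmip6" / "tas_day.nc"
    # Preview first, then perform the copy.
    transfer_files({source: target}, dry_run=True)
    transfer_files({source: target}, method="copy", make_dirs=True)
    print(target.exists())
```

Running once with dry_run=True before the real transfer is a cheap way to check the planned folder tree, especially when method is “move” and the operation is not easily undone.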

Submodules

miranda.structure._structure module

miranda.structure._structure.build_path_from_schema(facets: dict, output_folder: str | PathLike, schema: str | PathLike | dict | None = None, top_folder: str = 'datasets', validate: bool = True) -> Path | None

Build a filepath based on a valid data schema.

Parameters:
  • facets (dict) – Facets for a given dataset.

  • output_folder (str or os.PathLike) – Parent folder on which to extend the filetree structure.

  • schema (str or os.PathLike or dict, optional) – Path to a YAML schema describing the database structure; the signature also accepts a dict. If None, the Ouranos schema is used.

  • top_folder (str) – Top-level of supplied schema, used for validation purposes. Default: “datasets”.

  • validate (bool) – Run facets-validation checks over given file. Default: True.

Returns:

Path or None

miranda.structure._structure.create_version_hash_files(input_files: str | PathLike | list[str | PathLike] | generator | None = None, facet_dict: dict | None = None, verify_hash: bool = False) -> None

Create version hashes based on files or a facets dictionary.

Parameters:
  • input_files (str, os.PathLike, list of str or os.PathLike, or GeneratorType, optional)

  • facet_dict (dict, optional)

  • verify_hash (bool) – Default: False.

Returns:

None

miranda.structure._structure.structure_datasets(input_files: str | PathLike | list[str | PathLike] | generator, output_folder: str | PathLike, *, project: str | None = None, guess: bool = True, dry_run: bool = False, method: str = 'copy', make_dirs: bool = False, set_version_hashes: bool = False, verify_hashes: bool = False, suffix: str = 'nc') -> dict[Path, Path]

Parameters:
  • input_files (str, Path, list of str or Path, or GeneratorType) – Files to be sorted.

  • output_folder (str or Path) – The desired location for the folder-tree.

  • project ({“cordex”, “cmip5”, “cmip6”, “isimip-ft”, “pcic-candcs-u6”, “converted”}, optional) – Project used to parse the facets of all supplied datasets. If not supplied, will attempt parsing with all available data categories for each file (slow) unless guess is True.

  • guess (bool) – If project not supplied, suggest to decoder that activity is the same for all input_files. Default: True.

  • dry_run (bool) – Prints changes that would have been made without performing them. Default: False.

  • method ({“move”, “copy”}) – Method to transfer files to intended location. Default: “copy”.

  • make_dirs (bool) – Make folder tree if it does not already exist. Default: False.

  • set_version_hashes (bool) – Make an accompanying file with version in filename and sha256sum in contents. Default: False.

  • verify_hashes (bool) – Ensure that any existing sha256sum files correspond with their companion files. Raise on error. Default: False.

  • suffix ({“nc”, “zarr”}) – If “zarr”, will perform a ‘glob’ with provided pattern. Otherwise, will perform an ‘rglob’ (recursive) operation. Default: “nc”.

Returns:

dict[Path, Path]
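
The difference between the ‘glob’ and ‘rglob’ behaviour selected by suffix can be shown with pathlib alone. The helper find_datasets below is hypothetical and only mirrors the documented rule; the stated rationale (Zarr stores are directories, so matching only at the top level avoids descending into them) is an assumption rather than something this page confirms.

```python
import tempfile
from pathlib import Path
from typing import List


def find_datasets(folder: Path, suffix: str = "nc") -> List[Path]:
    """Return matching datasets under ``folder``, sorted for stable output."""
    pattern = f"*.{suffix}"
    if suffix == "zarr":
        # Non-recursive: only entries directly inside ``folder`` are matched.
        return sorted(folder.glob(pattern))
    # Recursive: files in all subfolders are matched as well.
    return sorted(folder.rglob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.nc").write_text("")
    (root / "nested" / "b.nc").write_text("")
    (root / "store.zarr").mkdir()
    print([p.name for p in find_datasets(root, "nc")])
    print([p.name for p in find_datasets(root, "zarr")])
```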