miranda.structure package
Data Structure module.
- miranda.structure.build_path_from_schema(facets: dict, output_folder: str | PathLike, schema: str | PathLike | dict | None = None, top_folder: str = 'datasets', validate: bool = True) → Path | None
Build a filepath based on a valid data schema.
- Parameters:
facets (dict) – Facets for a given dataset.
output_folder (str or os.PathLike) – Parent folder on which to extend the filetree structure.
schema (str, os.PathLike, or dict, optional) – Path to a YAML schema describing the database structure, or the schema as a dictionary. If None, the Ouranos schema is used.
top_folder (str) – Top level of the supplied schema, used for validation. Default: “datasets”.
validate (bool) – Run facet-validation checks on the supplied facets. Default: True.
- Returns:
Path or None
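The idea of extending a file tree from facets can be illustrated with a minimal sketch. It assumes the schema reduces to an ordered tuple of facet names (the hypothetical `FACET_LEVELS`); the real Ouranos YAML schema is richer, and `build_path_sketch` is an illustrative helper, not miranda's implementation. Returning None when a facet is missing is this sketch's choice for the documented `Path | None` return type; when the real function returns None is not stated here.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical facet order standing in for the YAML schema; the real
# Ouranos schema is richer than a flat list of levels.
FACET_LEVELS = ("activity", "source", "experiment", "frequency", "variable")


def build_path_sketch(
    facets: dict, output_folder: str | Path, levels: tuple[str, ...] = FACET_LEVELS
) -> Path | None:
    """Join facet values, in schema order, beneath output_folder.

    Returns None when a facet named in ``levels`` is missing.
    """
    parts = []
    for level in levels:
        value = facets.get(level)
        if value is None:
            return None  # the dataset cannot be placed without this facet
        parts.append(str(value))
    return Path(output_folder).joinpath(*parts)


facets = {
    "activity": "CMIP6",
    "source": "CanESM5",
    "experiment": "ssp245",
    "frequency": "day",
    "variable": "tas",
}
print(build_path_sketch(facets, "/data"))
```

Keeping the level order in one place means the same facets always map to the same folder, which is what makes the resulting tree predictable.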
- miranda.structure.create_version_hash_files(input_files: str | PathLike | list[str | PathLike] | generator | None = None, facet_dict: dict | None = None, verify_hash: bool = False) → None
Create version hashes based on files or a facets dictionary.
- Parameters:
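input_files (str, os.PathLike, list of str or os.PathLike, or GeneratorType, optional) – Files for which to create version hash files.
facet_dict (dict, optional) – Facets dictionary from which to create version hash files, as an alternative to input_files.
verify_hash (bool) – Verify that existing sha256sum files match their companion files. Default: False.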
- Returns:
None
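A version hash file, as described for `set_version_hashes` in `structure_datasets`, pairs a version in the filename with a sha256sum in the contents. The sketch below assumes a hypothetical `<stem>.<version>.sha256` naming and takes the version as an explicit argument; miranda's actual naming and where it obtains the version may differ. `sha256sum` and `write_version_hash_file` are illustrative names, and the sketch handles single files only (not Zarr directories).

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory flat for large NetCDF files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_version_hash_file(data_file: Path, version: str) -> Path:
    """Write a companion file whose name carries the version and whose
    contents are the checksum. The "<stem>.<version>.sha256" naming is
    a hypothetical convention, not necessarily miranda's."""
    data_file = Path(data_file)
    hash_file = data_file.with_name(f"{data_file.stem}.{version}.sha256")
    hash_file.write_text(sha256sum(data_file) + "\n")
    return hash_file
```

For example, `write_version_hash_file(Path("tas_day.nc"), "v20200101")` would create `tas_day.v20200101.sha256` next to the data file.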
- miranda.structure.structure_datasets(input_files: str | PathLike | list[str | PathLike] | generator, output_folder: str | PathLike, *, project: str | None = None, guess: bool = True, dry_run: bool = False, method: str = 'copy', make_dirs: bool = False, set_version_hashes: bool = False, verify_hashes: bool = False, suffix: str = 'nc') → dict[Path, Path]
Sort dataset files into a folder tree derived from their facets.
- Parameters:
input_files (str, Path, list of str or Path, or GeneratorType) – Files to be sorted.
output_folder (str or Path) – The desired location for the folder-tree.
project ({“cordex”, “cmip5”, “cmip6”, “isimip-ft”, “pcic-candcs-u6”, “converted”}, optional) – Project used to parse the facets of all supplied datasets. If not supplied, parsing is attempted with every available data category for each file (slow), unless guess is True.
guess (bool) – If project is not supplied, hint to the decoder that all input_files belong to the same activity. Default: True.
dry_run (bool) – Print the changes that would be made without performing them. Default: False.
method ({“move”, “copy”}) – Method used to transfer files to their intended location. Default: “copy”.
make_dirs (bool) – Create the folder tree if it does not already exist. Default: False.
set_version_hashes (bool) – Create an accompanying file whose filename contains the version and whose contents are the sha256sum. Default: False.
verify_hashes (bool) – Ensure that any existing sha256sum files match their companion files; raise on mismatch. Default: False.
suffix ({“nc”, “zarr”}) – File suffix to search for. If “zarr”, a non-recursive ‘glob’ is performed with the provided pattern; otherwise, a recursive ‘rglob’ is performed. Default: “nc”.
- Returns:
dict[Path, Path]
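The `suffix` behaviour (non-recursive `glob` for “zarr”, recursive `rglob` otherwise) can be sketched with `pathlib`. The `*.{suffix}` pattern and the helper name `find_datasets` are assumptions for illustration; how miranda builds its pattern is not stated here.

```python
from __future__ import annotations

from pathlib import Path


def find_datasets(root: str | Path, suffix: str = "nc") -> list[Path]:
    """Collect candidate datasets beneath ``root`` by suffix.

    "zarr" uses a non-recursive glob (top level of root only);
    any other suffix uses a recursive rglob over all subfolders.
    """
    root = Path(root)
    pattern = f"*.{suffix}"  # hypothetical pattern construction
    matches = root.glob(pattern) if suffix == "zarr" else root.rglob(pattern)
    return sorted(matches)
```

With this sketch, a `.nc` file in a subfolder is found, while a `.zarr` store in a subfolder is not.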
Submodules
miranda.structure._structure module
- miranda.structure._structure.build_path_from_schema(facets: dict, output_folder: str | PathLike, schema: str | PathLike | dict | None = None, top_folder: str = 'datasets', validate: bool = True) → Path | None
Build a filepath based on a valid data schema.
- Parameters:
facets (dict) – Facets for a given dataset.
output_folder (str or os.PathLike) – Parent folder on which to extend the filetree structure.
schema (str, os.PathLike, or dict, optional) – Path to a YAML schema describing the database structure, or the schema as a dictionary. If None, the Ouranos schema is used.
top_folder (str) – Top level of the supplied schema, used for validation. Default: “datasets”.
validate (bool) – Run facet-validation checks on the supplied facets. Default: True.
- Returns:
Path or None
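The role of `top_folder` and `validate` can be illustrated with a facet check against an already-parsed schema. The schema shape here (`required` and `allowed` entries under the top-level key) is entirely hypothetical and stands in for the parsed YAML; `validate_facets` is an illustrative helper, not part of miranda.

```python
from typing import Any

# Hypothetical schema shape, standing in for a parsed YAML file.
EXAMPLE_SCHEMA = {
    "datasets": {
        "required": ["activity", "frequency", "variable"],
        "allowed": {"frequency": ["day", "mon", "fx"]},
    }
}


def validate_facets(facets: dict, schema: dict, top_folder: str = "datasets") -> list:
    """Return a list of problems; an empty list means the facets pass."""
    if top_folder not in schema:
        raise KeyError(f"Schema has no top-level entry {top_folder!r}")
    rules: dict[str, Any] = schema[top_folder]
    # Every required facet must be present.
    problems = [
        f"missing facet {name!r}"
        for name in rules.get("required", [])
        if name not in facets
    ]
    # Constrained facets must take one of the allowed values.
    for name, allowed in rules.get("allowed", {}).items():
        if name in facets and facets[name] not in allowed:
            problems.append(f"{name}={facets[name]!r} not in {allowed}")
    return problems
```

Collecting all problems rather than stopping at the first one makes it easier to fix a malformed facets dictionary in a single pass.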
- miranda.structure._structure.create_version_hash_files(input_files: str | PathLike | list[str | PathLike] | generator | None = None, facet_dict: dict | None = None, verify_hash: bool = False) → None
Create version hashes based on files or a facets dictionary.
- Parameters:
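input_files (str, os.PathLike, list of str or os.PathLike, or GeneratorType, optional) – Files for which to create version hash files.
facet_dict (dict, optional) – Facets dictionary from which to create version hash files, as an alternative to input_files.
verify_hash (bool) – Verify that existing sha256sum files match their companion files. Default: False.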
- Returns:
None
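Verifying an existing hash file, as `verify_hash` and `verify_hashes` describe, amounts to recomputing the checksum and raising on mismatch. This sketch assumes the companion file holds the hex digest as its first whitespace-separated token (covering both a bare digest and `sha256sum`-style output); `verify_hash_file` is a hypothetical name, and the exception type is this sketch's choice.

```python
import hashlib
from pathlib import Path


def verify_hash_file(data_file: Path, hash_file: Path) -> None:
    """Raise ValueError if the checksum recorded in hash_file does not
    match data_file. Only the first whitespace-separated token is read,
    so both a bare digest and "digest  filename" lines are accepted."""
    expected = Path(hash_file).read_text().split()[0]
    digest = hashlib.sha256()
    with open(data_file, "rb") as fh:
        # Read in 1 MiB chunks to avoid loading large files into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected:
        raise ValueError(
            f"sha256 mismatch for {data_file}: expected {expected}, got {actual}"
        )
```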
- miranda.structure._structure.structure_datasets(input_files: str | PathLike | list[str | PathLike] | generator, output_folder: str | PathLike, *, project: str | None = None, guess: bool = True, dry_run: bool = False, method: str = 'copy', make_dirs: bool = False, set_version_hashes: bool = False, verify_hashes: bool = False, suffix: str = 'nc') → dict[Path, Path]
Sort dataset files into a folder tree derived from their facets.
- Parameters:
input_files (str, Path, list of str or Path, or GeneratorType) – Files to be sorted.
output_folder (str or Path) – The desired location for the folder-tree.
project ({“cordex”, “cmip5”, “cmip6”, “isimip-ft”, “pcic-candcs-u6”, “converted”}, optional) – Project used to parse the facets of all supplied datasets. If not supplied, parsing is attempted with every available data category for each file (slow), unless guess is True.
guess (bool) – If project is not supplied, hint to the decoder that all input_files belong to the same activity. Default: True.
dry_run (bool) – Print the changes that would be made without performing them. Default: False.
method ({“move”, “copy”}) – Method used to transfer files to their intended location. Default: “copy”.
make_dirs (bool) – Create the folder tree if it does not already exist. Default: False.
set_version_hashes (bool) – Create an accompanying file whose filename contains the version and whose contents are the sha256sum. Default: False.
verify_hashes (bool) – Ensure that any existing sha256sum files match their companion files; raise on mismatch. Default: False.
suffix ({“nc”, “zarr”}) – File suffix to search for. If “zarr”, a non-recursive ‘glob’ is performed with the provided pattern; otherwise, a recursive ‘rglob’ is performed. Default: “nc”.
- Returns:
dict[Path, Path]
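The `method`, `dry_run`, and `make_dirs` options can be sketched as a transfer step over a source-to-destination mapping. This assumes the returned `dict[Path, Path]` maps each input file to its destination, and that `make_dirs=False` means a missing parent folder is an error; both are assumptions, and `transfer_files` is a hypothetical helper rather than miranda's code.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def transfer_files(
    mapping: dict[Path, Path],
    method: str = "copy",
    dry_run: bool = False,
    make_dirs: bool = False,
) -> dict[Path, Path]:
    """Copy or move each source to its destination; return the mapping used."""
    if method not in {"copy", "move"}:
        raise ValueError(f"method must be 'copy' or 'move', not {method!r}")
    done: dict[Path, Path] = {}
    for src, dst in mapping.items():
        if dry_run:
            # Report the planned action without touching the filesystem.
            print(f"{method}: {src} -> {dst}")
        else:
            if make_dirs:
                dst.parent.mkdir(parents=True, exist_ok=True)
            elif not dst.parent.exists():
                raise FileNotFoundError(f"{dst.parent} does not exist (make_dirs=False)")
            if method == "copy":
                if src.is_dir():  # e.g. a Zarr store, which is a directory
                    shutil.copytree(src, dst)
                else:
                    shutil.copy2(src, dst)
            else:
                shutil.move(str(src), str(dst))
        done[src] = dst
    return done
```

Returning the mapping even in dry-run mode lets a caller inspect the planned layout before committing to it.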