miranda package

Python utilities for climate data collection, conversion, and management.

Subpackages

Submodules

miranda.cv module

ESGF Controlled Vocabulary module.

miranda.scripting module

Scripting Helpers module.

miranda.storage module

Disk space management.

Classes:
  • DiskSpaceError - the exception raised on failure.

  • FileMeta - file and its size.

  • StorageState - storage capacity and availability of a medium.

Functions:
  • total_size() - get total size of a list of files.

  • size_division() - divide files based on number and size restrictions.

exception miranda.storage.DiskSpaceError

Bases: Exception

DiskSpaceError Exception.

class miranda.storage.FileMeta(path: str, size: int = -1)

Bases: object

File path and size.

Parameters:
  • path (str) – The full path of the file.

  • size (int) – The size of file in bytes.

class miranda.storage.StorageState(base_path, capacity=-1, used_space=-1, free_space=-1)

Bases: object

Information regarding the storage capacity of a disk.

Parameters:
  • base_path (Path) – The base path of the storage medium.

  • capacity (int) – Capacity of medium in bytes.

  • used_space (int) – Space currently used on the medium.

  • free_space (int) – Space available on the medium.

miranda.storage.file_size(file_path_or_bytes_or_dict: Path | str | int | list[str | Path] | GeneratorType | dict[str, Path | list[Path]]) -> int

Return size of object in bytes.

Parameters:

file_path_or_bytes_or_dict (Path or str or int, list of str or Path, GeneratorType, or dict[str, Path or list of Path]) – The file or object to be evaluated.

Returns:

int – The size of the file or object in bytes.

miranda.storage.report_file_size(file_path_or_bytes_or_dict: Path | str | int | list[str | Path] | GeneratorType | dict[str, Path | list[Path]], use_binary: bool = True, significant_digits: int = 2) -> str

Report file size in a human-readable format.

This function accepts a single file, or a list or generator of files, and returns the size of the file, or of the list of files, as formatted text.

Parameters:
  • file_path_or_bytes_or_dict (Path or str or int, list of str or Path, GeneratorType, or dict[str, Path or list of Path]) – The file or object to be evaluated.

  • use_binary (bool) – Flag to use binary conversion (default: True).

  • significant_digits (int) – Number of significant digits to display (default: 2).

Returns:

str – The file size in a human-readable format.
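As a rough illustration of what a human-readable size report involves, the sketch below formats a byte count using binary (1024) or decimal (1000) multiples. The helper name format_size_sketch, its unit labels, and its use of fixed decimal places rather than significant digits are all assumptions made for this example; it is not miranda's implementation and its output may differ from report_file_size.

```python
def format_size_sketch(num_bytes: int, use_binary: bool = True, digits: int = 2) -> str:
    """Format a byte count as text (hypothetical helper, not part of miranda)."""
    base = 1024 if use_binary else 1000
    units = (
        ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
        if use_binary
        else ["B", "kB", "MB", "GB", "TB", "PB"]
    )
    size = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value drops below the base.
    while size >= base and index < len(units) - 1:
        size /= base
        index += 1
    # Assumption: "digits" is the number of decimal places shown.
    return f"{size:.{digits}f} {units[index]}"


print(format_size_sketch(1536))                         # 1.50 KiB
print(format_size_sketch(1_500_000, use_binary=False))  # 1.50 MB
```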

miranda.storage.size_division(files_to_divide: list | FileMeta | Path, size_limit: int = 0, file_limit: int = 0, check_name_repetition: bool = False, preserve_order: bool = False) -> list[list]

Divide files according to size and number limits.

Parameters:
  • files_to_divide (list of str or Path or FileMeta) – Files to be sorted.

  • size_limit (int) – Size limit of divisions in bytes. Default: 0 (no limit).

  • file_limit (int) – Number of files limit of divisions. Default: 0 (no limit).

  • check_name_repetition (bool) – Flag to prevent file name repetitions. Default: False.

  • preserve_order (bool) – Flag to force files to be restored in the order they are given. Default: False.

Returns:

list[list] – The list of divisions (each division is a list of FileMeta objects).
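To make the idea of dividing files by size and count limits concrete, here is a minimal greedy sketch. FileMetaSketch and divide_by_limits are hypothetical names; the sketch ignores check_name_repetition and always preserves input order, so it should be read as one possible approach and not as how size_division itself works.

```python
from dataclasses import dataclass


@dataclass
class FileMetaSketch:
    """Stand-in for a path/size record (hypothetical, not miranda's FileMeta)."""

    path: str
    size: int


def divide_by_limits(files, size_limit=0, file_limit=0):
    """Greedily group files; a limit of 0 is treated as 'no limit' (assumption)."""
    divisions = []
    current = []
    current_size = 0
    for meta in files:
        over_size = size_limit > 0 and bool(current) and current_size + meta.size > size_limit
        over_count = file_limit > 0 and len(current) >= file_limit
        if over_size or over_count:
            # Close the current division and start a new one.
            divisions.append(current)
            current, current_size = [], 0
        current.append(meta)
        current_size += meta.size
    if current:
        divisions.append(current)
    return divisions


sample_files = [FileMetaSketch(f"file_{i}.nc", s) for i, s in enumerate([40, 50, 30, 80])]
for division in divide_by_limits(sample_files, size_limit=100):
    print([meta.size for meta in division])
```

In this sketch a file larger than size_limit still gets a division of its own; whether that is desirable depends on the use case.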

miranda.storage.size_evaluation(file_list: list[str | FileMeta | Path]) -> int

Total size of files.

Parameters:

file_list (list of str or Path or FileMeta) – List of files to evaluate.

Returns:

int – The total size of files in bytes.
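The total can be pictured as a sum of on-disk sizes. The sketch below uses only pathlib and a hypothetical helper named total_size_sketch; it handles str and Path entries but not FileMeta objects, and it is not miranda's implementation.

```python
import tempfile
from pathlib import Path


def total_size_sketch(file_list) -> int:
    """Sum the on-disk size, in bytes, of each path in file_list (hypothetical helper)."""
    return sum(Path(f).stat().st_size for f in file_list)


# Create two small files in a temporary folder to measure.
folder = Path(tempfile.mkdtemp())
(folder / "a.bin").write_bytes(b"\x00" * 10)
(folder / "b.bin").write_bytes(b"\x00" * 32)

total = total_size_sketch([folder / "a.bin", str(folder / "b.bin")])
print(total)  # 42
```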

miranda.units module

Special Time Units-Handling submodule.

miranda.units.check_time_frequency(d: Dataset, expected_period: str | None = None, minimum_continuous_period: str = '1M') -> tuple[list[int | str], str]

Try to understand the Dataset frequency.

If the frequency cannot be inferred with xarray.infer_freq(), it tries to:
  • look for a “freq” attribute in the global or time variable attributes.

  • infer a monthly frequency if all time steps are between 27 and 32 days.

If an expected_period is supplied, special handling is applied so that data that may be internally discontinuous (e.g. discontinuous overall, but continuous for minimum_continuous_period) can still be assessed. This is provided for cases where the input data in a multifile dataset are sparse.

Parameters:
  • d (xr.Dataset) – An xarray.Dataset.

  • expected_period (str) – An xarray-compatible time period (e.g. “1H”, “1D”, “7D”, “1M”, “1A”). The time period expected of the input dataset. The “1M” period is specially-handled.

  • minimum_continuous_period (str) – An xarray-compatible time period (e.g. “1H”, “1D”, “7D”, “1M”, “1A”). The minimum expected granular period that data should have continuous values for. The “1M” period is specially-handled.

Returns:

  • offset (list of int or str) – The offset as a list of (multiplier, base).

  • offset_meaning (str) – The offset meaning (single word).
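The monthly fallback described above can be sketched with the standard library alone. The function name looks_monthly is hypothetical, plain datetime objects stand in for an xarray time coordinate, and treating the 27 and 32 day bounds as inclusive is an assumption; check_time_frequency may apply the rule differently.

```python
from datetime import datetime, timedelta


def looks_monthly(times: list[datetime]) -> bool:
    """Return True if every consecutive step is 27 to 32 days long (bounds assumed inclusive)."""
    if len(times) < 2:
        return False
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    return all(timedelta(days=27) <= step <= timedelta(days=32) for step in steps)


monthly = [datetime(2020, month, 1) for month in range(1, 13)]
daily = [datetime(2020, 1, day) for day in range(1, 11)]

print(looks_monthly(monthly))  # True
print(looks_monthly(daily))    # False
```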

miranda.utils module

Miscellaneous Helper Utilities module.

class miranda.utils.HiddenPrints

Bases: object

Special context manager for hiding print statements.

Notes

Solution from https://stackoverflow.com/a/45669280/7322852. Credit to Alexander C (https://stackoverflow.com/users/2039471/alexander-c), CC-BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).
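A context manager of this kind usually works by swapping sys.stdout for a sink while the block runs. The class below, HiddenPrintsSketch, is a hypothetical stand-in written for illustration; it is not copied from miranda and may differ from HiddenPrints in detail.

```python
import os
import sys


class HiddenPrintsSketch:
    """Silence print() inside a with-block by redirecting sys.stdout (illustrative only)."""

    def __enter__(self):
        self._original_stdout = sys.stdout
        self._devnull = open(os.devnull, "w")
        sys.stdout = self._devnull
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore stdout even if the block raised, then close the sink.
        sys.stdout = self._original_stdout
        self._devnull.close()
        return False  # do not swallow exceptions


with HiddenPrintsSketch():
    print("this line is not shown")
print("this line is shown")
```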

miranda.utils.chunk_iterables(iterable: Sequence, chunk_size: int) -> Iterable

Generate lists of chunk_size elements from iterable.

Parameters:
  • iterable (Sequence) – The iterable to chunk.

  • chunk_size (int) – The size of the chunks.

Yields:

Iterable – The chunked iterable.

Notes

Adapted from eidord (2012) https://stackoverflow.com/a/12797249/7322852 (https://creativecommons.org/licenses/by-sa/4.0/)
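One common way to produce fixed-size chunks is to pull items from a single iterator with itertools.islice. The generator below, chunk_sketch, is a hypothetical example of that pattern and not necessarily how chunk_iterables is written.

```python
from itertools import islice


def chunk_sketch(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


print(list(chunk_sketch(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

The final chunk may be shorter than chunk_size when the input length is not an exact multiple.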

miranda.utils.generic_extract_archive(resources: str | Path | list[bytes | str | Path], output_dir: str | Path | None = None) -> list[Path]

Extract archives (tar/zip) to a working directory.

Parameters:
  • resources (str or Path or list of bytes or str or Path) – List of archive files. Any netCDF files in the list are passed through and included in the returned list.

  • output_dir (str or Path, optional) – String or Path to a working location (default: temporary folder).

Returns:

list – The list of original or of extracted files.
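The behaviour described here (extract tar and zip archives, pass other files through) can be approximated with the standard tarfile and zipfile modules. The function extract_archives_sketch is a hypothetical illustration: it does not handle bytes inputs, it should only be used on trusted archives, and it is not miranda's implementation.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def extract_archives_sketch(resources, output_dir=None):
    """Extract tar/zip archives and pass other files through (illustrative only)."""
    if isinstance(resources, (str, Path)):
        resources = [resources]
    out = Path(output_dir) if output_dir else Path(tempfile.mkdtemp())
    out.mkdir(parents=True, exist_ok=True)
    files = []
    for resource in map(Path, resources):
        if tarfile.is_tarfile(resource):
            with tarfile.open(resource) as archive:
                members = [m for m in archive.getmembers() if m.isfile()]
                archive.extractall(out, members=members)
                files.extend(out / m.name for m in members)
        elif zipfile.is_zipfile(resource):
            with zipfile.ZipFile(resource) as archive:
                names = [n for n in archive.namelist() if not n.endswith("/")]
                archive.extractall(out, members=names)
                files.extend(out / n for n in names)
        else:
            # Non-archive inputs (e.g. netCDF files) are returned unchanged.
            files.append(resource)
    return files


# Build a small zip archive to demonstrate the round trip.
workdir = Path(tempfile.mkdtemp())
(workdir / "data.txt").write_text("hello")
with zipfile.ZipFile(workdir / "bundle.zip", "w") as bundle:
    bundle.write(workdir / "data.txt", arcname="data.txt")

extracted = extract_archives_sketch([workdir / "bundle.zip", workdir / "data.txt"], workdir / "out")
print([path.name for path in extracted])
```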

miranda.utils.list_paths_with_elements(base_paths: str | list[str] | PathLike[str], elements: list[str]) -> list[dict]

List a given path structure.

Parameters:
  • base_paths (str or list of str or os.PathLike) – List of paths from which to start the search.

  • elements (list of str) – Ordered list of the expected elements.

Returns:

list of dict – The keys are ‘path’ and each member of the given elements; the value of ‘path’ is the absolute path.

Notes

Suppose you have the following structure: /base_path/{color}/{shape}. The resulting list would look like:

[{'path': '/base_path/red/square', 'color': 'red', 'shape': 'square'},
 {'path': '/base_path/red/circle', 'color': 'red', 'shape': 'circle'},
 {'path': '/base_path/blue/triangle', 'color': 'blue', 'shape': 'triangle'},
 ...]

Obviously, ‘path’ should not be in the input list of elements.
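The structure in these notes can be reproduced with a short directory walk that descends one level per element. The helper list_paths_sketch is hypothetical, accepts a single base path only, and returns entries in sorted order; the real function may order or filter its results differently.

```python
import tempfile
from pathlib import Path


def list_paths_sketch(base_path, elements):
    """Walk len(elements) directory levels below base_path (hypothetical helper)."""
    results = [{"path": Path(base_path).resolve()}]
    for element in elements:
        next_level = []
        for entry in results:
            # Each subdirectory at this level supplies the value for `element`.
            for child in sorted(p for p in entry["path"].iterdir() if p.is_dir()):
                next_level.append({**entry, "path": child, element: child.name})
        results = next_level
    return [{**entry, "path": str(entry["path"])} for entry in results]


# Recreate the /base_path/{color}/{shape} layout in a temporary folder.
base = Path(tempfile.mkdtemp())
for color, shape in [("red", "square"), ("red", "circle"), ("blue", "triangle")]:
    (base / color / shape).mkdir(parents=True)

for entry in list_paths_sketch(base, ["color", "shape"]):
    print(entry)
```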

miranda.utils.single_item_list(iterable: Iterable) -> bool

Ascertain whether a list has exactly one entry.

See: https://stackoverflow.com/a/16801605/7322852

Parameters:

iterable (Iterable) – The list to check.

Returns:

bool – Whether the list contains exactly one item.
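One way to test for exactly one entry without consuming a long iterable is to pull at most two items. The helper has_exactly_one_entry below is a hypothetical sketch that counts entries regardless of their truthiness; the linked answer and miranda's own function may define "one entry" differently.

```python
from itertools import islice


def has_exactly_one_entry(iterable) -> bool:
    """Return True when the iterable yields exactly one item (hypothetical helper)."""
    # Pull at most two items: one means "single", zero or two means "not single".
    return len(list(islice(iterable, 2))) == 1


print(has_exactly_one_entry(["only"]))    # True
print(has_exactly_one_entry([]))          # False
print(has_exactly_one_entry(["a", "b"]))  # False
```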

miranda.utils.working_directory(directory: str | Path) -> None

Change the working directory within a context object.

This function temporarily changes the working directory within the context and reverts to the previous working directory when the code block it is acting upon exits.

Parameters:

directory (str or pathlib.Path) – The directory to temporarily change to.
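The pattern behind such a helper is a context manager that records the current directory, changes to the target, and restores the original on exit. The function working_directory_sketch below is a hypothetical illustration built on contextlib, not miranda's own code.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory_sketch(directory):
    """Temporarily change the current working directory (illustrative only)."""
    previous = Path.cwd()
    os.chdir(directory)
    try:
        yield
    finally:
        # Always restore the previous directory, even if the block raised.
        os.chdir(previous)


start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory_sketch(tmp):
        inside = Path.cwd()
after = Path.cwd()

print(inside != start, after == start)
```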