miranda package¶

Python utilities for climate data collection, conversion, and management.

Subpackages¶

Submodules¶

miranda.cv module¶

ESGF Controlled Vocabulary module.

miranda.scripting module¶

Scripting Helpers module.

miranda.storage module¶

Disk space management.

Classes:

DiskSpaceError - the exception raised on failure.
FileMeta - file and its size.
StorageState - storage capacity and availability of a medium.

Functions:

size_evaluation() - get total size of a list of files.
size_division() - divide files based on number and size restrictions.

exception miranda.storage.DiskSpaceError[source]¶

Bases: Exception

DiskSpaceError Exception.

class miranda.storage.FileMeta(path: str, size: int = -1)[source]¶

Bases: object

File path and size.

Parameters:

path (str) – The full path of the file.
size (int) – The size of file in bytes.

class miranda.storage.StorageState(base_path, capacity=-1, used_space=-1, free_space=-1)[source]¶

Bases: object

Information regarding the storage capacity of a disk.

Parameters:

base_path (Path) – The base path of the storage medium.
capacity (int) – Capacity of medium in bytes.
used_space (int) – Space currently used on the medium.
free_space (int) – Space available on the medium.

Return size of object in bytes.

Parameters:

file_path_or_bytes_or_dict (Path or str or int, list of str or Path, GeneratorType, or dict[str, Path or list of Path]) – The file or object to be evaluated.

Returns:

int – The size of the file or object in bytes.

Report file size in a human-readable format.

This function accepts a single file, or a list or generator of files, and returns the total size in bytes as formatted, human-readable text.

Parameters:

file_path_or_bytes_or_dict (Path or str or int, list of str or Path, GeneratorType, or dict[str, Path or list of Path]) – The file or object to be evaluated.
use_binary (bool) – Flag to use binary conversion (default: True).
significant_digits (int) – Number of significant digits to display (default: 2).

Returns:

str – The file size in a human-readable format.
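As an illustration of the use_binary and significant_digits options described above, here is a minimal sketch of byte-count formatting. The name format_size and the unit labels are hypothetical, and the sketch treats the digits argument as a number of decimal places, which may differ from how miranda interprets significant_digits.

```python
def format_size(num_bytes: int, use_binary: bool = True, digits: int = 2) -> str:
    """Format a byte count as text (illustrative helper, not part of miranda)."""
    # Binary conversion divides by 1024; decimal conversion divides by 1000.
    base = 1024 if use_binary else 1000
    units = (
        ["B", "KiB", "MiB", "GiB", "TiB"]
        if use_binary
        else ["B", "kB", "MB", "GB", "TB"]
    )
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.{digits}f} {unit}"
        value /= base
    # Anything larger is reported in the largest unit available.
    return f"{value:.{digits}f} {units[-1]}"


print(format_size(1536))                    # binary units
print(format_size(1500, use_binary=False))  # decimal units
```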

miranda.storage.size_division(files_to_divide: list | FileMeta | Path, size_limit: int = 0, file_limit: int = 0, check_name_repetition: bool = False, preserve_order: bool = False) → list[list][source]¶

Divide files according to size and number limits.

Parameters:

files_to_divide (list of str or Path or FileMeta) – Files to be sorted.
size_limit (int) – Size limit of divisions in bytes. Default: 0 (no limit).
file_limit (int) – Number of files limit of divisions. Default: 0 (no limit).
check_name_repetition (bool) – Flag to prevent file name repetitions. Default: False.
preserve_order (bool) – Flag to force files to be restored in the order they are given. Default: False.

Returns:

list[list] – The list of divisions (each division is a list of FileMeta objects).
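To make the interaction of size_limit and file_limit concrete, the following sketch groups files greedily in the order given. It is not the miranda implementation: FileEntry and divide_by_limits are hypothetical names, name-repetition checks are omitted, and the assumption is that a division is closed as soon as adding the next file would exceed either limit.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for FileMeta: a path and a size in bytes."""

    path: str
    size: int


def divide_by_limits(files, size_limit=0, file_limit=0):
    """Greedily group files in input order; a limit of 0 means no limit."""
    divisions = []
    current, current_size = [], 0
    for entry in files:
        # A single oversized file still gets a division of its own.
        too_big = bool(
            size_limit and current and current_size + entry.size > size_limit
        )
        too_many = bool(file_limit and len(current) >= file_limit)
        if too_big or too_many:
            divisions.append(current)
            current, current_size = [], 0
        current.append(entry)
        current_size += entry.size
    if current:
        divisions.append(current)
    return divisions


sizes = [400, 300, 500, 200, 100]
files = [FileEntry(f"f{i}.nc", size) for i, size in enumerate(sizes)]
groups = divide_by_limits(files, size_limit=800, file_limit=2)
group_sizes = [[entry.size for entry in group] for group in groups]
print(group_sizes)
```

With an 800-byte size limit and a two-file limit, the third file opens a new division because of size, and the fifth because of the file count.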

miranda.storage.size_evaluation(file_list: list[str | FileMeta | Path]) → int[source]¶

Total size of files.

Parameters:

file_list (list of str or Path or FileMeta) – List of files to evaluate.

Returns:

int – The total size of files in bytes.

miranda.units module¶

Special Time Units-Handling submodule.

miranda.units.check_time_frequency(d: Dataset, expected_period: str | None = None, minimum_continuous_period: str = '1M') → tuple[list[int | str], str][source]¶

Try to understand the Dataset frequency.

If it can’t be inferred with xarray.infer_freq(), it tries to:

- look for a “freq” attribute in the global or time variable attributes;
- infer a monthly frequency if all time steps are between 27 and 32 days.

If an expected_period is supplied, special handling is applied so that data that is discontinuous overall, but continuous for at least minimum_continuous_period, can still be evaluated. This is provided for cases where the input data in a multifile dataset is sparse.

Parameters:

d (xr.Dataset) – An xarray.Dataset.
expected_period (str) – An xarray-compatible time period (e.g. “1H”, “1D”, “7D”, “1M”, “1A”). The time period expected of the input dataset. The “1M” period is specially-handled.
minimum_continuous_period (str) – An xarray-compatible time period (e.g. “1H”, “1D”, “7D”, “1M”, “1A”). The minimum expected granular period that data should have continuous values for. The “1M” period is specially-handled.

Returns:

offset (list of int or str) – The offset as a list of (multiplier, base).
offset_meaning (str) – The offset meaning (single word).
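The monthly fallback described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the miranda code: looks_monthly is a hypothetical name, it works on plain datetime objects rather than an xarray.Dataset, and the 27 and 32 day bounds are treated as inclusive.

```python
from datetime import datetime, timedelta


def looks_monthly(times):
    """Return True if every step between consecutive timestamps is 27 to 32 days."""
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    lower, upper = timedelta(days=27), timedelta(days=32)
    # An axis with fewer than two timestamps has no steps to judge.
    return bool(steps) and all(lower <= step <= upper for step in steps)


monthly = [datetime(2020, month, 1) for month in range(1, 13)]
daily = [datetime(2020, 1, day) for day in range(1, 11)]

print(looks_monthly(monthly))
print(looks_monthly(daily))
```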

miranda.utils module¶

Miscellaneous Helper Utilities module.

class miranda.utils.HiddenPrints[source]¶

Bases: object

Special context manager for hiding print statements.

Notes

Solution from https://stackoverflow.com/a/45669280/7322852. Credit to Alexander C (https://stackoverflow.com/users/2039471/alexander-c). Licensed under CC-BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).
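A context manager of this kind is commonly written by swapping sys.stdout for a handle on the null device. The sketch below shows that general pattern under the hypothetical name QuietPrints; it is not guaranteed to match the HiddenPrints source line for line.

```python
import os
import sys


class QuietPrints:
    """Illustrative context manager that discards anything printed inside it."""

    def __enter__(self):
        # Remember the real stream so it can be restored on exit.
        self._original_stdout = sys.stdout
        sys.stdout = open(os.devnull, "w")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        sys.stdout.close()
        sys.stdout = self._original_stdout


with QuietPrints():
    print("this is suppressed")
print("this is visible")
```

Only output written through sys.stdout is silenced; anything sent to sys.stderr still appears.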

miranda.utils.chunk_iterables(iterable: Sequence, chunk_size: int) → Iterable[source]¶

Generate lists of chunk_size elements from iterable.

Parameters:

iterable (Sequence) – The iterable to chunk.
chunk_size (int) – The size of the chunks.

Yields:

Iterable – The chunked iterable.

Notes

Adapted from eidord (2012) https://stackoverflow.com/a/12797249/7322852 (https://creativecommons.org/licenses/by-sa/4.0/)
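For reference, chunking a sequence into lists of chunk_size elements can be sketched as follows. The function name chunked and the ValueError guard are assumptions made for this example and are not taken from the miranda source.

```python
from collections.abc import Iterator, Sequence


def chunked(items: Sequence, chunk_size: int) -> Iterator[list]:
    """Yield successive lists of at most chunk_size elements from items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        # Slicing never overruns, so the final chunk may be shorter.
        yield list(items[start:start + chunk_size])


chunks = list(chunked([1, 2, 3, 4, 5, 6, 7], 3))
print(chunks)
```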

Extract archives (tar/zip) to a working directory.

Parameters:

resources (str or Path or list of bytes or str or Path) – List of archive files. Any netCDF files in the list are passed through and included in the returned list.
output_dir (str or Path, optional) – String or Path to a working location (default: temporary folder).

Returns:

list – The list of original or of extracted files.

miranda.utils.list_paths_with_elements(base_paths: str | list[str] | PathLike[str], elements: list[str]) → list[dict][source]¶

List a given path structure.

Parameters:

base_paths (str or list of str or os.PathLike) – List of paths from which to start the search.
elements (list of str) – Ordered list of the expected elements.

Returns:

list of dict – The keys are ‘path’ and each of the members of the given elements; ‘path’ holds the absolute path.

Notes

Suppose you have the following structure: /base_path/{color}/{shape}. The resulting list would look like:

[{'path': '/base_path/red/square', 'color': 'red', 'shape': 'square'},
{'path': '/base_path/red/circle', 'color': 'red', 'shape': 'circle'},
{'path': '/base_path/blue/triangle', 'color': 'blue', 'shape': 'triangle'},
...]

Obviously, ‘path’ should not be in the input list of elements.
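The structure described in these notes can be reproduced with a short recursive walk. The sketch below is an assumption-laden illustration rather than the miranda implementation: list_structure is a hypothetical name, it accepts a single base path, and it returns entries sorted by directory name.

```python
import tempfile
from pathlib import Path


def list_structure(base_path, elements):
    """Walk len(elements) directory levels below base_path, one dict per leaf."""
    results = []

    def descend(current, depth, found):
        if depth == len(elements):
            results.append({"path": str(current), **found})
            return
        for child in sorted(p for p in current.iterdir() if p.is_dir()):
            # Label this directory level with the matching element name.
            descend(child, depth + 1, {**found, elements[depth]: child.name})

    descend(Path(base_path).resolve(), 0, {})
    return results


with tempfile.TemporaryDirectory() as tmp:
    pairs = [("red", "square"), ("red", "circle"), ("blue", "triangle")]
    for color, shape in pairs:
        (Path(tmp) / color / shape).mkdir(parents=True)
    listing = list_structure(tmp, ["color", "shape"])

summary = [(entry["color"], entry["shape"]) for entry in listing]
print(summary)
```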

miranda.utils.single_item_list(iterable: Iterable) → bool[source]¶

Ascertain whether a list has exactly one entry.

See: https://stackoverflow.com/a/16801605/7322852

Parameters:

iterable (Iterable) – The list to check.

Returns:

bool – Whether the list contains exactly one item.
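One way to test for exactly one entry without calling len(), so that generators also work, is to pull at most two items from an iterator. This is a sketch under the hypothetical name has_exactly_one and may differ from the approach in the linked answer.

```python
def has_exactly_one(iterable) -> bool:
    """Return True if iterable yields exactly one item."""
    iterator = iter(iterable)
    sentinel = object()
    # An empty iterable fails on the first pull.
    if next(iterator, sentinel) is sentinel:
        return False
    # A second item means there is more than one entry.
    return next(iterator, sentinel) is sentinel


print(has_exactly_one(["only"]))
print(has_exactly_one([]))
print(has_exactly_one(x for x in range(3)))
```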

miranda.utils.working_directory(directory: str | Path) → None[source]¶

Change the working directory within a context object.

This function temporarily changes the working directory within the context and reverts to the previous working directory when the code block it wraps exits.

Parameters:

directory (str or pathlib.Path) – The directory to temporarily change to.
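The behaviour described here matches a common contextlib pattern, sketched below under the hypothetical name temporary_cwd. It assumes the previous directory should be restored even when the wrapped block raises, which the try/finally guarantees.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_cwd(directory):
    """Change into directory for the duration of the with-block."""
    previous = Path.cwd()
    os.chdir(directory)
    try:
        yield
    finally:
        # Always return to where we started, even after an exception.
        os.chdir(previous)


start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with temporary_cwd(tmp):
        inside = Path.cwd()
after = Path.cwd()
print(after == start)
```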