Module schema_inference

Schema inference for Zarr v2 and v3 stores

§Assumptions

This module assumes a specific Zarr store structure:

  1. Coordinates are 1D arrays: Any array with shape.len() == 1 is treated as a coordinate. Examples: time(7), lat(10), lon(10).

  2. Data variables are nD arrays: Arrays with shape.len() > 1 are treated as data variables. Their dimensionality must equal the number of coordinate arrays.

  3. Cartesian product structure: Data variables are assumed to be the Cartesian product of all coordinates. For coordinates [time(7), lat(10), lon(10)], data variables must have shape [7, 10, 10] (i.e., time × lat × lon).

  4. Dimension ordering: The coordinate order is inferred to match the Zarr arrays' native dimension ordering when possible, by matching each data variable's axis sizes to coordinate lengths. If the ordering cannot be inferred unambiguously, the module falls back to alphabetical ordering.
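Assumptions 1 to 3 can be checked with a few lines of code. The sketch below is illustrative, not the module's implementation: `ArrayShape`, `classify`, and `validate` are hypothetical names, shapes are plain `u64` vectors, and axis sizes are compared as sorted lists because the exact axis order is resolved separately (assumption 4).

```rust
/// Hypothetical, simplified array metadata: just a name and a shape.
#[derive(Debug, Clone)]
struct ArrayShape {
    name: String,
    shape: Vec<u64>,
}

impl ArrayShape {
    fn new(name: &str, shape: &[u64]) -> Self {
        ArrayShape { name: name.to_string(), shape: shape.to_vec() }
    }
}

/// Assumptions 1 and 2: 1D arrays are coordinates, nD arrays (n > 1) are
/// data variables. Zero-dimensional arrays fall into neither group.
fn classify(arrays: &[ArrayShape]) -> (Vec<&ArrayShape>, Vec<&ArrayShape>) {
    let coords: Vec<&ArrayShape> = arrays.iter().filter(|a| a.shape.len() == 1).collect();
    let vars: Vec<&ArrayShape> = arrays.iter().filter(|a| a.shape.len() > 1).collect();
    (coords, vars)
}

/// Assumptions 2 and 3: each data variable has one axis per coordinate, and its
/// axis sizes are a permutation of the coordinate lengths.
/// Returns the row count of the flattened Cartesian product view.
fn validate(coords: &[&ArrayShape], vars: &[&ArrayShape]) -> Result<u64, String> {
    let mut coord_sizes: Vec<u64> = coords.iter().map(|c| c.shape[0]).collect();
    coord_sizes.sort_unstable();
    // The Cartesian product of all coordinates has prod(lengths) rows.
    let rows: u64 = coord_sizes.iter().product();
    for v in vars {
        if v.shape.len() != coords.len() {
            return Err(format!(
                "{}: expected {} dimensions, found {}",
                v.name,
                coords.len(),
                v.shape.len()
            ));
        }
        let mut sizes = v.shape.clone();
        sizes.sort_unstable();
        if sizes != coord_sizes {
            return Err(format!("{}: shape {:?} does not match the coordinates", v.name, v.shape));
        }
    }
    Ok(rows)
}

fn main() {
    let arrays = vec![
        ArrayShape::new("time", &[7]),
        ArrayShape::new("lat", &[10]),
        ArrayShape::new("lon", &[10]),
        ArrayShape::new("temperature", &[7, 10, 10]),
        ArrayShape::new("humidity", &[7, 10, 10]),
    ];
    let (coords, vars) = classify(&arrays);
    match validate(&coords, &vars) {
        Ok(rows) => println!("{} coordinates, {} data variables, {} rows", coords.len(), vars.len(), rows),
        Err(e) => eprintln!("store does not match the assumptions: {e}"),
    }
}
```

For the `weather.zarr` layout shown below, this reports 3 coordinates, 2 data variables, and 7 × 10 × 10 = 700 rows in the flattened view.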

§Example

weather.zarr/
├── time/         shape: [7]           → coordinate
├── lat/          shape: [10]          → coordinate
├── lon/          shape: [10]          → coordinate
├── temperature/  shape: [7, 10, 10]   → data variable (time × lat × lon)
└── humidity/     shape: [7, 10, 10]   → data variable (time × lat × lon)
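Assumption 4 can be sketched as a per-axis lookup: each axis of a data variable is matched to the coordinate with the same length, and any axis with zero or several candidates makes the ordering ambiguous. The hypothetical `infer_order` helper below applies that rule strictly; it is not the module's own tie-breaking logic, which this page does not spell out.

```rust
/// Hypothetical helper for assumption 4. `coords` lists (name, length) pairs;
/// `var_shape` is a data variable's shape, assumed already validated.
fn infer_order(coords: &[(&str, u64)], var_shape: &[u64]) -> Vec<String> {
    let mut order = Vec::with_capacity(var_shape.len());
    for &axis_len in var_shape {
        // All coordinates whose length equals this axis length.
        let candidates: Vec<&str> = coords
            .iter()
            .filter(|(_, len)| *len == axis_len)
            .map(|(name, _)| *name)
            .collect();
        if candidates.len() != 1 {
            // No unique match: fall back to alphabetical order of all coordinates.
            let mut names: Vec<String> = coords.iter().map(|(n, _)| n.to_string()).collect();
            names.sort();
            return names;
        }
        order.push(candidates[0].to_string());
    }
    order
}

fn main() {
    let distinct: [(&str, u64); 3] = [("time", 7), ("lat", 10), ("lon", 12)];
    let weather: [(&str, u64); 3] = [("time", 7), ("lat", 10), ("lon", 10)];
    // Every axis length is unique, so the native order is recovered.
    println!("{:?}", infer_order(&distinct, &[7, 10, 12]));
    // lat and lon share length 10, so this sketch falls back to alphabetical order.
    println!("{:?}", infer_order(&weather, &[7, 10, 10]));
}
```

Under this strict rule the example store above is itself ambiguous: `lat` and `lon` both have length 10, so the sketch returns `["lat", "lon", "time"]` rather than `["time", "lat", "lon"]`. Stores whose coordinate lengths are all distinct never hit the fallback.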

Structs§

ZarrArrayMeta
ZarrStoreMeta
Discovered Zarr store structure

Enums§

ZarrVersion
Zarr format version
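Version detection (`detect_zarr_version` under Functions) can rely on the metadata file names fixed by the two format specifications: Zarr v3 keeps node metadata in a `zarr.json` document, while Zarr v2 uses `.zgroup` for groups and `.zarray` for arrays. The sketch below checks a local directory only. The `ZarrVersion` variants shown, the `detect_version` name, and the v3-first precedence are assumptions for illustration; the async variants presumably perform equivalent checks against an object store.

```rust
use std::path::Path;

/// Hypothetical mirror of the ZarrVersion enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZarrVersion {
    V2,
    V3,
}

/// Sketch: detect the format of a local store root from its metadata files.
/// Zarr v3 keeps node metadata in `zarr.json`; Zarr v2 uses `.zgroup` for
/// groups and `.zarray` for arrays. Returns None if neither is present.
fn detect_version(root: &Path) -> Option<ZarrVersion> {
    if root.join("zarr.json").is_file() {
        Some(ZarrVersion::V3)
    } else if root.join(".zgroup").is_file() || root.join(".zarray").is_file() {
        Some(ZarrVersion::V2)
    } else {
        None
    }
}

fn main() {
    // Path to inspect; defaults to the example store name.
    let root = std::env::args().nth(1).unwrap_or_else(|| "weather.zarr".to_string());
    match detect_version(Path::new(&root)) {
        Some(v) => println!("{root}: Zarr {v:?}"),
        None => println!("{root}: no Zarr v2/v3 metadata found"),
    }
}
```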

Functions§

detect_zarr_version
Detect Zarr version by checking metadata files
detect_zarr_version_async
Async version of detect_zarr_version for remote object stores
discover_arrays
Discover all arrays in a Zarr store (v2 or v3)
discover_arrays_async
Async version of discover_arrays for remote object stores
infer_schema
Infer Arrow schema from Zarr store metadata (v2 or v3). Coordinates use DictionaryArray for memory efficiency, storing each unique value only once.
infer_schema_async
Async version of infer_schema for remote object stores
infer_schema_with_meta
Infer Arrow schema and also return the store metadata for statistics. This allows the metadata to be cached for later use during query execution.
infer_schema_with_meta_async
Async version of infer_schema that also returns the store metadata. This allows the metadata to be cached for later use during query execution.
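The schema mapping described for `infer_schema` can be sketched with stand-in types. To keep the example dependency-free, `ArrowType` and `FieldSketch` mirror a small subset of the `arrow_schema` crate's `DataType` and `Field`; a real implementation would build `DataType::Dictionary(Box::new(key), Box::new(value))` instead. The dtype table, the `Int32` dictionary key, the nullability choices, and the coordinates-first column order are all assumptions, and `infer_schema_with_meta_sketch` only illustrates the pattern of returning the metadata alongside the schema so it can be cached.

```rust
/// Stand-in for a subset of arrow_schema::DataType (sketch only).
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Int32,
    Int64,
    Float32,
    Float64,
    /// (key type, value type), mirroring DataType::Dictionary.
    Dictionary(Box<ArrowType>, Box<ArrowType>),
}

/// Stand-in for arrow_schema::Field.
#[derive(Debug, Clone, PartialEq)]
struct FieldSketch {
    name: String,
    data_type: ArrowType,
    nullable: bool,
}

/// Hypothetical per-array metadata (compare ZarrArrayMeta).
#[derive(Debug, Clone)]
struct ArrayMeta {
    name: String,
    dtype: String,
    shape: Vec<u64>,
}

impl ArrayMeta {
    fn new(name: &str, dtype: &str, shape: &[u64]) -> Self {
        ArrayMeta { name: name.into(), dtype: dtype.into(), shape: shape.to_vec() }
    }
}

/// Hypothetical store metadata (compare ZarrStoreMeta).
#[derive(Debug, Clone)]
struct StoreMeta {
    arrays: Vec<ArrayMeta>,
}

/// Map a few Zarr v2 (e.g. "<f8") and v3 (e.g. "float64") dtypes to Arrow types.
fn map_dtype(dtype: &str) -> Option<ArrowType> {
    match dtype {
        "<i4" | "int32" => Some(ArrowType::Int32),
        "<i8" | "int64" => Some(ArrowType::Int64),
        "<f4" | "float32" => Some(ArrowType::Float32),
        "<f8" | "float64" => Some(ArrowType::Float64),
        _ => None,
    }
}

/// Coordinates become dictionary-encoded, non-nullable columns; data variables
/// become plain, nullable columns (e.g. to represent fill values).
fn infer_schema_sketch(meta: &StoreMeta) -> Result<Vec<FieldSketch>, String> {
    let mut coords = Vec::new();
    let mut vars = Vec::new();
    for a in &meta.arrays {
        if a.shape.is_empty() {
            continue; // scalars are neither coordinates nor data variables
        }
        let value = map_dtype(&a.dtype)
            .ok_or_else(|| format!("{}: unsupported dtype {}", a.name, a.dtype))?;
        if a.shape.len() == 1 {
            let dict = ArrowType::Dictionary(Box::new(ArrowType::Int32), Box::new(value));
            coords.push(FieldSketch { name: a.name.clone(), data_type: dict, nullable: false });
        } else {
            vars.push(FieldSketch { name: a.name.clone(), data_type: value, nullable: true });
        }
    }
    coords.extend(vars);
    Ok(coords)
}

/// Return the metadata alongside the schema so the caller can cache it.
fn infer_schema_with_meta_sketch(meta: StoreMeta) -> Result<(Vec<FieldSketch>, StoreMeta), String> {
    let schema = infer_schema_sketch(&meta)?;
    Ok((schema, meta))
}

fn main() {
    let meta = StoreMeta {
        arrays: vec![
            ArrayMeta::new("time", "<i8", &[7]),
            ArrayMeta::new("lat", "<f8", &[10]),
            ArrayMeta::new("lon", "<f8", &[10]),
            ArrayMeta::new("temperature", "<f8", &[7, 10, 10]),
        ],
    };
    match infer_schema_with_meta_sketch(meta) {
        Ok((schema, cached)) => {
            for f in &schema {
                println!("{}: {:?} (nullable: {})", f.name, f.data_type, f.nullable);
            }
            println!("cached metadata for {} arrays", cached.arrays.len());
        }
        Err(e) => eprintln!("{e}"),
    }
}
```

Dictionary encoding pays off because, in the flattened Cartesian product view, each coordinate value repeats many times: in the 700-row example, each of the 7 `time` values appears once per `lat`/`lon` pair, i.e. 100 times.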