Schema inference for Zarr v2 and v3 stores
§Assumptions
This module assumes a specific Zarr store structure:
- Coordinates are 1D arrays: Any array with shape.len() == 1 is treated as a coordinate. Examples: time(7), lat(10), lon(10).
- Data variables are nD arrays: Arrays with shape.len() > 1 are treated as data variables. Their dimensionality must equal the number of coordinate arrays.
- Cartesian product structure: Data variables are assumed to be the Cartesian product of all coordinates. For coordinates [time(7), lat(10), lon(10)], data variables must have shape [7, 10, 10] (i.e., time × lat × lon).
- Dimension ordering: Coordinates are inferred to match the Zarr arrays’ native dimension ordering when possible (by matching data variable shapes to coordinate sizes). If the ordering cannot be inferred unambiguously, we fall back to alphabetical ordering.
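The first three assumptions can be illustrated with a minimal sketch. The names Role, classify, and is_cartesian_product are hypothetical and are not part of this module's API; the sketch also assumes the coordinate lengths are already given in dimension order, and it treats rank-0 arrays as unclassified because the assumptions above do not cover them.

```rust
/// Hypothetical role of an array within the store (illustrative only,
/// not a type exported by this module).
#[derive(Debug, PartialEq)]
enum Role {
    Coordinate,
    DataVariable,
}

/// Classify an array by rank: 1D arrays are coordinates, arrays with two or
/// more dimensions are data variables. Rank-0 arrays are not covered by the
/// stated assumptions, so this sketch returns None for them.
fn classify(shape: &[usize]) -> Option<Role> {
    match shape.len() {
        0 => None,
        1 => Some(Role::Coordinate),
        _ => Some(Role::DataVariable),
    }
}

/// Check that a data variable's shape equals the coordinate lengths taken in
/// dimension order. This covers both requirements: the rank equals the number
/// of coordinates, and each axis has the length of its coordinate.
fn is_cartesian_product(coord_lens: &[usize], data_shape: &[usize]) -> bool {
    coord_lens == data_shape
}
```

With coordinates time(7), lat(10), lon(10), a call such as is_cartesian_product(&[7, 10, 10], &[7, 10, 10]) returns true, while a data variable of shape [7, 10] is rejected because its rank does not match the number of coordinates.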
§Example
weather.zarr/
├── time/ shape: [7] → coordinate
├── lat/ shape: [10] → coordinate
├── lon/ shape: [10] → coordinate
├── temperature/ shape: [7, 10, 10] → data variable (time × lat × lon)
└── humidity/ shape: [7, 10, 10] → data variable (time × lat × lon)
Structs§
- ZarrArrayMeta
- ZarrStoreMeta - Discovered Zarr store structure
Enums§
- ZarrVersion - Zarr format version
Functions§
- detect_zarr_version - Detect Zarr version by checking metadata files
- detect_zarr_version_async - Async version of detect_zarr_version for remote object stores
- discover_arrays - Discover all arrays in a Zarr store (v2 or v3)
- discover_arrays_async - Async version of discover_arrays for remote object stores
- infer_schema - Infer Arrow schema from Zarr store metadata (v2 or v3). Coordinates use DictionaryArray for memory efficiency (stores unique values once).
- infer_schema_async - Async version of infer_schema for remote object stores
- infer_schema_with_meta - Infer Arrow schema and return the store metadata for statistics. This allows caching the metadata for later use during query execution.
- infer_schema_with_meta_async - Async version of infer_schema that also returns the store metadata. This allows caching the metadata for later use during query execution.
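As a rough illustration of what "checking metadata files" can mean, here is a minimal sketch for a local directory store. It is not the implementation of detect_zarr_version, whose signature is not shown on this page; the names Version and guess_version are hypothetical. The sketch relies only on the file names defined by the Zarr formats: zarr.json for v3, and .zgroup, .zarray, or the consolidated .zmetadata for v2.

```rust
use std::path::Path;

/// Hypothetical stand-in for the module's ZarrVersion enum.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Version {
    V2,
    V3,
}

/// Guess the Zarr format version of a local directory store by looking for
/// well-known metadata files at its root. Returns None when no marker file
/// is present.
fn guess_version(root: &Path) -> Option<Version> {
    // Zarr v3 keeps group and array metadata in `zarr.json`.
    if root.join("zarr.json").is_file() {
        return Some(Version::V3);
    }
    // Zarr v2 uses `.zgroup` / `.zarray`, plus `.zmetadata` when the
    // metadata has been consolidated.
    let v2_markers = [".zgroup", ".zarray", ".zmetadata"];
    if v2_markers.iter().any(|m| root.join(m).is_file()) {
        return Some(Version::V2);
    }
    None
}
```

Checking for zarr.json first is a choice made in this sketch: a directory containing both kinds of marker is reported as v3. An async variant for remote object stores would presumably issue the same existence checks through the store's client instead of the local filesystem.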