Crate zarrs

zarrs is a Rust library for the Zarr storage format for multidimensional arrays and metadata.

If you are a Python user, check out zarrs-python. It includes a high-performance codec pipeline for the reference zarr-python implementation.

zarrs supports Zarr V3 and a V3-compatible subset of Zarr V2. It is fully up-to-date and conformant with the Zarr 3.1 specification with support for:

  • all core extensions (data types, codecs, chunk grids, chunk key encodings, storage transformers),
  • all accepted Zarr Enhancement Proposals (ZEPs) and several draft ZEPs:
    • ZEP 0003: Variable chunking
    • ZEP 0007: Strings
    • ZEP 0009: Zarr Extension Naming
  • various registered extensions from zarr-developers/zarr-extensions/,
  • experimental codecs intended for future registration, and
  • user-defined custom extensions and stores.

A changelog can be found here. Correctness issues with past versions are detailed here.

Developed at the Department of Materials Physics, Australian National University, Canberra, Australia.

§Getting Started

§Zarr Version Support

zarrs has first-class Zarr V3 support and additionally supports a compatible subset of Zarr V2 data that:

  • can be converted to V3 with only a metadata change, and
  • uses array metadata that is recognised and supported for encoding/decoding.

zarrs supports forward conversion from Zarr V2 to V3. See Converting Zarr V2 to V3 in The zarrs Book, or try the zarrs_reencode CLI tool.
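
To illustrate why some V2 arrays can be converted with only a metadata change, here is a minimal sketch that maps a few V2 dtype strings to the V3 data type names listed in the Data Types table. The `v2_dtype_to_v3` function and the `Endian` enum are hypothetical names for illustration and are not part of the zarrs API. The sketch assumes that the byte order prefix of the V2 dtype is carried separately in V3 (by the bytes codec) rather than in the data type name.

```rust
/// Byte order implied by the first character of a V2 dtype string.
#[derive(Debug, PartialEq)]
enum Endian {
    Little,
    Big,
    /// The '|' prefix: byte order is not applicable (single-byte types).
    NotApplicable,
}

/// Hypothetical helper: map a V2 dtype such as "<f4" to a V3 data type name.
/// Only a few fixed-size types are covered; anything else returns `None`.
/// It does not check that the prefix is sensible for the size (e.g. "|i2").
fn v2_dtype_to_v3(dtype: &str) -> Option<(&'static str, Endian)> {
    let mut chars = dtype.chars();
    let endian = match chars.next()? {
        '<' => Endian::Little,
        '>' => Endian::Big,
        '|' => Endian::NotApplicable,
        _ => return None,
    };
    // The remainder is the NumPy kind character followed by the size in bytes.
    let name = match chars.as_str() {
        "b1" => "bool",
        "i1" => "int8",
        "i2" => "int16",
        "i4" => "int32",
        "i8" => "int64",
        "u1" => "uint8",
        "u2" => "uint16",
        "u4" => "uint32",
        "u8" => "uint64",
        "f2" => "float16",
        "f4" => "float32",
        "f8" => "float64",
        "c8" => "complex64",
        "c16" => "complex128",
        _ => return None,
    };
    Some((name, endian))
}

fn main() {
    for dtype in ["|b1", "<i2", ">f4", "<c16", "<U4"] {
        println!("{dtype} -> {:?}", v2_dtype_to_v3(dtype));
    }
}
```

A dtype outside this small table (such as `<U4` above) yields `None`, which mirrors the idea that only a recognised subset of V2 metadata is convertible.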

§Array Extension Support

Extensions are grouped into the following categories:

  • Core: defined in the Zarr V3 specification and fully supported.
  • Registered: specified at https://github.com/zarr-developers/zarr-extensions/
    • Registered extensions listed in the tables below are fully supported unless otherwise indicated.
  • Experimental: indicated by 🚧 in the tables below and recommended for evaluation only.
    • Experimental extensions are either pending registration or have no formal specification outside of the zarrs docs.
    • Experimental extensions may be unrecognised or incompatible with other Zarr implementations.
    • Experimental extensions may change in future releases without maintaining backwards compatibility.
  • Deprecated: indicated by strikethrough in the tables below.
    • Deprecated aliases will not be removed, but are not recommended for use in new arrays.
    • Deprecated extensions may be removed in future releases.

Extension names and aliases are configurable with Config::codec_aliases_v3_mut and similar methods for data types and Zarr V2. zarrs will persist extension names when opening an existing array or creating an array from metadata.

§Data Types

Data Type                  V3 data_type name           V2 dtype   ElementOwned / Element (Feature Flag)
-------------------------  --------------------------  ---------  -------------------------------------
Bool                       bool                        |b1        bool
Int2                       int2                                   i8
Int4                       int4                                   i8
Int8                       int8                        |i1        i8
Int16                      int16                       >i2 <i2    i16
Int32                      int32                       >i4 <i4    i32
Int64                      int64                       >i8 <i8    i64
UInt2                      uint2                                  u8
UInt4                      uint4                                  u8
UInt8                      uint8                       |u1        u8
UInt16                     uint16                      >u2 <u2    u16
UInt32                     uint32                      >u4 <u4    u32
UInt64                     uint64                      >u8 <u8    u64
Float4E2M1FN†              float4_e2m1fn
Float6E2M3FN†              float6_e2m3fn
Float6E3M2FN†              float6_e3m2fn
Float8E3M4†                float8_e3m4
Float8E4M3†                float8_e4m3                            float8::F8E4M3 (float8)
Float8E4M3B11FNUZ†         float8_e4m3b11fnuz
Float8E4M3FNUZ†            float8_e4m3fnuz
Float8E5M2†                float8_e5m2                            float8::F8E5M2 (float8)
Float8E5M2FNUZ†            float8_e5m2fnuz
Float8E8M0FNU†             float8_e8m0fnu
BFloat16                   bfloat16                               half::bf16
Float16                    float16                     >f2 <f2    half::f16
Float32                    float32                     >f4 <f4    f32
Float64                    float64                     >f8 <f8    f64
Complex64                  complex64                   >c8 <c8    Complex<f32>
Complex128                 complex128                  >c16 <c16  Complex<f64>
ComplexBFloat16            complex_bfloat16                       Complex<half::bf16>
ComplexFloat16             complex_float16                        Complex<half::f16>
ComplexFloat32             complex_float32                        Complex<f32>
ComplexFloat64             complex_float64                        Complex<f64>
ComplexFloat4E2M1FN†       complex_float4_e2m1fn
ComplexFloat6E2M3FN†       complex_float6_e2m3fn
ComplexFloat6E3M2FN†       complex_float6_e3m2fn
ComplexFloat8E3M4†         complex_float8_e3m4
ComplexFloat8E4M3†         complex_float8_e4m3                    Complex<float8::F8E4M3> (float8)
ComplexFloat8E4M3B11FNUZ†  complex_float8_e4m3b11fnuz
ComplexFloat8E4M3FNUZ†     complex_float8_e4m3fnuz
ComplexFloat8E5M2†         complex_float8_e5m2                    Complex<float8::F8E5M2> (float8)
ComplexFloat8E5M2FNUZ†     complex_float8_e5m2fnuz
ComplexFloat8E8M0FNU†      complex_float8_e8m0fnu
RawBits                    r*                                     [u8; N] / &[u8; N]
String                     string                      |O         String / &str
Bytes                      bytes                       |VX        Vec<u8> / &[u8]
                           binary
                           🚧variable_length_bytes
NumpyDateTime64            numpy.datetime64                       i64
                                                                  chrono::DateTime<Utc> (chrono)
                                                                  jiff::Timestamp (jiff)
NumpyTimeDelta64           numpy.timedelta64                      i64
                                                                  chrono::TimeDelta (chrono)
                                                                  jiff::SignedDuration (jiff)

† Additional features (e.g. float8) may be required to parse floating point fill values. All subfloat types support hex string fill values.
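
As a rough illustration of hex string fill values, the sketch below parses a string such as "0x7fc00000" into raw bits and reinterprets them as a float. The `parse_hex_fill_value` helper is hypothetical and is not the parser zarrs uses. It assumes the string is a "0x" prefix followed by at most two hex digits per encoded byte.

```rust
/// Hypothetical helper: parse a hex-string fill value into raw bits.
/// `n_bytes` is the assumed encoded size of the data type in bytes.
fn parse_hex_fill_value(s: &str, n_bytes: usize) -> Option<u64> {
    let digits = s.strip_prefix("0x")?;
    // Reject empty, over-long, or non-hex input before converting.
    let valid = !digits.is_empty()
        && digits.len() <= 2 * n_bytes
        && digits.chars().all(|c| c.is_ascii_hexdigit());
    if !valid {
        return None;
    }
    u64::from_str_radix(digits, 16).ok()
}

fn main() {
    // float32: reinterpret the 32 raw bits as an f32.
    let bits = parse_hex_fill_value("0x7fc00000", 4).unwrap();
    let value = f32::from_bits(bits as u32);
    println!("0x7fc00000 as float32 is NaN: {}", value.is_nan());

    // An 8-bit subfloat: keep the raw byte, since std has no matching type.
    let raw = parse_hex_fill_value("0x7f", 1).map(|b| b as u8);
    println!("0x7f as raw bits: {raw:?}");
}
```

Working on raw bits means a fill value can be represented even when no Rust element type for the subfloat is available.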

§Codecs

Codec Type      V3 name                       V2 id             Feature Flag*
--------------  ----------------------------  ----------------  -------------
Array to Array  transpose                     transpose         transpose
                🚧reshape                      -
                🚧numcodecs.fixedscaleoffset   fixedscaleoffset
                bitround                      bitround          bitround
                🚧zarrs.squeeze                -
Array to Bytes  bytes                         -
                sharding_indexed              -                 sharding
                🚧vlen-array                   vlen-array
                vlen-bytes                    vlen-bytes
                vlen-utf8                     vlen-utf8
                packbits                      packbits
                🚧numcodecs.pcodec             pcodec            pcodec
                🚧numcodecs.zfpy               zfpy              zfp
                🚧zarrs.vlen                   -
                🚧zarrs.vlen_v2                -
                zfp                           -                 zfp
Bytes to Bytes  blosc                         blosc             blosc
                crc32c                        crc32c            crc32c
                gzip                          gzip              gzip
                zstd                          zstd              zstd
                🚧numcodecs.adler32            adler32           adler32
                🚧numcodecs.bz2                bz2               bz2
                🚧numcodecs.fletcher32         fletcher32        fletcher32
                🚧numcodecs.shuffle            shuffle
                🚧numcodecs.zlib               zlib              zlib
                🚧zarrs.gdeflate               -                 gdeflate

* Bolded feature flags are part of the default set of features.

zarrs supports arrays created with zarr-python 3.0.0+ and numcodecs 0.15.1+ with various numcodecs.zarr3 codecs.

§Chunk Grids

Chunk Grid                ZEP              V3  V2  Feature Flag
------------------------  ---------------  --  --  ------------
regular                   ZEP0001          ✓   ✓
🚧rectangular              ZEP0003 (draft)  ✓
🚧zarrs.regular_bounded                     ✓
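
For intuition about the regular chunk grid, the following sketch computes the number of chunks per dimension and locates the chunk that holds a given element. The function names `grid_shape` and `locate_element` are hypothetical and not part of the zarrs API. The sketch assumes every chunk dimension is non-zero.

```rust
/// Number of chunks along each dimension of a regular chunk grid.
/// Edge chunks may extend past the array bounds, hence the ceiling division.
fn grid_shape(array_shape: &[u64], chunk_shape: &[u64]) -> Vec<u64> {
    array_shape
        .iter()
        .zip(chunk_shape)
        .map(|(&len, &chunk)| (len + chunk - 1) / chunk)
        .collect()
}

/// Chunk index and offset within that chunk for one array element.
fn locate_element(element: &[u64], chunk_shape: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let chunk_index = element
        .iter()
        .zip(chunk_shape)
        .map(|(&e, &c)| e / c)
        .collect();
    let offset = element
        .iter()
        .zip(chunk_shape)
        .map(|(&e, &c)| e % c)
        .collect();
    (chunk_index, offset)
}

fn main() {
    // A 3x4 array with 2x2 chunks needs a 2x2 grid of chunks.
    println!("{:?}", grid_shape(&[3, 4], &[2, 2]));
    // Element [2, 3] lives in chunk [1, 1] at offset [0, 1].
    println!("{:?}", locate_element(&[2, 3], &[2, 2]));
}
```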
§Chunk Key Encodings

Chunk Key Encoding        ZEP              V3  V2  Feature Flag
------------------------  ---------------  --  --  ------------
default                   ZEP0001          ✓
v2                        ZEP0001          ✓   ✓
🚧zarrs.default_suffix                      ✓
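
The difference between the default and v2 chunk key encodings can be sketched as follows. This is a minimal illustration based on the Zarr V3 specification, where the default encoding prefixes keys with "c" and the v2 encoding joins the indices directly. The function names `default_chunk_key` and `v2_chunk_key` are hypothetical and not part of the zarrs API, and the separator is passed explicitly rather than read from array metadata.

```rust
/// Chunk key under the `default` encoding: "c" followed by each index,
/// every index preceded by the separator.
fn default_chunk_key(chunk_index: &[u64], separator: char) -> String {
    let mut key = String::from("c");
    for index in chunk_index {
        key.push(separator);
        key.push_str(&index.to_string());
    }
    key
}

/// Chunk key under the `v2` encoding: indices joined by the separator,
/// or "0" for a zero-dimensional array.
fn v2_chunk_key(chunk_index: &[u64], separator: char) -> String {
    if chunk_index.is_empty() {
        return String::from("0");
    }
    chunk_index
        .iter()
        .map(|index| index.to_string())
        .collect::<Vec<_>>()
        .join(&separator.to_string())
}

fn main() {
    let chunk = [0, 1];
    println!("default: {}", default_chunk_key(&chunk, '/'));
    println!("v2:      {}", v2_chunk_key(&chunk, '.'));
}
```

The resulting key is relative to the array's path in the store, so the same chunk index maps to different store keys depending on the encoding recorded in the array metadata.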
§Storage Transformers

Zarr V3 does not currently define any storage transformers.

§Storage Support

zarrs supports a wide range of stores (including custom stores) via the zarrs_storage API.

Store/Storage Adapter             ZEP   Read  Write  List  Sync  Async  Crate
--------------------------------  ----  ----  -----  ----  ----  -----  -----------------
MemoryStore                             ✓     ✓      ✓     ✓            zarrs_storage†
FilesystemStore                   0001  ✓     ✓      ✓     ✓            zarrs_filesystem‡
AsyncOpendalStore                       ✓*    ✓*     ✓*          ✓      zarrs_opendal
AsyncObjectStore                        ✓*    ✓*     ✓*          ✓      zarrs_object_store
AsyncIcechunkStore                      ✓*    ✓*     ✓*          ✓      zarrs_icechunk
HTTPStore                               ✓                  ✓            zarrs_http
AsyncToSyncStorageAdapter               ✓     ✓      ✓     ✓            zarrs_storage†
SyncToAsyncStorageAdapter               ✓     ✓      ✓           ✓      zarrs_storage†
UsageLogStorageAdapter                  ✓     ✓      ✓     ✓     ✓      zarrs_storage†
PerformanceMetricsStorageAdapter        ✓     ✓      ✓     ✓     ✓      zarrs_storage†
ZipStorageAdapter                       ✓            ✓     ✓            zarrs_zip

† Re-exported in the zarrs::storage module.
‡ Re-exported as the zarrs::filesystem module.
* Support depends on the underlying store.

The opendal and object_store crates are popular Rust storage backends that are fully supported via zarrs_opendal and zarrs_object_store. These backends provide more feature-complete HTTP stores than zarrs_http.

zarrs_icechunk implements the Icechunk transactional storage engine, a storage specification for Zarr that supports object_store stores.

The AsyncToSyncStorageAdapter enables some async stores to be used in a sync context.
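
The general idea behind using an async store from a sync context can be sketched with the standard library alone: a blocking wrapper drives each future to completion on the calling thread. Everything below is a hypothetical illustration and not the zarrs implementation. `AsyncMemoryStore`, `BlockingAdapter`, `sample_adapter`, and this `block_on` are made-up names, and the toy store returns an already-completed future where a real backend would typically suspend on I/O and rely on its own runtime.

```rust
use std::collections::HashMap;
use std::future::{ready, Future, Ready};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll the future on the current thread until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async key-value store standing in for an async backend.
struct AsyncMemoryStore {
    data: HashMap<String, Vec<u8>>,
}

impl AsyncMemoryStore {
    /// Returns a future; in this toy store it is immediately ready.
    fn get(&self, key: &str) -> Ready<Option<Vec<u8>>> {
        ready(self.data.get(key).cloned())
    }
}

/// Hypothetical adapter exposing a blocking `get` over the async store.
struct BlockingAdapter {
    inner: AsyncMemoryStore,
}

impl BlockingAdapter {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        block_on(self.inner.get(key))
    }
}

/// Build an adapter over a store that holds a single key.
fn sample_adapter() -> BlockingAdapter {
    let mut data = HashMap::new();
    data.insert(String::from("zarr.json"), b"{}".to_vec());
    BlockingAdapter { inner: AsyncMemoryStore { data } }
}

fn main() {
    let adapter = sample_adapter();
    println!("{:?}", adapter.get("zarr.json"));
    println!("{:?}", adapter.get("missing"));
}
```

A store whose futures depend on a specific async runtime would need that runtime to do the blocking, which is one reason an adapter of this kind may only work for some async stores.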

§Logging

zarrs logs information and warnings using the log crate. A logging implementation must be enabled to capture logs. See the log crate documentation for more details.

§Examples

§Create and Read a Zarr Hierarchy


use std::path::PathBuf;
use std::sync::Arc;

// Create a filesystem store
let store_path: PathBuf = "/path/to/hierarchy.zarr".into();
let store: zarrs::storage::ReadableWritableListableStorage = Arc::new(
    // zarrs::filesystem requires the filesystem feature
    zarrs::filesystem::FilesystemStore::new(&store_path)?
);

// Write the root group metadata
zarrs::group::GroupBuilder::new()
    .build(store.clone(), "/")?
    // .attributes(...)
    .store_metadata()?;

// Create a new sharded V3 array using the array builder
let array = zarrs::array::ArrayBuilder::new(
    vec![3, 4], // array shape
    vec![2, 2], // regular chunk (shard) shape
    zarrs::array::DataType::Float32,
    f32::NAN, // fill value
)
.array_to_bytes_codec(Arc::new(
    // The sharding codec requires the sharding feature
    zarrs::array::codec::ShardingCodecBuilder::new(
        [2, 1].try_into()? // inner chunk shape
    )
    .bytes_to_bytes_codecs(vec![
        // GzipCodec requires the gzip feature
        Arc::new(zarrs::array::codec::GzipCodec::new(5)?),
    ])
    .build()
))
.dimension_names(["y", "x"].into())
.attributes(serde_json::json!({"Zarr V3": "is great"}).as_object().unwrap().clone())
.build(store.clone(), "/array")?; // /path/to/hierarchy.zarr/array

// Store the array metadata
array.store_metadata()?;
println!("{}", serde_json::to_string_pretty(array.metadata())?);
// {
//     "zarr_format": 3,
//     "node_type": "array",
//     ...
// }

// Perform some write operations on the chunks
array.store_chunk_elements::<f32>(
    &[0, 1], // chunk index
    &[0.2, 0.3, 1.2, 1.3]
)?;
array.store_array_subset_ndarray::<f32, _>(
    &[1, 1], // array index (start of subset)
    ndarray::array![[-1.1, -1.2], [-2.1, -2.2]]
)?;
array.erase_chunk(&[1, 1])?;

// Retrieve all array elements as an ndarray
let array_all = array.retrieve_array_subset_ndarray::<f32>(&array.subset_all())?;
println!("{array_all:4}");
// [[ NaN,  NaN,  0.2,  0.3],
//  [ NaN, -1.1, -1.2,  1.3],
//  [ NaN, -2.1,  NaN,  NaN]]

// Retrieve a chunk directly
let array_chunk = array.retrieve_chunk_ndarray::<f32>(
    &[0, 1], // chunk index
)?;
println!("{array_chunk:4}");
// [[  0.2,  0.3],
//  [ -1.2,  1.3]]

// Retrieve an inner chunk
use zarrs::array::ArrayShardedReadableExt;
let shard_index_cache = zarrs::array::ArrayShardedReadableExtCache::new(&array);
let array_inner_chunk = array.retrieve_inner_chunk_ndarray_opt::<f32>(
    &shard_index_cache,
    &[0, 3], // inner chunk index
    &zarrs::array::codec::CodecOptions::default(),
)?;
println!("{array_inner_chunk:4}");
// [[ 0.3],
//  [ 1.3]]
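
To see why inner chunk index [0, 3] in the example above selects the last column of the first two rows, the sketch below maps an inner chunk index to its shard, its position within that shard, and its starting element in the array. The `InnerChunkLocation` struct and `locate_inner_chunk` function are hypothetical and not part of the zarrs API. The sketch assumes the shard shape is an exact multiple of the inner chunk shape in every dimension.

```rust
/// Where an inner chunk sits relative to the shard grid and the array.
#[derive(Debug, PartialEq)]
struct InnerChunkLocation {
    /// Index of the shard (outer chunk) that contains the inner chunk.
    shard_index: Vec<u64>,
    /// Index of the inner chunk within that shard.
    index_in_shard: Vec<u64>,
    /// Array index of the first element of the inner chunk.
    array_start: Vec<u64>,
}

/// Hypothetical helper: locate an inner chunk given the shard shape and
/// the inner chunk shape (both in elements).
fn locate_inner_chunk(
    inner_chunk_index: &[u64],
    shard_shape: &[u64],
    inner_chunk_shape: &[u64],
) -> InnerChunkLocation {
    let mut location = InnerChunkLocation {
        shard_index: Vec::new(),
        index_in_shard: Vec::new(),
        array_start: Vec::new(),
    };
    for ((&index, &shard_len), &inner_len) in inner_chunk_index
        .iter()
        .zip(shard_shape)
        .zip(inner_chunk_shape)
    {
        // Number of inner chunks that fit in one shard along this dimension.
        let per_shard = shard_len / inner_len;
        location.shard_index.push(index / per_shard);
        location.index_in_shard.push(index % per_shard);
        location.array_start.push(index * inner_len);
    }
    location
}

fn main() {
    // Shard shape [2, 2] and inner chunk shape [2, 1], as in the example above.
    let location = locate_inner_chunk(&[0, 3], &[2, 2], &[2, 1]);
    println!("{location:?}");
}
```

With these shapes, inner chunk [0, 3] falls in shard [0, 1] and starts at array element [0, 3], which matches the values 0.3 and 1.3 written to that shard earlier.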

§Additional Examples

Various examples can be found in the examples/ directory of the zarrs repository that demonstrate:

  • creating and manipulating Zarr hierarchies with various stores (sync and async), codecs, etc.,
  • converting between Zarr V2 and V3, and
  • creating custom data types.

Examples can be run with cargo run --example <EXAMPLE_NAME>.

  • Some examples require non-default features, which can be enabled with --all-features or --features <FEATURES>.
  • Some examples support a -- --usage-log argument to print storage API calls during execution.

§Crate Features

§Default
§Non-Default
  • async: an experimental asynchronous API for stores, Array, and Group.
    • The async API is runtime-agnostic. This has some limitations that are detailed in the Array docs.
    • The async API is not as performant as the sync API.
  • Codecs: adler32, bitround, bz2, fletcher32, gdeflate, pcodec, zfp, zlib.
  • dlpack: adds convenience methods for DLPack tensor interop to Array.
  • Additional Element/ElementOwned implementations:
    • float8: add support for float8 subfloat data types.
    • jiff: add support for jiff time data types.
    • chrono: add support for chrono time data types.

§zarrs Ecosystem

The Zarr specification is inherently unstable. It is under active development and new extensions are continually being introduced.

The zarrs crate has been split into multiple crates to:

  • allow external implementations of stores and extension points to target a relatively stable API compatible with a range of zarrs versions,
  • enable automatic backporting of metadata compatibility fixes and changes due to standardisation,
  • stay up-to-date with unstable public dependencies (e.g. opendal, object_store, icechunk) without impacting the release cycle of zarrs, and
  • improve compilation times.

A hierarchical overview of these crates can be found in The zarrs Book.

§Core
  • zarrs: The core library for manipulating Zarr hierarchies.
  • zarrs_metadata: Zarr metadata support (re-exported as zarrs::metadata).
  • zarrs_metadata_ext: Zarr extensions metadata support (re-exported as zarrs::metadata_ext).
  • zarrs_data_type: The data type extension API for zarrs (re-exported in zarrs::array::data_type).
  • zarrs_storage: The storage API for zarrs (re-exported as zarrs::storage).
  • zarrs_plugin: The plugin API for zarrs (re-exported as zarrs::plugin).
  • zarrs_registry: The Zarr extension point registry for zarrs (re-exported as zarrs::registry).
§Stores
§Bindings
§Zarr Metadata Conventions
§Tools
  • zarrs_tools: Various tools for creating and manipulating Zarr V3 data with the zarrs Rust crate
    • A reencoder that can change codecs, chunk shape, convert Zarr V2 to V3, etc.
    • Create an OME-Zarr hierarchy from a Zarr array.
    • Transform arrays: crop, rescale, downsample, gradient magnitude, gaussian, noise filtering, etc.

§Benchmarks

  • zarr_benchmarks: Benchmarks of various Zarr V3 implementations: zarrs, zarr-python, tensorstore

§Licence

zarrs is licensed under either of the Apache License, Version 2.0 or the MIT license, at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Re-exports§

pub use zarrs_metadata as metadata;
pub use zarrs_metadata_ext as metadata_ext;
pub use zarrs_plugin as plugin;
pub use zarrs_registry as registry;
pub use zarrs_storage as storage;
pub use zarrs_filesystem as filesystem; (requires the filesystem feature)

Modules§

array
Zarr arrays.
array_subset
Array subsets.
config
zarrs global configuration options.
group
Zarr groups.
hierarchy
Zarr hierarchies.
indexer
Generic indexer support.
node
Zarr nodes.
version
zarrs version information.