Crate zarrs


zarrs is a Rust library for the Zarr storage format for multidimensional arrays and metadata.

If you are a Python user, check out zarrs-python. It includes a high-performance codec pipeline for the reference zarr-python implementation.

zarrs supports Zarr V3 and a V3 compatible subset of Zarr V2. It is fully up-to-date and conformant with the Zarr 3.1 specification with support for:

  • all core extensions (data types, codecs, chunk grids, chunk key encodings, storage transformers),
  • all accepted Zarr Enhancement Proposals (ZEPs) and several draft ZEPs:
    • ZEP 0003: Variable chunking
    • ZEP 0007: Strings
    • ZEP 0009: Zarr Extension Naming
  • various registered extensions from zarr-developers/zarr-extensions/,
  • experimental codecs and data types intended for future registration, and
  • user-defined custom extensions and stores.

A changelog can be found here. Correctness issues with past versions are detailed here.

Developed at the Department of Materials Physics, Australian National University, Canberra, Australia.

§Getting Started

§Implementation Status

§Zarr Version Support

zarrs has first-class Zarr V3 support and additionally supports a compatible subset of Zarr V2 data that:

  • can be converted to V3 with only a metadata change, and
  • uses array metadata that is recognised and supported for encoding/decoding.

zarrs supports forward conversion from Zarr V2 to V3. See “Converting Zarr V2 to V3” in The zarrs Book, or try the zarrs_reencode CLI tool.

§Array Support
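Data Types

| DataType | V3 name | V2 dtype | ElementOwned / Element | Feature Flag |
|---|---|---|---|---|
| Bool | bool | \|b1 | bool | |
| Int2 | int2 | | i8 | |
| Int4 | int4 | | i8 | |
| Int8 | int8 | \|i1 | i8 | |
| Int16 | int16 | >i2, <i2 | i16 | |
| Int32 | int32 | >i4, <i4 | i32 | |
| Int64 | int64 | >i8, <i8 | i64 | |
| UInt2 | uint2 | | u8 | |
| UInt4 | uint4 | | u8 | |
| UInt8 | uint8 | \|u1 | u8 | |
| UInt16 | uint16 | >u2, <u2 | u16 | |
| UInt32 | uint32 | >u4, <u4 | u32 | |
| UInt64 | uint64 | >u8, <u8 | u64 | |
| Float4E2M1FN | float4_e2m1fn | | | |
| Float6E2M3FN | float6_e2m3fn | | | |
| Float6E3M2FN | float6_e3m2fn | | | |
| Float8E3M4 | float8_e3m4 | | | |
| Float8E4M3 | float8_e4m3 | | | |
| Float8E4M3B11FNUZ | float8_e4m3b11fnuz | | | |
| Float8E4M3FNUZ | float8_e4m3fnuz | | | |
| Float8E5M2 | float8_e5m2 | | | |
| Float8E5M2FNUZ | float8_e5m2fnuz | | | |
| Float8E8M0FNU | float8_e8m0fnu | | | |
| BFloat16 | bfloat16 | | half::bf16 | |
| Float16 | float16 | >f2, <f2 | half::f16 | |
| Float32 | float32 | >f4, <f4 | f32 | |
| Float64 | float64 | >f8, <f8 | f64 | |
| ComplexBFloat16 | complex_bfloat16 | | Complex<half::bf16> | |
| ComplexFloat16 | complex_float16 | | Complex<half::f16> | |
| ComplexFloat32 | complex_float32 | | Complex<f32> | |
| ComplexFloat64 | complex_float64 | | Complex<f64> | |
| Complex64 | complex64 | >c8, <c8 | Complex<f32> | |
| Complex128 | complex128 | >c16, <c16 | Complex<f64> | |
| RawBits | r* | | [u8; N] / &[u8; N] | |
| String | string | \|O | String / &str | |
| Bytes | bytes, binary | \|VX | Vec<u8> / &[u8] | |
| NumpyDateTime64 | numpy.datetime64 | | i64, chrono::DateTime<Utc>, jiff::Timestamp | chrono, jiff |
| NumpyTimeDelta64 | numpy.timedelta64 | | i64, chrono::TimeDelta, jiff::SignedDuration | chrono, jiff |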
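The V2 dtype column above is effectively a lookup from a Zarr V2 dtype string to a V3 data type name. The sketch below is a hypothetical, std-only helper (`v2_dtype_to_v3_name` is an illustrative name, not part of zarrs) that covers only `bool` and the fixed-size numeric rows of the table; it shows that the `<` and `>` byte-order variants map to the same V3 name.

```rust
/// Hypothetical helper (NOT the zarrs API): map a Zarr V2 `dtype` string to the
/// Zarr V3 data type name, following the data type table. Only bool and the
/// fixed-size numeric rows are covered; anything else returns `None`.
fn v2_dtype_to_v3_name(dtype: &str) -> Option<&'static str> {
    // `<` (little-endian) and `>` (big-endian) map to the same V3 data type;
    // in Zarr V3 the byte order is configured on the `bytes` codec instead.
    let name = match dtype {
        "|b1" => "bool",
        "|i1" => "int8",
        "<i2" | ">i2" => "int16",
        "<i4" | ">i4" => "int32",
        "<i8" | ">i8" => "int64",
        "|u1" => "uint8",
        "<u2" | ">u2" => "uint16",
        "<u4" | ">u4" => "uint32",
        "<u8" | ">u8" => "uint64",
        "<f2" | ">f2" => "float16",
        "<f4" | ">f4" => "float32",
        "<f8" | ">f8" => "float64",
        "<c8" | ">c8" => "complex64",
        "<c16" | ">c16" => "complex128",
        _ => return None,
    };
    Some(name)
}

fn main() {
    for dtype in ["<i2", ">f8", "|b1", "<M8[ns]"] {
        println!("{dtype} -> {:?}", v2_dtype_to_v3_name(dtype));
    }
}
```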
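Codecs

| Codec Type | Default codec name | Specification | Feature Flag* |
|---|---|---|---|
| Array to Array | transpose | Zarr V3.0 Transpose | transpose |
| Array to Array | numcodecs.fixedscaleoffset | Experimental | |
| Array to Array | numcodecs.bitround | Experimental | bitround |
| Array to Array | zarrs.squeeze | Experimental | |
| Array to Bytes | bytes | Zarr V3.0 Bytes | |
| Array to Bytes | sharding_indexed | Zarr V3.0 Sharding | sharding |
| Array to Bytes | vlen-array | Experimental | |
| Array to Bytes | vlen-bytes | zarr-extensions/codecs/vlen-bytes | |
| Array to Bytes | vlen-utf8 | zarr-extensions/codecs/vlen-utf8 | |
| Array to Bytes | numcodecs.pcodec | Experimental | pcodec |
| Array to Bytes | numcodecs.zfpy | Experimental | zfp |
| Array to Bytes | packbits | zarr-extensions/codecs/packbits | |
| Array to Bytes | zarrs.vlen | Experimental | |
| Array to Bytes | zarrs.vlen_v2 | Experimental | |
| Array to Bytes | zfp | zarr-extensions/codecs/zfp | zfp |
| Bytes to Bytes | blosc | Zarr V3.0 Blosc | blosc |
| Bytes to Bytes | crc32c | Zarr V3.0 CRC32C | crc32c |
| Bytes to Bytes | gzip | Zarr V3.0 Gzip | gzip |
| Bytes to Bytes | zstd | zarr-extensions/codecs/zstd | zstd |
| Bytes to Bytes | numcodecs.bz2 | Experimental | bz2 |
| Bytes to Bytes | numcodecs.fletcher32 | Experimental | fletcher32 |
| Bytes to Bytes | numcodecs.shuffle | Experimental | |
| Bytes to Bytes | numcodecs.zlib | Experimental | zlib |
| Bytes to Bytes | zarrs.gdeflate | Experimental | gdeflate |

* The blosc, crc32c, gzip, sharding, transpose and zstd feature flags are part of the default set of features. numcodecs.bitround supports additional data types not supported by zarr-python/numcodecs.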
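The codec types in the table correspond to the stages of a Zarr V3 codec chain: zero or more array-to-array codecs are applied first, then exactly one array-to-bytes codec, then zero or more bytes-to-bytes codecs; decoding applies the inverses in reverse order. The following is a minimal std-only sketch of that ordering for a 2D `u16` chunk. The functions are hypothetical stand-ins for transpose, bytes (little-endian) and crc32c, not the zarrs codec API, and the sketch assumes, as in the Zarr V3 CRC32C codec, that the checksum is appended as 4 little-endian bytes.

```rust
/// Array-to-array stage (hypothetical): transpose a row-major `rows x cols` chunk.
fn transpose(data: &[u16], rows: usize, cols: usize) -> Vec<u16> {
    let mut out = Vec::with_capacity(data.len());
    for c in 0..cols {
        for r in 0..rows {
            out.push(data[r * cols + c]);
        }
    }
    out
}

/// Array-to-bytes stage (hypothetical): serialise elements as little-endian bytes.
fn to_bytes_le(data: &[u16]) -> Vec<u8> {
    data.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// CRC-32C (Castagnoli), bitwise, using the reflected polynomial 0x82F63B78.
fn crc32c(bytes: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            // mask is all ones if the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Bytes-to-bytes stage (hypothetical): append a 4-byte little-endian checksum.
fn append_crc32c(mut bytes: Vec<u8>) -> Vec<u8> {
    let checksum = crc32c(&bytes);
    bytes.extend_from_slice(&checksum.to_le_bytes());
    bytes
}

/// Encode in chain order: array->array, then array->bytes, then bytes->bytes.
fn encode(data: &[u16], rows: usize, cols: usize) -> Vec<u8> {
    append_crc32c(to_bytes_le(&transpose(data, rows, cols)))
}

fn main() {
    let chunk = [1u16, 2, 3, 4, 5, 6]; // a 2 x 3 chunk in row-major order
    let encoded = encode(&chunk, 2, 3);
    println!("{encoded:?}");
}
```

Keeping each stage a separate function mirrors the chain structure: swapping or adding a bytes-to-bytes step (e.g. compression) does not touch the array stages.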

Codecs have three potential statuses:

Codec names and aliases are configurable with Config::codec_aliases_v3_mut and Config::codec_aliases_v2_mut. zarrs will persist codec names when opening an existing array or creating an array from metadata.

zarrs supports arrays created with zarr-python 3.x.x using various numcodecs.zarr3 codecs, provided the arrays were written with numcodecs 0.15.1 or later.

Chunk Grids
| Chunk Grid | ZEP |
|---|---|
| regular | ZEP0001 |
| rectangular (experimental) | ZEP0003 (draft) |
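With a regular chunk grid, the number of chunks along each dimension is the array length divided by the chunk length, rounded up, and an element's chunk index is its index divided (integer division) by the chunk length. A hypothetical std-only sketch (`grid_shape` and `locate` are illustrative names, not zarrs functions), using the 3×4 array with 2×2 chunks from the §Examples section:

```rust
/// Number of chunks per dimension: ceil(array_len / chunk_len).
/// Assumes every chunk length is non-zero.
fn grid_shape(array_shape: &[u64], chunk_shape: &[u64]) -> Vec<u64> {
    array_shape
        .iter()
        .zip(chunk_shape)
        .map(|(&a, &c)| (a + c - 1) / c) // ceiling division
        .collect()
}

/// Chunk index containing an element, and the element's offset within that chunk.
fn locate(index: &[u64], chunk_shape: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let chunk: Vec<u64> = index.iter().zip(chunk_shape).map(|(&i, &c)| i / c).collect();
    let offset: Vec<u64> = index.iter().zip(chunk_shape).map(|(&i, &c)| i % c).collect();
    (chunk, offset)
}

fn main() {
    let (chunk, offset) = locate(&[2, 3], &[2, 2]);
    println!(
        "grid {:?}, element [2, 3] is in chunk {:?} at offset {:?}",
        grid_shape(&[3, 4], &[2, 2]),
        chunk,
        offset
    );
}
```

Because 3 is not a multiple of 2, the chunks in the last row of the grid are only partially inside the array.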
Chunk Key Encodings
| Chunk Key Encoding | ZEP |
|---|---|
| default | ZEP0001 |
| v2 | ZEP0001 |
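The two chunk key encodings differ in how a chunk index becomes a store key (relative to the array's path). Per the Zarr V3 specification, default prefixes the key with c and uses / as its default separator, while v2 joins the indices with . by default and uses 0 for zero-dimensional arrays. A hypothetical std-only sketch of both (`default_key` and `v2_key` are illustrative names, not zarrs functions):

```rust
/// `default` chunk key encoding: "c" followed by each index, separator-prefixed.
fn default_key(chunk: &[u64], separator: char) -> String {
    let mut key = String::from("c");
    for i in chunk {
        key.push(separator);
        key.push_str(&i.to_string());
    }
    key
}

/// `v2` chunk key encoding: indices joined by the separator, "0" if zero-dimensional.
fn v2_key(chunk: &[u64], separator: char) -> String {
    if chunk.is_empty() {
        return "0".to_string();
    }
    let sep = separator.to_string();
    chunk
        .iter()
        .map(|i| i.to_string())
        .collect::<Vec<_>>()
        .join(sep.as_str())
}

fn main() {
    println!("{}", default_key(&[0, 1], '/')); // c/0/1
    println!("{}", v2_key(&[0, 1], '.')); // 0.1
}
```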
Storage Transformers

Zarr V3 does not currently define any storage transformers.

§Storage Support

zarrs supports a wide range of stores (including custom stores) via the zarrs_storage API.
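Conceptually, a Zarr store maps string keys (such as array/zarr.json or array/c/0/1) to byte values, with optional listing by prefix. The sketch below is a hypothetical std-only in-memory store; its `KeyValueStore` trait and `MemoryStore` type are illustrative only and are not the zarrs_storage API.

```rust
use std::collections::BTreeMap;

/// Hypothetical key-value interface (NOT zarrs_storage): read, write, erase, list.
trait KeyValueStore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn set(&mut self, key: &str, value: Vec<u8>);
    fn erase(&mut self, key: &str);
    fn list_prefix(&self, prefix: &str) -> Vec<String>;
}

/// In-memory store; a BTreeMap keeps keys sorted so listings are deterministic.
#[derive(Default)]
struct MemoryStore {
    map: BTreeMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
    fn erase(&mut self, key: &str) {
        self.map.remove(key);
    }
    fn list_prefix(&self, prefix: &str) -> Vec<String> {
        self.map.keys().filter(|k| k.starts_with(prefix)).cloned().collect()
    }
}

fn main() {
    let mut store = MemoryStore::default();
    store.set("zarr.json", br#"{"zarr_format":3,"node_type":"group"}"#.to_vec());
    store.set("array/zarr.json", b"{}".to_vec());
    store.set("array/c/0/1", vec![0u8; 16]); // an encoded chunk
    println!("{:?}", store.list_prefix("array/"));
    store.erase("array/c/0/1");
    println!("root metadata bytes: {:?}", store.get("zarr.json").map(|v| v.len()));
}
```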

Stores

† Re-exported in the zarrs::storage module.
‡ Re-exported as the zarrs::filesystem module.
* Support depends on the underlying store.

The opendal and object_store crates are popular Rust storage backends that are fully supported via zarrs_opendal and zarrs_object_store. These backends provide more feature-complete HTTP stores than zarrs_http.

zarrs_icechunk implements the Icechunk transactional storage engine, a storage specification for Zarr that supports object_store stores.

The AsyncToSyncStorageAdapter enables some async stores to be used in a sync context.

§Examples

§Create and Read a Zarr Hierarchy

```rust
use std::path::PathBuf;
use std::sync::Arc;

use zarrs::array::{ArrayBuilder, DataType, FillValue, ZARR_NAN_F32};
use zarrs::array::codec::GzipCodec; // requires gzip feature
use zarrs::filesystem::FilesystemStore; // requires filesystem feature
use zarrs::group::GroupBuilder;
use zarrs::storage::ReadableWritableListableStorage;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a filesystem store
    let store_path: PathBuf = "/path/to/hierarchy.zarr".into();
    let store: ReadableWritableListableStorage =
        Arc::new(FilesystemStore::new(&store_path)?);

    // Write the root group metadata
    GroupBuilder::new()
        // .attributes(...)
        .build(store.clone(), "/")?
        .store_metadata()?;

    // Create a new V3 array using the array builder
    let array = ArrayBuilder::new(
        vec![3, 4], // array shape
        DataType::Float32,
        vec![2, 2].try_into()?, // regular chunk shape (non-zero elements)
        FillValue::from(ZARR_NAN_F32),
    )
    .bytes_to_bytes_codecs(vec![
        Arc::new(GzipCodec::new(5)?),
    ])
    .dimension_names(["y", "x"].into())
    .attributes(serde_json::json!({"Zarr V3": "is great"}).as_object().unwrap().clone())
    .build(store.clone(), "/array")?; // /path/to/hierarchy.zarr/array

    // Store the array metadata
    array.store_metadata()?;
    println!("{}", serde_json::to_string_pretty(array.metadata())?);
    // {
    //     "zarr_format": 3,
    //     "node_type": "array",
    //     ...
    // }

    // Perform some operations on the chunks
    array.store_chunk_elements::<f32>(
        &[0, 1], // chunk index
        &[0.2, 0.3, 1.2, 1.3]
    )?;
    array.store_array_subset_ndarray::<f32, _>(
        &[1, 1], // array index (start of subset)
        ndarray::array![[-1.1, -1.2], [-2.1, -2.2]]
    )?;
    array.erase_chunk(&[1, 1])?;

    // Retrieve all array elements as an ndarray
    let array_ndarray = array.retrieve_array_subset_ndarray::<f32>(&array.subset_all())?;
    println!("{array_ndarray:4}");
    // [[ NaN,  NaN,  0.2,  0.3],
    //  [ NaN, -1.1, -1.2,  1.3],
    //  [ NaN, -2.1,  NaN,  NaN]]

    Ok(())
}
```
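In the example above, store_array_subset_ndarray writes a 2×2 subset starting at [1, 1], and erase_chunk removes chunk [1, 1]. With 2×2 chunks, that subset overlaps four chunks, and chunk [1, 1] covers only elements [2, 2] and [2, 3] of the 3×4 array, which is why those two elements read back as NaN (the fill value). The hypothetical std-only helpers below (`chunks_intersecting` and `chunk_region` are illustrative names, not zarrs functions) reproduce that arithmetic for a regular grid.

```rust
/// Chunk indices intersected by the subset `start..start + shape`, in row-major
/// order. Assumes every subset dimension is non-empty.
fn chunks_intersecting(start: &[u64], shape: &[u64], chunk_shape: &[u64]) -> Vec<Vec<u64>> {
    // Per-dimension inclusive range of chunk indices touched by the subset
    let ranges: Vec<(u64, u64)> = start
        .iter()
        .zip(shape)
        .zip(chunk_shape)
        .map(|((&s, &n), &c)| (s / c, (s + n - 1) / c))
        .collect();
    // Cartesian product of the per-dimension ranges
    let mut out = vec![Vec::new()];
    for &(lo, hi) in &ranges {
        let mut next = Vec::new();
        for prefix in &out {
            for i in lo..=hi {
                let mut idx = prefix.clone();
                idx.push(i);
                next.push(idx);
            }
        }
        out = next;
    }
    out
}

/// Array region (start, shape) covered by a chunk, clipped to the array bounds.
/// Assumes the chunk index lies within the chunk grid.
fn chunk_region(chunk: &[u64], chunk_shape: &[u64], array_shape: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let start: Vec<u64> = chunk.iter().zip(chunk_shape).map(|(&i, &c)| i * c).collect();
    let shape: Vec<u64> = start
        .iter()
        .zip(chunk_shape)
        .zip(array_shape)
        .map(|((&s, &c), &a)| c.min(a - s))
        .collect();
    (start, shape)
}

fn main() {
    println!("{:?}", chunks_intersecting(&[1, 1], &[2, 2], &[2, 2]));
    println!("{:?}", chunk_region(&[1, 1], &[2, 2], &[3, 4]));
}
```

In general, a chunk that a write only partially covers has to be read, updated and rewritten, whereas a write covering whole chunks (like store_chunk_elements) does not.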

§More examples

Various examples can be found in the examples directory that demonstrate:

  • creating and manipulating Zarr hierarchies with various stores (sync and async), codecs, etc.,
  • converting between Zarr V2 and V3, and
  • creating custom data types.

Examples can be run with cargo run --example <EXAMPLE_NAME>.

  • Some examples require non-default features, which can be enabled with --all-features or --features <FEATURES>.
  • Some examples support a -- --usage-log argument to print storage API calls during execution.

§Crate Features

§Default
  • filesystem: Re-export zarrs_filesystem as zarrs::filesystem
  • ndarray: ndarray utility functions for Array.
  • Codecs: blosc, crc32c, gzip, sharding, transpose, zstd.
§Non-Default
  • async: an experimental asynchronous API for stores, Array, and Group.
    • The async API is runtime-agnostic. This has some limitations that are detailed in the Array docs.
    • The async API is not as performant as the sync API.
  • dlpack: adds convenience methods for DLPack tensor interop to Array
  • Codecs: bitround, bz2, fletcher32, gdeflate, pcodec, zfp, zlib.

§zarrs Ecosystem

The Zarr specification is still evolving: it is under active development and new extensions are continually being introduced.

The zarrs crate has been split into multiple crates to:

  • allow external implementations of stores and extension points to target a relatively stable API compatible with a range of zarrs versions,
  • enable automatic backporting of metadata compatibility fixes and changes due to standardisation,
  • stay up-to-date with unstable public dependencies (e.g. opendal, object_store, icechunk, etc) without impacting the release cycle of zarrs, and
  • improve compilation times.

A hierarchical overview of these crates can be found in The zarrs Book.

§Core
  • zarrs: The core library for manipulating Zarr hierarchies.
  • zarrs_metadata: Zarr metadata support (re-exported as zarrs::metadata).
  • zarrs_metadata_ext: Zarr extensions metadata support (re-exported as zarrs::metadata_ext).
  • zarrs_data_type: The data type extension API for zarrs (re-exported in zarrs::array::data_type).
  • zarrs_storage: The storage API for zarrs (re-exported as zarrs::storage).
  • zarrs_plugin: The plugin API for zarrs (re-exported as zarrs::plugin).
  • zarrs_registry: The Zarr extension point registry for zarrs (re-exported as zarrs::registry).
§Stores
§Bindings
§Zarr Metadata Conventions
§Tools
  • zarrs_tools: Various tools for creating and manipulating Zarr V3 data with the zarrs Rust crate:
    • A reencoder that can change codecs, chunk shape, convert Zarr V2 to V3, etc.
    • Create an OME-Zarr hierarchy from a Zarr array.
    • Transform arrays: crop, rescale, downsample, gradient magnitude, Gaussian, noise filtering, etc.

§Benchmarks

  • zarr_benchmarks: Benchmarks of various Zarr V3 implementations: zarrs, zarr-python, tensorstore

§Licence

zarrs is licensed under either of

  • the Apache License, Version 2.0, or
  • the MIT license,

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

§Re-exports

pub use zarrs_metadata as metadata;
pub use zarrs_metadata_ext as metadata_ext;
pub use zarrs_plugin as plugin;
pub use zarrs_registry as registry;
pub use zarrs_storage as storage;
pub use zarrs_filesystem as filesystem; // requires filesystem feature

§Modules

  • array: Zarr arrays.
  • array_subset: Array subsets.
  • byte_range: Byte ranges.
  • config: zarrs global configuration options.
  • group: Zarr groups.
  • node: Zarr nodes.
  • version: zarrs version information.