Crate icechunk


General design:

  • Most things are async, even when they don’t strictly need to be. Async unfortunately propagates: if an operation may be async in some cases, it has to be async in all of them. In our case, the operation that forces this is fetching from storage.
  • There is a high-level interface that knows about arrays, groups, user attributes, etc. This is the repository::Repository type.
  • There is a low-level interface that speaks Zarr keys and values, and is used to provide the Zarr store that will be used from Python. This is the store::Store type.
  • There is a translation layer between the low and high levels. When a user writes to a Zarr key, we need to convert that key to the language of arrays and groups. This is implemented in the store module.
  • There is an abstract type for loading and saving the Arrow data structures. This is the Storage trait. It knows how to fetch and write Arrow data. We have:
    • an in-memory implementation
    • an S3 implementation that writes to Parquet
    • a caching wrapper implementation
  • The data structures are represented by concrete types in the format modules. They use Arrow RecordBatches for representation.
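
To make the translation between Zarr keys and the array/group vocabulary more concrete, here is a minimal sketch. It is not the crate's implementation: `NodeRef` and `parse_key` are hypothetical names introduced for illustration, and the sketch assumes the Zarr v3 conventions of `zarr.json` metadata documents and the default `c/<i>/<j>/...` chunk key encoding. Zero-dimensional arrays and other key encodings are deliberately left out.

```rust
/// Hypothetical high-level description of what a Zarr key refers to.
/// (Illustrative only; not a type exported by this crate.)
#[derive(Debug, PartialEq)]
enum NodeRef {
    /// The metadata document of the group or array at `path`.
    Metadata { path: String },
    /// One chunk of the array at `path`, addressed by its grid coordinates.
    Chunk { path: String, coords: Vec<u32> },
}

/// Translate a low-level Zarr key into the high-level vocabulary.
/// Returns `None` for keys this simplified sketch does not understand.
fn parse_key(key: &str) -> Option<NodeRef> {
    // Assumption: metadata lives in a `zarr.json` document per node (Zarr v3).
    if key == "zarr.json" {
        return Some(NodeRef::Metadata { path: "/".to_string() });
    }
    if let Some(node) = key.strip_suffix("/zarr.json") {
        return Some(NodeRef::Metadata { path: format!("/{node}") });
    }

    // Assumption: chunk keys use the default encoding `<node>/c/<i>/<j>/...`.
    // Splitting on the last "/c/" keeps nodes that are themselves named "c" working.
    let (node, coords) = key
        .rsplit_once("/c/")
        .or_else(|| key.strip_prefix("c/").map(|rest| ("", rest)))?;

    // Every coordinate must be an unsigned integer, otherwise the key is rejected.
    let coords = coords
        .split('/')
        .map(|part| part.parse::<u32>().ok())
        .collect::<Option<Vec<u32>>>()?;

    Some(NodeRef::Chunk { path: format!("/{node}"), coords })
}
```

With this sketch, a write to the key `group/array/c/0/1` would be understood as "chunk (0, 1) of the array at `/group/array`", while `group/array/zarr.json` refers to that array's metadata.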
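
The relationship between a storage abstraction, an in-memory implementation, and a caching wrapper can be sketched as follows. This is an assumption-laden illustration rather than the crate's API: `BlobStorage`, `InMemory`, and `Cached` are hypothetical names, the payload is plain bytes instead of Arrow data, and the methods are synchronous so the example needs only the standard library, whereas the real Storage trait is async.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for a storage abstraction.
/// (The crate's actual `Storage` trait is async and has a different shape.)
trait BlobStorage {
    fn fetch(&self, id: &str) -> Option<Vec<u8>>;
    fn write(&mut self, id: &str, bytes: Vec<u8>);
}

/// In-memory backend that also counts how often it is asked to fetch.
#[derive(Default)]
struct InMemory {
    blobs: HashMap<String, Vec<u8>>,
    fetches: Cell<usize>,
}

impl BlobStorage for InMemory {
    fn fetch(&self, id: &str) -> Option<Vec<u8>> {
        self.fetches.set(self.fetches.get() + 1);
        self.blobs.get(id).cloned()
    }

    fn write(&mut self, id: &str, bytes: Vec<u8>) {
        self.blobs.insert(id.to_string(), bytes);
    }
}

/// Caching wrapper: it implements the same trait as the backend it wraps,
/// so callers cannot tell whether they talk to a cache or to the backend.
struct Cached<S> {
    inner: S,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<S: BlobStorage> Cached<S> {
    fn new(inner: S) -> Self {
        Cached { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl<S: BlobStorage> BlobStorage for Cached<S> {
    fn fetch(&self, id: &str) -> Option<Vec<u8>> {
        // Serve from the cache when possible.
        let cached = self.cache.borrow().get(id).cloned();
        if let Some(hit) = cached {
            return Some(hit);
        }
        // Otherwise ask the wrapped backend and remember the answer.
        let bytes = self.inner.fetch(id)?;
        self.cache.borrow_mut().insert(id.to_string(), bytes.clone());
        Some(bytes)
    }

    fn write(&mut self, id: &str, bytes: Vec<u8>) {
        // Write-through: keep the cache and the backend in sync.
        self.cache.get_mut().insert(id.to_string(), bytes.clone());
        self.inner.write(id, bytes);
    }
}
```

Because the wrapper implements the same trait as the backend it wraps, it can be layered over any implementation, for example `Cached::new(InMemory::default())`. In this sketch, repeated fetches of the same identifier reach the wrapped backend only once.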

Re-exports

pub use config::ObjectStoreConfig;
pub use config::RepositoryConfig;
pub use repository::Repository;
pub use storage::ObjectStorage;
pub use storage::Storage;
pub use storage::StorageError;
pub use storage::new_in_memory_storage;
pub use storage::new_local_filesystem_storage;
pub use storage::new_s3_storage;
pub use store::Store;

Modules

asset_manager
change_set
cli
config
conflicts
error
format
ops
refs
repository
session
storage
store
virtual_chunks