ATLAS (Aggregated Tensor Large Array Store) is a directory-based store for thousands of named datasets.
Each dataset is a virtual collection of named N-dimensional arrays with per-dataset and
per-array attributes, backed by the array-format crate. Datasets sharing an array name
are co-located in the same physical file, keyed by dataset name.
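The co-location rule works like this: the array name selects the physical file, and the dataset name is the key inside that file. The sketch below models the rule with in-memory `BTreeMap`s only. `PhysicalLayout`, `array_file_path`, and the `Vec<f32>` payload are hypothetical stand-ins, not the crate's types. The `<array>/data.af` path simply mirrors the layout shown on this page.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical model of co-location: one "file" per array name,
/// holding one entry per dataset that defines that array.
#[derive(Default)]
pub struct PhysicalLayout {
    // array name -> (dataset name -> array payload)
    files: BTreeMap<String, BTreeMap<String, Vec<f32>>>,
}

impl PhysicalLayout {
    /// Store `data` for `array` in `dataset`. Every dataset that uses the
    /// same array name lands in the same per-array file.
    pub fn put(&mut self, dataset: &str, array: &str, data: Vec<f32>) {
        self.files
            .entry(array.to_string())
            .or_default()
            .insert(dataset.to_string(), data);
    }

    /// Number of distinct physical files (one per array name).
    pub fn file_count(&self) -> usize {
        self.files.len()
    }

    /// Dataset keys stored inside the file for `array`, in sorted order.
    pub fn datasets_in(&self, array: &str) -> Vec<&str> {
        self.files
            .get(array)
            .map(|m| m.keys().map(String::as_str).collect())
            .unwrap_or_default()
    }
}

/// Path of the physical file for `array`, mirroring `<root>/<array>/data.af`.
pub fn array_file_path(root: &Path, array: &str) -> PathBuf {
    root.join(array).join("data.af")
}
```

In this model, two datasets that both define `temperature` become two keys in one `temperature` file. A new physical file appears only when a dataset introduces an array name no other dataset uses.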
§Layout
```text
my_store/
├── atlas.json        <- dataset registry + per-dataset attributes
├── temperature/
│   └── data.af       <- ArrayFile: one named array per dataset
└── latitude/
    └── data.af
```
§Quick start
```rust
use atlas::{Atlas, Attr, StoreConfig};
use ndarray::Array2;

#[tokio::main]
async fn main() {
    let tmp = tempfile::tempdir().unwrap();

    // Create: the codec persists in atlas.json, so `open_path` doesn't need it.
    let mut s = Atlas::create_path(tmp.path(), StoreConfig::default()).await.unwrap();
    {
        // The inner scope drops the dataset handle `ds` before `flush` is called.
        let mut ds = s.create_dataset("jan_2024").await.unwrap();
        ds.define_array::<f32>(
            "temperature",
            vec!["lat".into(), "lon".into()], // dimension names
            vec![4, 8],                       // shape
            None, // chunk_shape: defaults to the full shape (one chunk)
            None, // fill_value
        ).await.unwrap();
        let data = Array2::<f32>::from_elem([4, 8], 20.0).into_dyn();
        ds.write_array("temperature", vec![0, 0], data.view()).await.unwrap();
        ds.set_attribute("month", Attr::Int64(1));
    }
    s.flush().await.unwrap(); // single durability boundary

    // Reopen: no config needed.
    let s2 = Atlas::open_path(tmp.path()).await.unwrap();
    let ds2 = s2.open_dataset("jan_2024").await.unwrap();
    let temp = ds2.read_array::<f32>("temperature", vec![], vec![]).await.unwrap().unwrap();
    assert_eq!(temp.shape(), &[4, 8]);
    assert_eq!(temp[[0, 0]], 20.0);
}
```
§Thread safety
Atlas and DatasetView are Send + Sync. Each physical array file
is guarded by a tokio::sync::RwLock: concurrent reads (read_array,
array_stats) proceed in parallel without contention, while writes
(write_array, define_array, flush, compact, …) take an exclusive
lock. The cache map uses a parking_lot::RwLock that is never held across
an await point.
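The locking split described above uses a short-lived map lock followed by a long-lived per-file lock. It can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the crate's code. `std::sync::RwLock` stands in for both `parking_lot::RwLock` and `tokio::sync::RwLock` so the sketch runs without dependencies, and OS threads stand in for async tasks. `FileCache` and `FileState` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for the contents of one physical array file.
#[derive(Default)]
pub struct FileState {
    pub values: Vec<f32>,
}

/// Hypothetical cache: the outer lock guards only the name -> handle map,
/// and each file has its own inner lock.
#[derive(Default)]
pub struct FileCache {
    files: RwLock<HashMap<String, Arc<RwLock<FileState>>>>,
}

impl FileCache {
    /// Return the per-file handle, creating it on first use. The map guard is
    /// released before this returns, so callers never hold it while working
    /// on a file (the analogue of "never held across an await point").
    pub fn handle(&self, name: &str) -> Arc<RwLock<FileState>> {
        if let Some(h) = self.files.read().unwrap().get(name) {
            return Arc::clone(h);
        }
        let mut map = self.files.write().unwrap();
        // `entry` re-checks, so two racing creators end up sharing one handle.
        Arc::clone(map.entry(name.to_string()).or_default())
    }

    /// Shared access: many readers of the same file can proceed at once.
    pub fn read_len(&self, name: &str) -> usize {
        let h = self.handle(name);
        let guard = h.read().unwrap();
        guard.values.len()
    }

    /// Exclusive access to a single file; other files stay unlocked.
    pub fn append(&self, name: &str, v: f32) {
        let h = self.handle(name);
        h.write().unwrap().values.push(v);
    }
}
```

The per-file `Arc` is cloned out of the map before the file lock is taken. As a result, a long write to `temperature` never blocks a lookup or a read of `latitude`.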
§Durability
atlas.json is loaded once, when the store is opened or created. From
then on, every mutation (create_dataset, define_array, set_attribute,
…) only touches the in-memory StoreMeta. Nothing is persisted
until Atlas::flush is called, so dropping an Atlas
without flushing abandons every pending in-memory write.
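The durability contract can be illustrated with a small metadata holder that keeps changes in memory and writes them only on `flush`. This is a hypothetical sketch. `MetaStore` and its `key=value` text format stand in for `StoreMeta` and atlas.json; the text format avoids a JSON dependency. The write-to-temp-then-rename step is one common way to make a flush atomic, not something this page says the crate does.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for the store metadata: attributes live in memory
/// until `flush` writes them to disk.
pub struct MetaStore {
    path: PathBuf,
    attrs: BTreeMap<String, i64>,
}

impl MetaStore {
    /// Load once at open time. A missing file means an empty store.
    pub fn open(path: PathBuf) -> io::Result<Self> {
        let mut attrs: BTreeMap<String, i64> = BTreeMap::new();
        match fs::read_to_string(&path) {
            Ok(text) => {
                for line in text.lines() {
                    if let Some((k, v)) = line.split_once('=') {
                        if let Ok(n) = v.parse::<i64>() {
                            attrs.insert(k.to_string(), n);
                        }
                    }
                }
            }
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
        Ok(MetaStore { path, attrs })
    }

    /// Mutations touch only the in-memory map; nothing is written here.
    pub fn set_attribute(&mut self, key: &str, value: i64) {
        self.attrs.insert(key.to_string(), value);
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.attrs.get(key).copied()
    }

    /// The single durability boundary: write a temp file, then rename it
    /// over the real one so a reader never sees a half-written file.
    pub fn flush(&self) -> io::Result<()> {
        let body: String = self
            .attrs
            .iter()
            .map(|(k, v)| format!("{k}={v}\n"))
            .collect();
        let tmp = self.path.with_extension("tmp");
        fs::write(&tmp, body)?;
        fs::rename(&tmp, &self.path)
    }
}
```

Dropping a `MetaStore` after `set_attribute` but before `flush` leaves the file untouched. A later `open` therefore sees none of those changes, mirroring the Atlas behaviour described above.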
Structs§
- ArraySchema - Schema for a single named array within a dataset.
- ArrayStats - Aggregate statistics for a single array covering all its chunks.
- Atlas - Handle to an opened or newly created atlas store.
- DatasetMeta - Metadata for a single dataset: array schemas and per-dataset attributes. Both maps preserve insertion order (via IndexMap), so on-disk layouts and Python-side dict iteration mirror the order in which arrays and attributes were added.
- DatasetView - A borrowed handle to one dataset within an Atlas.
- DeltaCache - Two-level cache shared across all delta layers in an ArrayFile.
- MergedArrayMeta - Array metadata visible to the caller after merging all delta layers.
- StoreConfig - Configuration for opening or creating an Atlas.
- TimestampNs - Nanoseconds since the Unix epoch (1970-01-01 00:00:00 UTC).
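DeltaCache and MergedArrayMeta both refer to merging delta layers. This page does not spell out the merge rule, so the following sketch assumes a plain last-writer-wins policy. Layers are applied oldest to newest, and a newer layer's entry for a chunk shadows older ones. `ChunkKey`, `DeltaLayer`, and `merge_layers` are hypothetical names, and the block id is an opaque placeholder.

```rust
use std::collections::BTreeMap;

/// Hypothetical chunk coordinate, e.g. [row_block, col_block].
pub type ChunkKey = Vec<u64>;

/// Hypothetical delta layer: the chunks it (re)wrote, mapped to a block id.
pub type DeltaLayer = BTreeMap<ChunkKey, u64>;

/// Merge layers from oldest to newest. A newer layer's entry for a chunk
/// replaces any older one, so the result names the block a reader should use.
pub fn merge_layers(layers: &[DeltaLayer]) -> BTreeMap<ChunkKey, u64> {
    let mut merged = BTreeMap::new();
    for layer in layers {
        for (key, block) in layer {
            merged.insert(key.clone(), *block);
        }
    }
    merged
}
```

Under this assumption, a chunk untouched by later layers keeps its original block. Only rewritten chunks point at newer blocks.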
Enums§
- Attr - A per-dataset attribute value stored in atlas.json.
- Codec - Compression codec applied when writing new array blocks.
- DType - Describes the element type of an array.
- Error - Every error returned by this crate. Each variant carries enough context to identify what failed; Display (via thiserror) renders the same message shown in the /// line above each variant.
- FillValue - A scalar fill value for an array.
- MetaFormat - On-disk encoding for the store’s metadata file.
- StatValue - A typed min or max value.
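ArrayStats covers all chunks of an array, and StatValue holds a typed min or max. One way to compute such array-wide statistics is to fold per-chunk summaries, which avoids rescanning the whole array. The sketch below is hypothetical. `ChunkStats` and `array_stats` are illustrative names, and the element type is fixed to `f64` instead of a typed StatValue.

```rust
/// Hypothetical per-chunk summary with the element type fixed to f64.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ChunkStats {
    pub min: f64,
    pub max: f64,
    pub count: u64,
}

impl ChunkStats {
    /// Summarise one chunk; returns None for an empty chunk.
    pub fn of(values: &[f64]) -> Option<Self> {
        let (&first, rest) = values.split_first()?;
        let (mut min, mut max) = (first, first);
        for &v in rest {
            min = min.min(v);
            max = max.max(v);
        }
        Some(ChunkStats { min, max, count: values.len() as u64 })
    }

    /// Combine two summaries without revisiting the underlying values.
    pub fn merge(self, other: Self) -> Self {
        ChunkStats {
            min: self.min.min(other.min),
            max: self.max.max(other.max),
            count: self.count + other.count,
        }
    }
}

/// Array-wide stats folded from per-chunk stats; empty chunks are skipped.
pub fn array_stats(chunks: &[Vec<f64>]) -> Option<ChunkStats> {
    chunks.iter().filter_map(|c| ChunkStats::of(c)).reduce(ChunkStats::merge)
}
```

`merge` is associative, so per-chunk summaries can be combined in any grouping, for example as chunks are written.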
Traits§
- ArrayElement - Unified element type for all array operations.
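A unified element trait typically ties each Rust scalar type to a runtime type tag. One generic method, such as `define_array::<f32>` in the quick start, can then serve every element type. The sketch below is a hypothetical illustration of that pattern. Its `DType`, `ArrayElement`, `encode`, and little-endian byte encoding are stand-ins, not the crate's definitions.

```rust
/// Hypothetical element-type tag, standing in for the crate's DType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F32,
    F64,
    I64,
}

/// Hypothetical unified element trait: each scalar type maps to one tag
/// and knows its own byte encoding.
pub trait ArrayElement: Copy {
    const DTYPE: DType;
    fn to_le_bytes_vec(self) -> Vec<u8>;
}

macro_rules! impl_element {
    ($t:ty, $d:expr) => {
        impl ArrayElement for $t {
            const DTYPE: DType = $d;
            fn to_le_bytes_vec(self) -> Vec<u8> {
                self.to_le_bytes().to_vec()
            }
        }
    };
}

impl_element!(f32, DType::F32);
impl_element!(f64, DType::F64);
impl_element!(i64, DType::I64);

/// One generic function serves every element type. The tag comes from the
/// trait, so the caller only chooses `T`.
pub fn encode<T: ArrayElement>(values: &[T]) -> (DType, Vec<u8>) {
    let bytes: Vec<u8> = values.iter().flat_map(|&v| v.to_le_bytes_vec()).collect();
    (T::DTYPE, bytes)
}
```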
Type Aliases§
- Result - Convenience alias for Result<T, atlas::Error> returned by every fallible operation in the crate.
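The Result alias follows the usual Rust pattern of fixing the error type, so signatures only name the success type. The sketch below is illustrative. Its `Error` enum and `DatasetNotFound` variant are hypothetical. The one-parameter form `Result<T>` is also an assumption, since this page only shows the expanded `Result<T, atlas::Error>`.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's Error enum.
#[derive(Debug, PartialEq)]
pub enum Error {
    DatasetNotFound(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::DatasetNotFound(name) => write!(f, "dataset not found: {name}"),
        }
    }
}

impl std::error::Error for Error {}

/// The alias pattern: the error type is fixed, only `T` varies.
pub type Result<T> = std::result::Result<T, Error>;

fn lookup(name: &str) -> Result<u32> {
    if name == "jan_2024" {
        Ok(1)
    } else {
        Err(Error::DatasetNotFound(name.to_string()))
    }
}

/// `?` works unchanged because the alias is an ordinary `Result`.
pub fn month_of(name: &str) -> Result<u32> {
    let m = lookup(name)?;
    Ok(m)
}
```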