pub struct Atlas { /* private fields */ }
Handle to an opened or newly created atlas store.
Owns the object_store backend, the in-memory store metadata, a
per-array file cache, and the chosen array / metadata codecs. All
mutations (create_dataset, delete_dataset, and everything that
flows through a DatasetView) update in-memory state only —
nothing reaches disk until Atlas::flush.
Atlas is Send + Sync and safe to share across tasks; each array
file is independently guarded by a tokio::sync::RwLock.
Implementations
impl Atlas
pub async fn open(store: Arc<dyn ObjectStore>, prefix: Path) -> Result<Self>
Open an existing store at prefix within store.
Reads atlas.json exactly once. Subsequent mutations only touch the
in-memory metadata until Atlas::flush is called.
pub async fn create(
    store: Arc<dyn ObjectStore>,
    prefix: Path,
    config: StoreConfig,
) -> Result<Self>
Create a new store at prefix within store.
pub async fn open_path(path: impl AsRef<Path>) -> Result<Self>
Open an existing store at the given local filesystem path.
The metadata format (atlas.json / atlas.msgpack / …zst / …lz4)
and array codec are auto-detected from the on-disk files — no
StoreConfig needed on reopen.
Examples
use atlas::{Atlas, StoreConfig};
let tmp = tempfile::tempdir().unwrap();
// Create and flush a store so there's something to open.
{
    let mut s = Atlas::create_path(tmp.path(), StoreConfig::default()).await.unwrap();
    s.create_dataset("ds1").await.unwrap();
    s.flush().await.unwrap();
}
let s = Atlas::open_path(tmp.path()).await.unwrap();
assert!(s.dataset_exists("ds1"));
pub async fn create_path(
    path: impl AsRef<Path>,
    config: StoreConfig,
) -> Result<Self>
Create a new store at the given local filesystem path. The directory is created
(recursively, like mkdir -p) if it does not already exist.
Examples
use atlas::{Atlas, StoreConfig};
let tmp = tempfile::tempdir().unwrap();
let s = Atlas::create_path(tmp.path(), StoreConfig::default()).await.unwrap();
assert!(s.list_datasets().is_empty());
pub async fn create_dataset(&mut self, name: &str) -> Result<DatasetView>
Create a new dataset in this store and return a DatasetView
for populating it. Errors with Error::DatasetAlreadyExists if
a dataset with this name is already registered, or
Error::InvalidName if name violates the naming rules
(non-empty, no /, no leading _, not . or ..).
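The naming rules listed above can be sketched as a small predicate. This is only an illustration of the stated rules: the helper name `is_valid_dataset_name` is hypothetical, and the crate's real validation may check more than this.

```rust
/// Hypothetical helper mirroring the documented naming rules:
/// non-empty, no `/`, no leading `_`, and not `.` or `..`.
fn is_valid_dataset_name(name: &str) -> bool {
    !name.is_empty()
        && !name.contains('/')
        && !name.starts_with('_')
        && name != "."
        && name != ".."
}
```

Under these rules a name such as `run_01` passes (the underscore is not leading), while `_hidden` and `a/b` are rejected.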
pub async fn open_dataset(&self, name: &str) -> Result<DatasetView>
Return a DatasetView for an existing dataset. Errors with
Error::DatasetNotFound if no dataset with this name exists.
Cheap — reads the in-memory metadata, never touches disk.
pub async fn delete_dataset(&mut self, name: &str) -> Result<()>
Remove a dataset from this store. Tombstones the dataset’s entries
inside every shared array file but does not flush — call
Atlas::flush to persist the deletion, and optionally
Atlas::compact afterwards to reclaim the storage.
Errors with Error::DatasetNotFound if no dataset with this
name exists.
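The difference between tombstoning and compacting can be pictured with a toy model. Everything here (`ToyArrayFile`, its fields and methods) is hypothetical and says nothing about the real on-disk layout; it only shows why a deletion frees no space until a compaction pass runs.

```rust
/// Toy array file: each entry is (dataset name, payload bytes, tombstoned?).
/// Purely illustrative; not the crate's file format.
struct ToyArrayFile {
    entries: Vec<(String, Vec<u8>, bool)>,
}

impl ToyArrayFile {
    /// Mark every entry of `dataset` as deleted without freeing its bytes.
    fn tombstone(&mut self, dataset: &str) {
        for entry in self.entries.iter_mut().filter(|e| e.0 == dataset) {
            entry.2 = true;
        }
    }

    /// Bytes still occupied, including tombstoned entries.
    fn stored_bytes(&self) -> usize {
        self.entries.iter().map(|e| e.1.len()).sum()
    }

    /// Drop tombstoned entries, reclaiming their space.
    fn compact(&mut self) {
        self.entries.retain(|e| !e.2);
    }
}
```

In this model `stored_bytes` is unchanged after `tombstone` and shrinks only after `compact`, which matches the delete, flush, then optionally compact sequence described above.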
pub fn list_datasets(&self) -> Vec<String>
All dataset names currently registered in this store, in insertion order. Reads from the in-memory store metadata — no disk I/O.
pub fn dataset_exists(&self, name: &str) -> bool
true if a dataset with this name is registered. O(1) hash lookup in
the in-memory store metadata.
pub fn list_arrays(&self) -> Vec<String>
Distinct array names across all datasets in this store, sorted.
One entry per physical .af file — datasets sharing an array name
(the common case) collapse to a single entry here.
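As a rough picture of "distinct and sorted", the following sketch collapses per-dataset array lists into one ordered list. The function name and the input shape are assumptions made for illustration; the crate keeps this information in its own store metadata.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: given (dataset name, array names) pairs, return the
/// distinct array names in sorted order.
fn distinct_array_names(datasets: &[(&str, Vec<&str>)]) -> Vec<String> {
    // A BTreeSet both deduplicates and keeps its elements ordered.
    let unique: BTreeSet<&str> = datasets
        .iter()
        .flat_map(|(_, arrays)| arrays.iter().copied())
        .collect();
    unique.into_iter().map(String::from).collect()
}
```

Two datasets that both declare `temp` contribute a single `temp` entry to the result.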
pub fn array_dtype(&self, array: &str) -> Option<DType>
Returns the dtype of array if any dataset in this store declares it.
Used by read_array_across’s Python binding to pick the generic
instantiation without round-tripping through a DatasetView.
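Picking a generic instantiation from a runtime dtype usually comes down to a `match`. The sketch below assumes a made-up `ToyDType` enum, since the variants of the real `DType` are not shown on this page, and uses a trivial generic function in place of `read_array_across::<T>`.

```rust
use std::mem::size_of;

/// Hypothetical dtype tag; the real `DType` variants may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyDType {
    F32,
    F64,
    I64,
}

/// Stand-in for a generic routine whose behavior depends on the element type.
fn buffer_bytes<T>(len: usize) -> usize {
    len * size_of::<T>()
}

/// Map the runtime tag to a compile-time generic instantiation.
fn buffer_bytes_for(dtype: ToyDType, len: usize) -> usize {
    match dtype {
        ToyDType::F32 => buffer_bytes::<f32>(len),
        ToyDType::F64 => buffer_bytes::<f64>(len),
        ToyDType::I64 => buffer_bytes::<i64>(len),
    }
}
```

A binding layer can look up the dtype once and then call the matching instantiation, without opening a per-dataset view first.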
pub async fn read_array_across<T: ArrayElement + Send + Sync + 'static>(
    &self,
    array: &str,
    dataset_names: &[String],
    start: Vec<usize>,
    shape: Vec<usize>,
) -> Result<Vec<Option<ArcArray<T, IxDyn>>>>
Bulk read the same slice of array from many datasets that share its
physical file. Runs at most num_cpus reads concurrently — matching
what a well-tuned dask threadpool would do — to keep
tokio::task::spawn_blocking’s decompression pool from oversubscribing
the actual CPU cores.
This exists because open_as_many_xarray_dataset over N datasets used to incur N
separate Python → Rust → tokio::block_on transitions plus Python-side
dask graph overhead. One call here replaces all of that and gets the
same parallelism dask was providing — but in pure Rust, with no GIL
involvement until the results return.
start and shape follow the same conventions as
DatasetView::read_array: empty start + empty shape mean the
full array. Per-dataset entries that don’t declare array are
returned as None.
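The two behaviors described above, a cap on in-flight reads and `None` for datasets that lack the array, can be modelled with the standard library alone. This is a sketch under stated assumptions: it uses scoped OS threads rather than Tokio tasks and `spawn_blocking`, and `ToyArrays` and `read_across` are hypothetical names, not part of the crate.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for one array file: dataset name -> values.
type ToyArrays = HashMap<String, Vec<f32>>;

/// Read `names` with at most `max_workers` reads in flight.
/// Datasets missing from `arrays` yield `None`; output order matches `names`.
fn read_across(
    arrays: &ToyArrays,
    names: &[String],
    max_workers: usize,
) -> Vec<Option<Vec<f32>>> {
    let next = AtomicUsize::new(0); // index of the next unclaimed dataset
    let results = Mutex::new(vec![None; names.len()]);
    let workers = max_workers.max(1).min(names.len().max(1));
    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Each worker claims one index at a time until none remain.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= names.len() {
                    break;
                }
                let value = arrays.get(&names[i]).cloned();
                results.lock().unwrap()[i] = value;
            });
        }
    });
    results.into_inner().unwrap()
}
```

Passing the value of `std::thread::available_parallelism()` as `max_workers` would approximate the num_cpus cap mentioned above.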
pub async fn read_array_across_stacked<T: ArrayElement + Send + Sync + Clone + 'static>(
    &self,
    array: &str,
    dataset_names: &[String],
    start: Vec<usize>,
    shape: Vec<usize>,
) -> Result<Array<T, IxDyn>>
Like Atlas::read_array_across but returns one stacked
(len(dataset_names), *per_dataset_shape) ndarray::Array instead of
a Vec of per-dataset arrays.
The output buffer is pre-allocated once; each parallel read writes its
row in as the task completes, overlapping the serial copy with the
remaining in-flight reads. Saves the ~5.7 GiB of memory copies that
the Python-side np.stack on the per-dataset list would do on a
1000-dataset gridded workload.
Errors if any listed dataset doesn’t declare array — the stacked
representation has no positional “missing” sentinel.
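The stacked layout can be illustrated with a flat buffer that is allocated once and filled row by row. The sketch below is sequential and uses plain `Vec<f32>` rows instead of `ndarray` arrays; `stack_rows` and its error strings are hypothetical. It shows only the single allocation and the error on a missing dataset, not the overlapping of copies with in-flight reads.

```rust
use std::collections::HashMap;

/// Hypothetical helper: copy each dataset's row into one pre-allocated
/// buffer of shape (names.len(), row_len), stored row-major.
fn stack_rows(
    arrays: &HashMap<String, Vec<f32>>,
    names: &[String],
    row_len: usize,
) -> Result<Vec<f32>, String> {
    // Allocate the full output once, up front.
    let mut out = vec![0.0f32; names.len() * row_len];
    for (i, name) in names.iter().enumerate() {
        // A missing dataset is an error: a stacked buffer has no "missing" slot.
        let src = arrays
            .get(name)
            .ok_or_else(|| format!("dataset `{name}` does not declare the array"))?;
        if src.len() != row_len {
            return Err(format!("dataset `{name}` has an unexpected row length"));
        }
        out[i * row_len..(i + 1) * row_len].copy_from_slice(src);
    }
    Ok(out)
}
```

Each row is copied exactly once into its final position, which is the saving over building a list of per-dataset arrays and stacking them afterwards.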
pub async fn flush(&mut self) -> Result<()>
Flush every known array file’s pending writes and persist the in-memory
store metadata (atlas.json). This is the single durability boundary for the store.
Force-initializes every array referenced in the metadata, even ones never
touched by a DatasetView (lazy initialization only pays off on the read
path, not on flush).
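What "single durability boundary" means for callers can be shown with a toy model in which `disk` stands in for persisted state and `memory` for the live handle. `ToyHandle` and its methods are hypothetical and perform no real I/O.

```rust
use std::collections::BTreeSet;

/// Toy handle: `disk` models persisted metadata, `memory` the live state.
#[derive(Default)]
struct ToyHandle {
    disk: BTreeSet<String>,
    memory: BTreeSet<String>,
}

impl ToyHandle {
    /// Mutations touch only the in-memory state.
    fn create_dataset(&mut self, name: &str) {
        self.memory.insert(name.to_string());
    }

    /// The only operation that changes what is persisted.
    fn flush(&mut self) {
        self.disk = self.memory.clone();
    }

    /// What a freshly opened handle would see: persisted state only.
    fn reopen(&self) -> ToyHandle {
        ToyHandle {
            disk: self.disk.clone(),
            memory: self.disk.clone(),
        }
    }
}
```

In this model a dataset created but not flushed is invisible to `reopen`, which is the situation a caller is in if the process exits before flush is called.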
Auto Trait Implementations
impl !RefUnwindSafe for Atlas
impl !UnwindSafe for Atlas
impl Freeze for Atlas
impl Send for Atlas
impl Sync for Atlas
impl Unpin for Atlas
impl UnsafeUnpin for Atlas
Blanket Implementations
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata(_: &<T as ArchivePointee>::ArchivedMetadata) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
Writes data to out indicating that a T is niched.