pub struct StorageBackend<D>where
D: DB,{ /* private fields */ }Expand description
A storage back-end, that wraps interactions with the database and provides
in-memory caching. Its public API provides a mapping from ArenaHash keys
to OnDiskObject objects, and a way to persist such objects to the DB,
taking care of reference counting along the way.
§Pub access to StorageBackend objects
There is no pub API to construct StorageBackend. Rather, lib users are
expected to construct a crate::Storage, and then access the
StorageBackend via crate::arena::Arena::with_backend method of the
crate::Storage::arena field.
§Overview
This module intermediates between the in-memory arena and the persistent database, managing reference counts for the Merkle-ized DAGs we store in the DB, and providing a caching layer. The cache addresses several concerns:
-
reducing disk reads: by keeping data read from the on-disk DB in memory, we can avoid going to disk again the next time that data is accessed.
-
reducing disk writes: the arena creates a lot of temporary data structures that will never be persisted – i.e. be marked as a gc-root or be in the transitive closure of a GC root – and so instead of eagerly writing these data to disk, we keep them in memory in case they get dropped right away. Also, we track reference counts in the DB, and so by doing reference count updates only in cache where possible, we avoid having to write intermediate states to disk.
-
bulking disk writes: transactions in SQLite (used by [
crate::db::SqlDB]) are very expensive, and so we want to do many writes at once when possible. By collecting the potential writes in memory in the cache, we can then flush them periodically en masse.
However, this caching layer also adds complexity, because we now have
multiple potential sources of truth, the DB and the cache. The main
complexity here arises from concern (2), reducing disk writes. We optimize
the common case, where the arena creates temporary data structures and
StorageBackend::caches them into the cache, only to drop them soon
after. The tricky thing here is deducing that the net change of a bunch of
arena cache mutations is the identity, i.e. that we don’t need to write
anything to the DB. In the case of new object creation – meaning objects
that aren’t already in the DB – this is easy: when the arena announces it’s
done with a particular key – by calling StorageBackend::uncache on that
key – we simply check if that key has any remaining references to it, and
drop it if not. The harder case is when the arena changes the reference
counts for keys that already exist in the DB, by creating larger structures
that reference these existing keys. To handle this case, we track reference
count deltas instead of cardinal reference counts: if the net effect of all
the deltas is zero, then we know the key can safely be dropped from the
cache.
Another source of complexity arises from a concern that has nothing to do
with the goals of caching, and instead conflicts with caching: we want to
support the creation of data structures that are too large to fit in memory,
without the user needing to carefully checkpoint the construction. To deal
with this, we track the size of the write cache, and provide
StorageBackend::flush_cache_evictions_to_db to bulk-write mutations to disk
when the write cache has become too large. In these cases we can’t know at the time
of disk-writing if the written mutations will persist. So, we may end up
with unused temporary values in the DB, and a separate GC operation is
responsible for cleaning these up periodically.
§Assumptions
-
the database is not changing under our feet, meaning in particular that there is only one back-end. When an object
objiscached into the back-end, what happens depends on whetherobjis already in the database. To support the database changing under our feet, we would need to handle the case whereobjwas in the database when it wascached, but was then removed from or modified in the database by another thread beforeobjwasuncached. -
there is only one “logical” arena calling into / manipulating the back-end. Here “one logical arena” includes multiple clones of a single initial arena, since cloned arenas share their metadata structures and avoid
cacheing the same object more than once. Indeed, the key assumption we make here is that no object will becacheed more than once, independently. It would be possibly to support multiple, distinct arenas – non-clones, with distinct metadata – manipulating the back-end, but would require more careful tracking ofcachecalls, to handle the case where two different arenascachethe same object, and we need to be careful to keep it around until all these arena’s are done using it. -
objects are
cached only after all of their children have beencached or are already in the db. In particular, this means the arena is responsible for sanitizing user controlled inputs before passing them to the back-end, to make sure they are well formed, in terms of children references. -
if the caller
caches an objectobj, and wantsobjto continue to exist in the back-end, then beforeuncacheingobj, the user needs to first do either of:-
cachebut notuncacheanother object which hasobjin its transitive closure. -
persistan object that hasobjin its transitive closure.
Note that it’s okay for the caller to e.g.
uncacheinterior nodes of a large data structure, as long as a reference to the root has beenpersisted orcached-but-not-uncached. -
§Terminology and APIs
A key will never be in read_cache and write_cache at the same time, but
may be in database and either cache at the same time. If a key is in
either cache, then we say it’s “in memory”. If a key is in memory, then the
value stored in memory under that key describes the canonical version of the
object.
The back-end provides various public APIs related to manipulating in-memory
representations of objects. The get API brings an object into memory from
the database. The cache API attempts to create a new object in memory,
but falls back on any existing version already in memory or the DB. The
uncache API informs the back-end that an object is no longer of interest to
the caller, which allows the back-end to remove it from memory if it has no
references or pending updates in memory. These APIs act on ArenaHash keys
and OnDiskObject values, but internally manipulate more complex states, in
the form of CacheValue values.
Implementations§
Source§impl<D> StorageBackend<D>where
D: DB,
impl<D> StorageBackend<D>where
D: DB,
Sourcepub fn get(
&mut self,
key: &ArenaHash<<D as DB>::Hasher>,
) -> Option<&OnDiskObject<<D as DB>::Hasher>>
pub fn get( &mut self, key: &ArenaHash<<D as DB>::Hasher>, ) -> Option<&OnDiskObject<<D as DB>::Hasher>>
Get object with key, trying to get object from memory and falling back to database if necessary.
If an object is found, then it ends up in the front of the cache.
Callers may want to call Self::pre_fetch on the root of the DAG
once, before calling this get function many times, if they expect to
call get for many nodes from the same DAG.
Calling get on a temporary object that has already been uncached,
but is still in memory because it’s still referenced, will not cause
that temp object to continue to exist if its ref count goes to zero. If
you want that, then call cache instead!
Sourcepub fn get_stats(&self) -> StorageBackendStats
pub fn get_stats(&self) -> StorageBackendStats
Get a copy of the current stats.
Public API users are expected to access this via
Arena::with_backend(|b| b.get_stats()).
Sourcepub fn get_roots(&self) -> HashMap<ArenaHash<<D as DB>::Hasher>, u32>
pub fn get_roots(&self) -> HashMap<ArenaHash<<D as DB>::Hasher>, u32>
Get collection of all GC roots and their root counts. Returned root counts are positive.
As implemented, this function assumes no concurrent modification of the roots in the DB while the root collection is being computed.
Sourcepub fn unpersist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)
pub fn unpersist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)
Un-mark key as GC root. See Self::persist.
Sourcepub fn persist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)
pub fn persist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)
Mark key as a GC root, meaning it will be persisted across GC runs,
even if no other data references it. Use unpersist to un-mark.
NOTE: The same key can be persisted multiple times, in which case it
needs to be unpersisted the same number of times to stop being treated as
a GC root. In other words, the GC-root status of a key is a non-negative
int, not a bool.
Sourcepub fn pre_fetch(
&mut self,
key: &ArenaHash<<D as DB>::Hasher>,
max_depth: Option<usize>,
truncate: bool,
)
pub fn pre_fetch( &mut self, key: &ArenaHash<<D as DB>::Hasher>, max_depth: Option<usize>, truncate: bool, )
Load DAG rooted at key into the cache from the DB.
Nodes up to and including 0-indexed depth depth, i.e. where key is
for node at depth 0.
The truncate argument, if true, means the pre-fetch BFS will be
truncated at nodes that are already in memory. Note that if using
truncate == false, and calling pre_fetch many times, you can end
up doing O(n^2) work when getting n nodes, if the nodes are pre-fetched
from the leaves up to the root (simplest case: a linear chain of n
nodes).
Sourcepub fn flush_cache_evictions_to_db(&mut self)
pub fn flush_cache_evictions_to_db(&mut self)
Push pending writes to shrink the write cache to self.cache_size.
§Note
This doesn’t flush all pending writes in self.write_cache –
that’s what the flush_all_changes_to_db does – the point of this
function is to help avoid the write cache becoming too large. But what
we may want is an intermediate version, that flushes all referenced
pending writes in the write cache.
Sourcepub fn flush_all_changes_to_db(&mut self)
pub fn flush_all_changes_to_db(&mut self)
Push all pending writes to the DB.
Sourcepub fn gc(&mut self)
pub fn gc(&mut self)
Remove all unreachable nodes from memory and the DB.
Here “unreachable” nodes are nodes with ref_count == 0 and root_count == 0, which are not currently “live inserts”, meaning they have been
cached but not uncached.
§Note
This GC implementation assumes the correctness of the reference counts stored in the db and memory, and doesn’t actually do a reachability search from the roots. This is much faster than searching the entire db from the roots, but means this function is not sufficient to clean up the db after a crash which left the db in an inconsistent state, in terms of db-stored reference counts.
Sourcepub fn get_write_cache_len(&self) -> usize
pub fn get_write_cache_len(&self) -> usize
Return the number of elements in the write cache.
§Note
The elements in the cache are potentially of wildly varying sizes; use
get_write_cache_obj_bytes if you’re worried about the actual memory
usage.
Sourcepub fn get_write_cache_obj_bytes(&self) -> usize
pub fn get_write_cache_obj_bytes(&self) -> usize
Return the number of bytes in the OnDiskObjects in the write cache.
§Note
This ignores the write-cache data other than OnDiskObjects, which
includes the ref deltas. But that data should be relatively small.
Trait Implementations§
Auto Trait Implementations§
impl<D> !Freeze for StorageBackend<D>
impl<D> !RefUnwindSafe for StorageBackend<D>
impl<D> Send for StorageBackend<D>
impl<D> !Sync for StorageBackend<D>
impl<D> Unpin for StorageBackend<D>where
D: Unpin,
<<<D as DB>::Hasher as OutputSizeUser>::OutputSize as ArrayLength<u8>>::ArrayType: Unpin,
impl<D> UnsafeUnpin for StorageBackend<D>where
D: UnsafeUnpin,
impl<D> UnwindSafe for StorageBackend<D>where
D: UnwindSafe,
<<<D as DB>::Hasher as OutputSizeUser>::OutputSize as ArrayLength<u8>>::ArrayType: UnwindSafe + RefUnwindSafe,
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> FmtForward for T
impl<T> FmtForward for T
Source§fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
fn fmt_binary(self) -> FmtBinary<Self>where
Self: Binary,
self to use its Binary implementation when Debug-formatted.Source§fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
fn fmt_display(self) -> FmtDisplay<Self>where
Self: Display,
self to use its Display implementation when
Debug-formatted.Source§fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
fn fmt_lower_exp(self) -> FmtLowerExp<Self>where
Self: LowerExp,
self to use its LowerExp implementation when
Debug-formatted.Source§fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
fn fmt_lower_hex(self) -> FmtLowerHex<Self>where
Self: LowerHex,
self to use its LowerHex implementation when
Debug-formatted.Source§fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
fn fmt_octal(self) -> FmtOctal<Self>where
Self: Octal,
self to use its Octal implementation when Debug-formatted.Source§fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
fn fmt_pointer(self) -> FmtPointer<Self>where
Self: Pointer,
self to use its Pointer implementation when
Debug-formatted.Source§fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
fn fmt_upper_exp(self) -> FmtUpperExp<Self>where
Self: UpperExp,
self to use its UpperExp implementation when
Debug-formatted.Source§fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
fn fmt_upper_hex(self) -> FmtUpperHex<Self>where
Self: UpperHex,
self to use its UpperHex implementation when
Debug-formatted.Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> Pipe for Twhere
T: ?Sized,
impl<T> Pipe for Twhere
T: ?Sized,
Source§fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> Rwhere
Self: Sized,
Source§fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> Rwhere
R: 'a,
self and passes that borrow into the pipe function. Read moreSource§fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> Rwhere
R: 'a,
self and passes that borrow into the pipe function. Read moreSource§fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
Source§fn pipe_borrow_mut<'a, B, R>(
&'a mut self,
func: impl FnOnce(&'a mut B) -> R,
) -> R
fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
Source§fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
self, then passes self.as_ref() into the pipe function.Source§fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
self, then passes self.as_mut() into the pipe
function.Source§fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
self, then passes self.deref() into the pipe function.Source§impl<T> PolicyExt for Twhere
T: ?Sized,
impl<T> PolicyExt for Twhere
T: ?Sized,
Source§impl<T> Tap for T
impl<T> Tap for T
Source§fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Borrow<B> of a value. Read moreSource§fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
BorrowMut<B> of a value. Read moreSource§fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
AsRef<R> view of a value. Read moreSource§fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
AsMut<R> view of a value. Read moreSource§fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Deref::Target of a value. Read moreSource§fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Deref::Target of a value. Read moreSource§fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
.tap() only in debug builds, and is erased in release builds.Source§fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
.tap_mut() only in debug builds, and is erased in release
builds.Source§fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
.tap_borrow() only in debug builds, and is erased in release
builds.Source§fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
.tap_borrow_mut() only in debug builds, and is erased in release
builds.Source§fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
.tap_ref() only in debug builds, and is erased in release
builds.Source§fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
.tap_ref_mut() only in debug builds, and is erased in release
builds.Source§fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
.tap_deref() only in debug builds, and is erased in release
builds.