Skip to main content

StorageBackend

Struct StorageBackend 

Source
pub struct StorageBackend<D>
where D: DB,
{ /* private fields */ }
Expand description

A storage back-end, that wraps interactions with the database and provides in-memory caching. Its public API provides a mapping from ArenaHash keys to OnDiskObject objects, and a way to persist such objects to the DB, taking care of reference counting along the way.

§Pub access to StorageBackend objects

There is no pub API to construct StorageBackend. Rather, lib users are expected to construct a crate::Storage, and then access the StorageBackend via crate::arena::Arena::with_backend method of the crate::Storage::arena field.

§Overview

This module intermediates between the in-memory arena and the persistent database, managing reference counts for the Merkle-ized DAGs we store in the DB, and providing a caching layer. The cache addresses several concerns:

  1. reducing disk reads: by keeping data read from the on-disk DB in memory, we can avoid going to disk again the next time that data is accessed.

  2. reducing disk writes: the arena creates a lot of temporary data structures that will never be persisted – i.e. be marked as a gc-root or be in the transitive closure of a GC root – and so instead of eagerly writing these data to disk, we keep them in memory in case they get dropped right away. Also, we track reference counts in the DB, and so by doing reference count updates only in cache where possible, we avoid having to write intermediate states to disk.

  3. bulking disk writes: transactions in SQLite (used by [crate::db::SqlDB]) are very expensive, and so we want to do many writes at once when possible. By collecting the potential writes in memory in the cache, we can then flush them periodically en masse.

However, this caching layer also adds complexity, because we now have multiple potential sources of truth, the DB and the cache. The main complexity here arises from concern (2), reducing disk writes. We optimize the common case, where the arena creates temporary data structures and StorageBackend::caches them into the cache, only to drop them soon after. The tricky thing here is deducing that the net change of a bunch of arena cache mutations is the identity, i.e. that we don’t need to write anything to the DB. In the case of new object creation – meaning objects that aren’t already in the DB – this is easy: when the arena announces it’s done with a particular key – by calling StorageBackend::uncache on that key – we simply check if that key has any remaining references to it, and drop it if not. The harder case is when the arena changes the reference counts for keys that already exist in the DB, by creating larger structures that reference these existing keys. To handle this case, we track reference count deltas instead of cardinal reference counts: if the net effect of all the deltas is zero, then we know the key can safely be dropped from the cache.

Another source of complexity arises from a concern that has nothing to do with the goals of caching, and instead conflicts with caching: we want to support the creation of data structures that are too large to fit in memory, without the user needing to carefully checkpoint the construction. To deal with this, we track the size of the write cache, and provide StorageBackend::flush_cache_evictions_to_db to bulk-write mutations to disk when the write cache has become too large. In these cases we can’t know at the time of disk-writing if the written mutations will persist. So, we may end up with unused temporary values in the DB, and a separate GC operation is responsible for cleaning these up periodically.

§Assumptions

  • the database is not changing under our feet, meaning in particular that there is only one back-end. When an object obj is cached into the back-end, what happens depends on whether obj is already in the database. To support the database changing under our feet, we would need to handle the case where obj was in the database when it was cached, but was then removed from or modified in the database by another thread before obj was uncached.

  • there is only one “logical” arena calling into / manipulating the back-end. Here “one logical arena” includes multiple clones of a single initial arena, since cloned arenas share their metadata structures and avoid cacheing the same object more than once. Indeed, the key assumption we make here is that no object will be cacheed more than once, independently. It would be possibly to support multiple, distinct arenas – non-clones, with distinct metadata – manipulating the back-end, but would require more careful tracking of cache calls, to handle the case where two different arenas cache the same object, and we need to be careful to keep it around until all these arena’s are done using it.

  • objects are cached only after all of their children have been cached or are already in the db. In particular, this means the arena is responsible for sanitizing user controlled inputs before passing them to the back-end, to make sure they are well formed, in terms of children references.

  • if the caller caches an object obj, and wants obj to continue to exist in the back-end, then before uncacheing obj, the user needs to first do either of:

    • cache but not uncache another object which has obj in its transitive closure.

    • persist an object that has obj in its transitive closure.

    Note that it’s okay for the caller to e.g. uncache interior nodes of a large data structure, as long as a reference to the root has been persisted or cached-but-not-uncached.

§Terminology and APIs

A key will never be in read_cache and write_cache at the same time, but may be in database and either cache at the same time. If a key is in either cache, then we say it’s “in memory”. If a key is in memory, then the value stored in memory under that key describes the canonical version of the object.

The back-end provides various public APIs related to manipulating in-memory representations of objects. The get API brings an object into memory from the database. The cache API attempts to create a new object in memory, but falls back on any existing version already in memory or the DB. The uncache API informs the back-end that an object is no longer of interest to the caller, which allows the back-end to remove it from memory if it has no references or pending updates in memory. These APIs act on ArenaHash keys and OnDiskObject values, but internally manipulate more complex states, in the form of CacheValue values.

Implementations§

Source§

impl<D> StorageBackend<D>
where D: DB,

Source

pub fn get( &mut self, key: &ArenaHash<<D as DB>::Hasher>, ) -> Option<&OnDiskObject<<D as DB>::Hasher>>

Get object with key, trying to get object from memory and falling back to database if necessary.

If an object is found, then it ends up in the front of the cache.

Callers may want to call Self::pre_fetch on the root of the DAG once, before calling this get function many times, if they expect to call get for many nodes from the same DAG.

Calling get on a temporary object that has already been uncached, but is still in memory because it’s still referenced, will not cause that temp object to continue to exist if its ref count goes to zero. If you want that, then call cache instead!

Source

pub fn get_stats(&self) -> StorageBackendStats

Get a copy of the current stats.

Public API users are expected to access this via Arena::with_backend(|b| b.get_stats()).

Source

pub fn get_roots(&self) -> HashMap<ArenaHash<<D as DB>::Hasher>, u32>

Get collection of all GC roots and their root counts. Returned root counts are positive.

As implemented, this function assumes no concurrent modification of the roots in the DB while the root collection is being computed.

Source

pub fn unpersist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)

Un-mark key as GC root. See Self::persist.

Source

pub fn persist(&mut self, key: &ArenaHash<<D as DB>::Hasher>)

Mark key as a GC root, meaning it will be persisted across GC runs, even if no other data references it. Use unpersist to un-mark.

NOTE: The same key can be persisted multiple times, in which case it needs to be unpersisted the same number of times to stop being treated as a GC root. In other words, the GC-root status of a key is a non-negative int, not a bool.

Source

pub fn pre_fetch( &mut self, key: &ArenaHash<<D as DB>::Hasher>, max_depth: Option<usize>, truncate: bool, )

Load DAG rooted at key into the cache from the DB.

Nodes up to and including 0-indexed depth depth, i.e. where key is for node at depth 0.

The truncate argument, if true, means the pre-fetch BFS will be truncated at nodes that are already in memory. Note that if using truncate == false, and calling pre_fetch many times, you can end up doing O(n^2) work when getting n nodes, if the nodes are pre-fetched from the leaves up to the root (simplest case: a linear chain of n nodes).

Source

pub fn flush_cache_evictions_to_db(&mut self)

Push pending writes to shrink the write cache to self.cache_size.

§Note

This doesn’t flush all pending writes in self.write_cache – that’s what the flush_all_changes_to_db does – the point of this function is to help avoid the write cache becoming too large. But what we may want is an intermediate version, that flushes all referenced pending writes in the write cache.

Source

pub fn flush_all_changes_to_db(&mut self)

Push all pending writes to the DB.

Source

pub fn gc(&mut self)

Remove all unreachable nodes from memory and the DB.

Here “unreachable” nodes are nodes with ref_count == 0 and root_count == 0, which are not currently “live inserts”, meaning they have been cached but not uncached.

§Note

This GC implementation assumes the correctness of the reference counts stored in the db and memory, and doesn’t actually do a reachability search from the roots. This is much faster than searching the entire db from the roots, but means this function is not sufficient to clean up the db after a crash which left the db in an inconsistent state, in terms of db-stored reference counts.

Source

pub fn get_write_cache_len(&self) -> usize

Return the number of elements in the write cache.

§Note

The elements in the cache are potentially of wildly varying sizes; use get_write_cache_obj_bytes if you’re worried about the actual memory usage.

Source

pub fn get_write_cache_obj_bytes(&self) -> usize

Return the number of bytes in the OnDiskObjects in the write cache.

§Note

This ignores the write-cache data other than OnDiskObjects, which includes the ref deltas. But that data should be relatively small.

Trait Implementations§

Source§

impl<D> Debug for StorageBackend<D>
where D: Debug + DB, <D as DB>::Hasher: Debug,

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> Conv for T

Source§

fn conv<T>(self) -> T
where Self: Into<T>,

Converts self into T using Into<T>. Read more
Source§

impl<T> Fake for T

Source§

fn fake<U>(&self) -> U
where Self: FakeBase<U>,

Source§

fn fake_with_rng<U, R>(&self, rng: &mut R) -> U
where R: Rng + ?Sized, Self: FakeBase<U>,

Source§

impl<T> FmtForward for T

Source§

fn fmt_binary(self) -> FmtBinary<Self>
where Self: Binary,

Causes self to use its Binary implementation when Debug-formatted.
Source§

fn fmt_display(self) -> FmtDisplay<Self>
where Self: Display,

Causes self to use its Display implementation when Debug-formatted.
Source§

fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where Self: LowerExp,

Causes self to use its LowerExp implementation when Debug-formatted.
Source§

fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where Self: LowerHex,

Causes self to use its LowerHex implementation when Debug-formatted.
Source§

fn fmt_octal(self) -> FmtOctal<Self>
where Self: Octal,

Causes self to use its Octal implementation when Debug-formatted.
Source§

fn fmt_pointer(self) -> FmtPointer<Self>
where Self: Pointer,

Causes self to use its Pointer implementation when Debug-formatted.
Source§

fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where Self: UpperExp,

Causes self to use its UpperExp implementation when Debug-formatted.
Source§

fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where Self: UpperHex,

Causes self to use its UpperHex implementation when Debug-formatted.
Source§

fn fmt_list(self) -> FmtList<Self>
where &'a Self: for<'a> IntoIterator,

Formats each item in a sequence. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Pipe for T
where T: ?Sized,

Source§

fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where Self: Sized,

Pipes by value. This is generally the method you want to use. Read more
Source§

fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where R: 'a,

Borrows self and passes that borrow into the pipe function. Read more
Source§

fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where R: 'a,

Mutably borrows self and passes that borrow into the pipe function. Read more
Source§

fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
where Self: Borrow<B>, B: 'a + ?Sized, R: 'a,

Borrows self, then passes self.borrow() into the pipe function. Read more
Source§

fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
where Self: BorrowMut<B>, B: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.borrow_mut() into the pipe function. Read more
Source§

fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
where Self: AsRef<U>, U: 'a + ?Sized, R: 'a,

Borrows self, then passes self.as_ref() into the pipe function.
Source§

fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
where Self: AsMut<U>, U: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.as_mut() into the pipe function.
Source§

fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
where Self: Deref<Target = T>, T: 'a + ?Sized, R: 'a,

Borrows self, then passes self.deref() into the pipe function.
Source§

fn pipe_deref_mut<'a, T, R>( &'a mut self, func: impl FnOnce(&'a mut T) -> R, ) -> R
where Self: DerefMut<Target = T> + Deref, T: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.deref_mut() into the pipe function.
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> Tap for T

Source§

fn tap(self, func: impl FnOnce(&Self)) -> Self

Immutable access to a value. Read more
Source§

fn tap_mut(self, func: impl FnOnce(&mut Self)) -> Self

Mutable access to a value. Read more
Source§

fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Immutable access to the Borrow<B> of a value. Read more
Source§

fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Mutable access to the BorrowMut<B> of a value. Read more
Source§

fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Immutable access to the AsRef<R> view of a value. Read more
Source§

fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Mutable access to the AsMut<R> view of a value. Read more
Source§

fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Immutable access to the Deref::Target of a value. Read more
Source§

fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Mutable access to the Deref::Target of a value. Read more
Source§

fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self

Calls .tap() only in debug builds, and is erased in release builds.
Source§

fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self

Calls .tap_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Calls .tap_borrow() only in debug builds, and is erased in release builds.
Source§

fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Calls .tap_ref() only in debug builds, and is erased in release builds.
Source§

fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Calls .tap_deref() only in debug builds, and is erased in release builds.
Source§

fn tap_deref_mut_dbg<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Calls .tap_deref_mut() only in debug builds, and is erased in release builds.
Source§

impl<T> TryConv for T

Source§

fn try_conv<T>(self) -> Result<T, Self::Error>
where Self: TryInto<T>,

Attempts to convert self into T using TryInto<T>. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more