Crate block_array_cow [] [src]

Array storage to minimize duplication.

This is done by splitting arrays into chunks and using copy-on-write (COW), to de-duplicate chunks, from the users perspective this is an implementation detail.


Data Structure

This diagram is an overview of the structure of a single array-store.

note: The only 2 structures here which are referenced externally are the.

  • BArrayStore: The whole array store.
  • BArrayState: Represents a single state (array) of data. These can be add using a reference state, while this could be considered the previous or parent state. no relationship is kept, so the caller is free to add any state from the same BArrayStore as a reference.
<+> BArrayStore: root data-structure,
 |  can store many 'states', which share memory.
 |  This can store many arrays, however they must share the same 'stride'.
 |  Arrays of different types will need to use a new BArrayStore.
 +- <+> states (Collection of BArrayState's):
 |   |  Each represents an array added by the user of this API.
 |   |  and references a chunk_list (each state is a chunk_list user).
 |   |  Note that the list order has no significance.
 |   |
 |   +- <+> chunk_list (BChunkList):
 |       |  The chunks that make up this state.
 |       |  Each state is a chunk_list user,
 |       |  avoids duplicating lists when there is no change between states.
 |       |
 |       +- chunk_refs (List of BChunkRef): Each chunk_ref links to a a BChunk.
 |          Each reference is a chunk user,
 |          avoids duplicating smaller chunks of memory found in multiple states.
 +- info (BArrayInfo):
 |  Sizes and offsets for this array-store.
 |  Also caches some variables for reuse.
 +- <+> memory (BArrayMemory):
     |  Memory pools for storing BArrayStore data.
     +- chunk_list (Pool of BChunkList):
     |  All chunk_lists, (reference counted, used by BArrayState).
     +- chunk_ref (Pool of BChunkRef):
     |  All chunk_refs (link between BChunkList & BChunk).
     +- chunks (Pool of BChunk):
        All chunks, (reference counted, used by BChunkList).
        These have their headers hashed for reuse so we can quickly check for duplicates.


When creating a new state, a previous state can be given as a reference, matching chunks from this state are re-used in the new state.

First matches at either end of the array are detected. For identical arrays this is all thats needed.

De-duplication is performed on any remaining chunks, by hashing the first few bytes of the chunk (see: BCHUNK_HASH_TABLE_ACCUMULATE_STEPS).

\note This is cached for reuse since the referenced data never changes.

An array is created to store hash values at every 'stride', then stepped over to search for matching chunks.

Once a match is found, there is a high chance next chunks match too, so this is checked to avoid performing so many hash-lookups. Otherwise new chunks are created.


let mut bs = block_array_cow::BArrayStore::new(1, 8);
let data_src_a = b"The quick brown fox jumps over the lazy dog";
let data_src_b = b"The quick brown fox almost jumps over the lazy dog";
let data_src_c = b"The little quick brown fox jumps over the lazy dog!";

let state_a = bs.state_add(data_src_a, None);
let state_b = bs.state_add(data_src_b, Some(state_a));
let state_c = bs.state_add(data_src_c, Some(state_b));

// Check the data is stored correctly
let data_dst = block_array_cow::BArrayStore::state_data_get_alloc(state_a);
assert_eq!(&data_src_a[..], &data_dst[..]);

let data_dst = block_array_cow::BArrayStore::state_data_get_alloc(state_b);
assert_eq!(&data_src_b[..], &data_dst[..]);

let data_dst = block_array_cow::BArrayStore::state_data_get_alloc(state_c);
assert_eq!(&data_src_c[..], &data_dst[..]);



A single instance of an array.


Main storage for all states