Crate bitmaptrie [] [src]

Bitmappped Vector Trie

A bitmapped vector trie with node compression and a path cache. Values are always sorted by their index; thus iterating is always in index order.

The trie does not prescribe a length or capacity beside the range of values of it's index: usize. It could be used to compose a data structure that does behave more like Vec.

The branching factor is the word-size: 32 or 64. This makes the depth 6 for 32bit systems and 11 for 64bit systems. Because of the path cache, spatially dense indexes will not cause full depth traversal.

There is support for sharding the Trie such that each shard might be passed to a different thread for processing.

Performance improvements:

  • enable popcnt instruction when supported

Possible code improvements:

  • with better use of generics, some code duplication might be avoided

Basic Usage

use bitmaptrie::Trie;

let mut t: Trie<String> = Trie::new();

// set a key/value. Will overwrite any previous value at the index
t.set(123, String::from("testing 123"));

// look up a key returning a reference to the value if it exists
if let Some(ref value) = t.get(123) {
    println!("value = {}", *value);
}

// will remove the only entry
t.retain_if(|key, _value| key != 123);

Thread Safety

The trie can be borrowed in two ways: * For T: Send: mutable sharding, where each shard can safely be accessed mutably in it's own thread, allowing destructive updates. This is analagous to Vec::chunks(). * For T: Send + Sync: a Sync-safe borrow that can itself be sharded, but prevents destructive updates. This is analagous to a borrow of Vec<T: Send + Sync>.

Since the trie is borrowed and not shared using something like Arc, it follows that these methods will only work with scoped threading.

Sharding works by doing a breadth-first search into the trie to find the depth at which there are at least the number of interior nodes as requested. The returned number of nodes may be much greater than the requested number, or may be less if the trie is small.

As there is no knowledge of the balanced-ness of the trie, the more shards that are returned by borrow_sharded(), the more evenly the number of leaves on each shard will likely be distributed.

Mutable Sharding

use bitmaptrie::Trie;

let mut t: Trie<String> = Trie::new();
t.set(123, String::from("testing xyzzy"));

let mut shards = t.borrow_sharded(4); // shard at least 4 ways if possible
for mut shard in shards.drain() {
    // launch a scoped thread or submit a task to a queue here and move shard into it for
    // processing

    // destructive update to the trie, only touching this one shard
    shard.retain_if(|_, value| *value == String::from("testing 123"));
}

Sync-safe Borrow and Sharding

T in Trie<T> must be Sync in order to make value changes.

use bitmaptrie::Trie;

let mut t: Trie<usize> = Trie::new();
t.set(123, 246);

let num_threads = 4;

let shared = t.borrow_sync();
let shards = shared.borrow_sharded(num_threads);

for shard in shards.iter() {
    let shared = shared.clone();
    // launch a scoped thread here or submit a task to a queue and move shard and borrow into
    // it for processing

    // iterate over the shard's key/values
    for (_, value) in shard.iter() {
        if let Some(other) = shared.get(*value) {
            println!("found a cross reference");
        }
    }
}

Structs

BorrowShardImmut

Borrows a BorrowSync, splitting it into interior nodes each of which can be iterated over separately while still giving access to the full Trie.

BorrowShardMut

Type that borrows a Trie mutably, giving an iterable type that returns interior nodes on which structure-mutating operations can be performed. This can be used to parallelize destructive operations.

BorrowSync

Borrows a Trie exposing a Sync-type-safe subset of it's methods. No structure modifying methods are included. Each borrow gets it's own path cache. The lifetime of this type is the lifetime of the mutable borrow. This can be used to parallelize structure-immutant changes.

CompVec

A simple sparse vector. The valid word is a bitmap of which indeces have values. The maximum size of this vector is equal to the number of bits in a word (32 or 64).

CompVecIter

A type that implements Iterator for CompVec

CompVecIterMut

A type that implements Iterator for mutable values of CompVec

Iter

Iterator over (key, &T)s of Trie<T>

IterMut

Iterator over mutable (key, &mut T)s of Trie<T>

ShardImmut

Immutable borrow of an interior Trie node.

ShardMut

A type that references an interior Trie node. For splitting a Trie into sub-nodes, each of which can be passed to a different thread for mutable structural changes.

Trie

Path-cached bitmap trie type. The key is always a usize.

Enums

TrieNode

An interior (branch) or exterior (leaf) trie node, defined recursively.

Constants

USIZE_BYTES
VALID_MAX

First value to use in CompVec::next(masked__valid, ...)

WORD_SIZE