Struct shawshank::ArenaSet [] [src]

pub struct ArenaSet<O: StableDeref, I = usize, M = HashMap<&'static <O as Deref>::Target, I>> { /* fields omitted */ }

An efficient, generic internment structure.

Internals

When specialized for String/str, the structure looks like this:

use std::collections::HashMap;

enum Slot<T> {
    Vacant(usize),
    Occupied(T),
}
pub struct StringArenaSet {
    map: HashMap<&'static str, usize>,
    interned: Vec<Slot<String>>,
    head: usize,
}

A simpler structure that can be expressed entirely in safe Rust might use HashMap<String, usize>. The obvious drawback, however, is that each interned string is stored twice. Since a slice stores a pointer to the heap of a String, however, and not to that of the Vec, we can save space by using them instead as keys. If we enforce the variant that slices in the map are removed in lock-step with the strings in the vector, then it's safe to lie to the compiler by changing the lifetime to 'static.

This structure generalizes to all pairs of "owned" and "reference" types where moving the "owned" doesn't invalidate the "reference." The stable_deref_trait crate provides the trait StableDeref to mark such types. Other examples are Vec<T>/[T] and Box<T>/T, where T: Clone.

head contains the index of the first vacant slot, which in turn has the index of the next, etc., effectively forming a linked list, with !0 as the end-of-list marker. This allows vacant slots to be efficiently reclaimed before appending to the vector. This is the same technique employed by vec_arena.

Custom ID Types

By default, the ID type parameter I is usize, the type of a Vec index. One problem with usize, of course, is lack of type safety. One could wrap the IDs inside tuple structs, so that different domains share the same ArenaSet. Some workloads, however, may have domains with disjoint sets or much lower cardinality than usize provides. In such cases, more space can be saved by storing a smaller-than-usize I in the internal map, and converting to and from usize as needed. intern returns Error::IdOverflow if there are no more unique IDs available.

The ToPrimitive/FromPrimitive traits of the num-traits crate are used to perform the conversions. If these ever return None during an operation, it will fail with Error::FromIdFailed/Error::ToIdFailed.

The custom_intern_id! macro reduces the boilerplate to set thes up.

#[macro_use] extern crate shawshank;
extern crate num_traits;

use shawshank::Error;

// min/max optional; default to those of base type
custom_intern_id!(Small, u8, 0, 3);

fn main() {
    let mut p = shawshank::Builder::<String, Small>::new().hash().unwrap();
    assert_eq!(p.intern("one"), Ok(Small(0)));
    assert_eq!(p.intern("two"), Ok(Small(1)));
    assert_eq!(p.intern("three"), Ok(Small(2)));
    assert_eq!(p.intern("four"), Ok(Small(3)));
    assert_eq!(p.intern("fail"), Err(Error::IdOverflow));
    assert_eq!(p.disintern(Small(0)), Ok("one".into()));
    assert_eq!(p.intern("success"), Ok(Small(0)));
}

Type Parameters

  • O: The "owened" type of interned items (e.g. String, Vec<T>).
  • I: The "ID" type to uniquely resolve interned items.
  • M: The type used to Map O::Targets to Is.

Methods

impl<O, I, M> ArenaSet<O, I, M> where
    O: StableDeref,
    I: Bounded + ToPrimitive + FromPrimitive,
    M: Map
[src]

Create a new, empty ArenaSet.

Create a new, empty ArenaSet with a capacity hint.

Create a new, empty ArenaSet with a specific maximum index and a capacity hint.

Get the number of interned items.

let mut p = shawshank::string_arena_set();
assert_eq!(p.intern("hello"), Ok(0));
assert_eq!(p.intern("world"), Ok(1));
assert_eq!(p.count(), 2);
p.disintern(0).unwrap();
assert_eq!(p.count(), 1);

Get the capacity of the internal vector.

Resolve in item by its unique ID.

The success type is generic to both target and direct references.

use std::sync::Arc;

let mut p = shawshank::byte_stadium_set();
assert_eq!(p.intern(vec![1,2,3]), Ok(0));
let s1: &Vec<u8> = p.resolve(0).unwrap();
let s2: &Arc<Vec<u8>> = p.resolve(0).unwrap();

Complexity: O(1)

impl<O, I, M> ArenaSet<O, I, M> where
    O: StableDeref,
    O::Target: 'static,
    I: Copy + ToPrimitive + FromPrimitive + Bounded,
    M: Map<Key = &'static O::Target, Value = I>, 
[src]

Intern an item, receiving an ID that can later be used to resolve the original.

If the item has already been interned, nothing changes, and the item's current ID is returned. Barring any calls to shrink, this will be the same ID returned when the item was first interned.

item is generic so that either a reference or owned type may be passed.

let mut p = shawshank::string_arena_set();
assert_eq!(p.intern("hello"), Ok(0));
assert_eq!(p.intern(String::from("hello")), Ok(0));

Complexity: O(max(M::get(K), M::insert(K,V)))

Disintern an item by its unique ID.

Barring any calls to shrink, all subsequent calls to resolve with the ID will fail. If the item is interned again, it will get a different ID.

let mut p = shawshank::string_arena_set();
assert_eq!(p.intern("hello"), Ok(0));
assert_eq!(p.intern("world"), Ok(1));
assert_eq!(p.disintern(0), Ok("hello".into()));
assert_eq!(p.resolve::<_, str>(0), Err(shawshank::Error::InvalidId));

Complexity: O(M::remove(K))

Shrink the internal data structures by re-using ID of disinterned items. Returns a map from the old IDs to the new ones.

If an error occurs converting either the old or new index into a custom ID type, the item will be disinterned, an the resulting map will have no entry for the old ID.

use std::collections::BTreeMap;

let mut p = shawshank::string_arena_set();
assert_eq!(p.intern("hello"), Ok(0));
assert_eq!(p.intern("world"), Ok(1));
assert_eq!(p.disintern(0), Ok("hello".into()));
let remap: BTreeMap<_, _> = p.shrink();
assert_eq!(remap[&1], 0);
assert_eq!(p.resolve(0), Ok("world"));

Complexity: O(successes * T::insert(K) + failures * M::remove(K))