Struct shawshank::Prison
[−]
[src]
pub struct Prison<O: StableAddress, I = usize, M = HashMap<&'static O::Target, I>> { /* fields omitted */ }
An efficient, generic internment structure.
Internals
When specialized for String
/str
, the structure looks like this:
use std::collections::HashMap; enum Slot<T> { Vacant(usize), Occupied(T), } pub struct StringPrison { map: HashMap<&'static str, usize>, interned: Vec<Slot<String>>, head: usize, }
A simpler structure that can be expressed entirely in safe Rust might use
HashMap<String, usize>
. The obvious drawback, however, is that each
interned string is stored twice. Since a slice stores a pointer to the heap
of a String
, however, and not to that of the Vec
, we can save space by
using them instead as keys. If we enforce the variant that slices in the
map are removed in lock-step with the strings in the vector, then it's safe
to lie to the compiler by changing the lifetime to 'static
.
This structure generalizes to all pairs of "owned" and "reference" types
where moving the "owned" doesn't invalidate the "reference." The
owning_ref
crate provides the trait StableAddress
to mark such types.
Other examples are Vec<T>
/[T]
and Box<T>
/T
, where T: Clone
.
head
contains the index of the first vacant slot, which in turn has the
index of the next, etc., effectively forming a linked list, with !0
as the
end-of-list marker. This allows vacant slots to be efficiently reclaimed
before appending to the vector. This is the same technique employed by
vec_arena
. Another name for Prison
could be ArenaSet
.
Custom ID Types
By default, the ID type parameter I
is usize
, the type of a Vec
index. One problem with usize
, of course, is lack of type safety. One
could wrap the IDs inside tuple structs, so that different domains share the
same Prison
. Some workloads, however, may have domains with disjoint
sets or much lower cardinality than usize
provides. In such cases,
more space can be saved by storing a smaller-than-usize
I
in the
internal map, and converting to and from usize
as needed. intern
returns Error::IdOverflow
if there are no more unique IDs available.
The ToPrimitive
/FromPrimitive
traits of the num
crate are used to
perform the conversions. If these ever return None
during an operation,
it will fail with Error::FromIdFailed
/Error::ToIdFailed
.
The custom_intern_id!
macro reduces the boilerplate to set thes up.
#[macro_use] extern crate shawshank; extern crate num; use shawshank::Error; // min/max optional; default to those of base type custom_intern_id!(Small, u8, 0, 3); fn main() { let mut p = shawshank::Builder::<String, Small>::new().hash().unwrap(); assert_eq!(p.intern("one"), Ok(Small(0))); assert_eq!(p.intern("two"), Ok(Small(1))); assert_eq!(p.intern("three"), Ok(Small(2))); assert_eq!(p.intern("four"), Ok(Small(3))); assert_eq!(p.intern("fail"), Err(Error::IdOverflow)); assert_eq!(p.disintern(Small(0)), Ok("one".into())); assert_eq!(p.intern("success"), Ok(Small(0))); }
Type Parameters
O
: The "owened" type of interned items (e.g.String
,Vec<T>
).I
: The "ID" type to uniquely resolve interned items.M
: The type used toMap
O::Target
s toI
s.
Methods
impl<O, I, M> Prison<O, I, M> where O: StableAddress, I: Bounded + ToPrimitive + FromPrimitive, M: Map
[src]
fn new() -> Result<Self, Error>
Create a new, empty Prison.
fn with_capacity(capacity: usize) -> Result<Self, Error>
Create a new, empty Prison with a capacity hint.
fn bounded_with_capacity(max_idx: usize, capacity: usize) -> Result<Self, Error>
Create a new, empty Prison with a specific maximum index and a capacity hint.
fn count(&self) -> usize
Get the number of interned items.
let mut p = shawshank::string_prison(); assert_eq!(p.intern("hello"), Ok(0)); assert_eq!(p.intern("world"), Ok(1)); assert_eq!(p.count(), 2); p.disintern(0).unwrap(); assert_eq!(p.count(), 1);
fn capacity(&self) -> usize
Get the capacity of the internal vector.
fn resolve<'a, U, Q: ?Sized>(&'a self, id: U) -> Result<&'a Q, Error> where U: Borrow<I>, O: Borrow<Q>
Resolve in item by its unique ID.
The success type is generic to both target and direct references.
use std::sync::Arc; let mut p = shawshank::byte_solitary(); assert_eq!(p.intern(vec![1,2,3]), Ok(0)); let s1: &Vec<u8> = p.resolve(0).unwrap(); let s2: &Arc<Vec<u8>> = p.resolve(0).unwrap();
Complexity: O(1)
impl<O, I, M> Prison<O, I, M> where O: StableAddress, O::Target: 'static, I: Copy + ToPrimitive + FromPrimitive + Bounded, M: Map<Key=&'static O::Target, Value=I>
[src]
fn intern<Q>(&mut self, item: Q) -> Result<I, Error> where Q: Borrow<O::Target>, O: From<Q>
Intern an item, receiving an ID that can later be used to resolve
the original.
If the item has already been interned, nothing changes, and the item's current ID
is returned. Barring any calls to shrink
, this will be the same ID returned
when the item was first interned.
item
is generic so that either a reference or owned type may be passed.
let mut p = shawshank::string_prison(); assert_eq!(p.intern("hello"), Ok(0)); assert_eq!(p.intern(String::from("hello")), Ok(0));
Complexity: O(max(M::get(K)
, M::insert(K,V)
))
fn disintern<'a, U: Borrow<I>>(&'a mut self, id: U) -> Result<O, Error>
Disintern an item by its unique ID.
Barring any calls to shrink
, all subsequent calls to resolve
with the ID
will fail. If the item is interned again, it will get a different ID.
let mut p = shawshank::string_prison(); assert_eq!(p.intern("hello"), Ok(0)); assert_eq!(p.intern("world"), Ok(1)); assert_eq!(p.disintern(0), Ok("hello".into())); assert_eq!(p.resolve::<_, str>(0), Err(shawshank::Error::InvalidId));
Complexity: O(M::remove(K)
)
fn shrink<T: Map<Key=I, Value=I>>(&mut self) -> T
Shrink the internal data structures by re-using ID of disinterned items. Returns a map from the old IDs to the new ones.
If an error occurs converting either the old or new index into a custom ID type, the item will be disinterned, an the resulting map will have no entry for the old ID.
use std::collections::BTreeMap; let mut p = shawshank::string_prison(); assert_eq!(p.intern("hello"), Ok(0)); assert_eq!(p.intern("world"), Ok(1)); assert_eq!(p.disintern(0), Ok("hello".into())); let remap: BTreeMap<_, _> = p.shrink(); assert_eq!(remap[&1], 0); assert_eq!(p.resolve(0), Ok("world"));
Complexity: O(successes * T::insert(K)
+ failures * M::remove(K)
)