Module symtern::traits
[−]
[src]
Traits that define the interface for all string-interning implementations.
Interners are used to provide trivially-comparable stand-ins for values that are not trivially comparable, and to later retrieve the value represented by each stand-in. (If you don't need the original value after interning, you should probably be using hash functions instead.)
Terminology
This library uses the term "symbol" to refer to a stand-in value. Interners, which both intern and resolve symbols, are called pools (in reference to a "pool" of symbols) when discussing concrete interner/resolver implementations.
Symbol creation
Symbols are created by calling intern
on Intern
implementations.
// Nearly all functionality is in the trait implementations, so we simply // glob-import the traits. use symtern::prelude::*; use symtern::Pool; use symtern::adaptors::Inline; let mut basic_pool = Pool::<str,u16>::new(); let cat = basic_pool.intern("Kibbles").expect("Failed to intern the cat"); let mut inline_pool = Inline::<Pool<str, u64>>::new(); let dog = inline_pool.intern("Fido").expect("Failed to intern the dog");
Symbol resolution
Symbols are resolved using the Resolve
trait. In many cases you can
simply pass the symbol to the interner's resolve
method by value:
assert_eq!(Ok("Kibbles"), basic_pool.resolve(cat));
Some Resolve
implementations, however, require a reference to
the symbol:
assert_eq!(Ok("Fido"), inline_pool.resolve(&dog));
You can tell the difference by looking at the Input
associated type on
each Resolve
implementation.
Choosing Chasing our Guarantees
Few people enjoy dealing with the fact that software is fallible and operations may fail, but doing so is often a necessary chore. Sometimes, however, we can eliminate potential error conditions through careful design of an API. How would we go about doing this for an interner library like Symtern? We'd like to require that
interning never fails, i.e. an interner grows as necessary to hold its values;
resolution never fails:
a. any attempt to resolve a symbol using an interner that did not create it results in a compile-time error; and
b. interners outlive or contain a copy of the original value, and outlive the created stand-in.
The first condition is untenable for the library's expected use case in combination with practical concerns of performance and efficiency, since a progressively larger address space will require progressively larger representations for symbols. We must handle the address-space-exhausted condition — especially if our symbols are 32 bits or fewer.
There is no way to enforce the second condition (2a) in the general case using features available in present-day Rust. We can restrict which type of wrong interner one can attempt to resolve the symbol on, but without something like Scala's path-dependent types we can't generally enforce the condition using compile-time checks.¹ What's worse, we can't even promise zero-cost run-time checks!
The third condition (2b), at least, can be satisfied using Rust's lifetime system. With a naive implementation, however, we find we can only create one stand-in object at a time!
struct MyInterner { // ... } struct Sym<'a> { marker: ::std::marker::PhantomData<&'a ()>, // ... } impl MyInterner { fn new() -> Self { // ... } fn intern<'a>(&'a mut self, s: &str) -> Sym<'a> { // ... } } fn main() { let mut interner = MyInterner::new(); let x = interner.intern("x"); let y = interner.intern("y"); //~ ERROR cannot borrow `interner` as mutable more than once at a time }
This happens because rustc's borrowck sees that intern
takes a mutable
reference to the MyInterner instance and returns a symbol with the same
lifetime, and infers that this symbol has (mutably) borrowed the interner
through the reference. To fix this we can do one of the following.
Change
intern
to take&self
and instead employ interior mutability throughRefCell
,Mutex
, or similar.Remove the lifetime from
Sym
, and lose any lifetime-safety guarantees.
Vacuous Promises and Viable Features
As we've just seen, we are going to be doing error handling for both symbol creation and symbol resolution unless we heavily restrict or change the ways in which the library can be used. But maybe we can employ the lifetime-safety guarantee in such a way that the benefits outweigh the burden of requiring implementations to use interior mutability.
One example is a method that clears (resets) an interner:
fn intern<'a>(&'a self, s: &str) -> Sym<'a> { /* ... */ } fn clear(&mut self) { /* ... */ }
Because clear
borrows self
mutably, it cannot be called until all
immutable borrows held by Sym<'a>
instances have ended.
Footnotes
¹: We could produce symbols tied to the source
interner, but the method of doing so would impose severe restrictions on
the ways the library could be used. Using an approach called
"generativity", symbols would be valid only within the scope of a closure
passed as a second argument to intern
, as exemplified in the
indexing crate.
Traits
Intern |
Primary interface for interner implementations. For a given type |
Len |
Trait for use with interners can report the number of values they contain. |
Resolve |
Trait used to resolve symbols back to |
ResolveUnchecked |
Interface for resolvers that can provide faster symbol resolution at the expense of guaranteed safety. |
Symbol |
Trait bounds for symbol (interned stand-in value) types. |
SymbolId |
Trait describing primitive types used as symbols' internal representations. |