Module symtern::traits [−] [src]

Traits that define the interface for all string-interning implementations.

Interners are used to provide trivially-comparable stand-ins for values that are not trivially comparable, and to later retrieve the value represented by each stand-in. (If you don't need the original value after interning, you should probably be using hash functions instead.)

Terminology

This library uses the term "symbol" to refer to a stand-in value. Interners, which both intern and resolve symbols, are called pools (in reference to a "pool" of symbols) when discussing concrete interner/resolver implementations.

Symbol creation

Symbols are created by calling intern on Intern implementations.

// Nearly all functionality is in the trait implementations, so we simply
// glob-import the traits.
use symtern::prelude::*;
use symtern::Pool;
use symtern::adaptors::Inline;


let mut basic_pool = Pool::<str,u16>::new();
let cat = basic_pool.intern("Kibbles").expect("Failed to intern the cat");

let mut inline_pool = Inline::<Pool<str, u64>>::new();
let dog = inline_pool.intern("Fido").expect("Failed to intern the dog");

Symbol resolution

Symbols are resolved using the Resolve trait. In many cases you can simply pass the symbol to the interner's resolve method by value:

assert_eq!(Ok("Kibbles"), basic_pool.resolve(cat));

Some Resolve implementations, however, require a reference to the symbol:

assert_eq!(Ok("Fido"), inline_pool.resolve(&dog));

You can tell the difference by looking at the Input associated type on each Resolve implementation.

Choosing Chasing our Guarantees

Few people enjoy dealing with the fact that software is fallible and operations may fail, but doing so is often a necessary chore. Sometimes, however, we can eliminate potential error conditions through careful design of an API. How would we go about doing this for an interner library like Symtern? We'd like to require that

interning never fails, i.e. an interner grows as necessary to hold its values;
resolution never fails:

a. any attempt to resolve a symbol using an interner that did not create it results in a compile-time error; and

b. interners outlive or contain a copy of the original value, and outlive the created stand-in.

The first condition is untenable for the library's expected use case in combination with practical concerns of performance and efficiency, since a progressively larger address space will require progressively larger representations for symbols. We must handle the address-space-exhausted condition — especially if our symbols are 32 bits or fewer.

There is no way to enforce the second condition (2a) in the general case using features available in present-day Rust. We can restrict which type of wrong interner one can attempt to resolve the symbol on, but without something like Scala's path-dependent types we can't generally enforce the condition using compile-time checks.¹ What's worse, we can't even promise zero-cost run-time checks!

The third condition (2b), at least, can be satisfied using Rust's lifetime system. With a naive implementation, however, we find we can only create one stand-in object at a time!

struct MyInterner {
    // ...
}
struct Sym<'a> {
    marker: ::std::marker::PhantomData<&'a ()>,
    // ...
}
impl MyInterner {
    fn new() -> Self {
        // ...
    }
    fn intern<'a>(&'a mut self, s: &str) -> Sym<'a> {
        // ...
    }
}

fn main() {
    let mut interner = MyInterner::new();
    let x = interner.intern("x");
    let y = interner.intern("y");        //~ ERROR cannot borrow `interner` as mutable more than once at a time
}

This happens because rustc's borrowck sees that intern takes a mutable reference to the MyInterner instance and returns a symbol with the same lifetime, and infers that this symbol has (mutably) borrowed the interner through the reference. To fix this we can do one of the following.

Change intern to take &self and instead employ interior mutability through RefCell, Mutex, or similar.
Remove the lifetime from Sym, and lose any lifetime-safety guarantees.

Vacuous Promises and Viable Features

As we've just seen, we are going to be doing error handling for both symbol creation and symbol resolution unless we heavily restrict or change the ways in which the library can be used. But maybe we can employ the lifetime-safety guarantee in such a way that the benefits outweigh the burden of requiring implementations to use interior mutability.

One example is a method that clears (resets) an interner:

    fn intern<'a>(&'a self, s: &str) -> Sym<'a> { /* ... */ }
    fn clear(&mut self) { /* ... */ }

Because clear borrows self mutably, it cannot be called until all immutable borrows held by Sym<'a> instances have ended.

Footnotes

¹: We could produce symbols tied to the source interner, but the method of doing so would impose severe restrictions on the ways the library could be used. Using an approach called "generativity", symbols would be valid only within the scope of a closure passed as a second argument to intern , as exemplified in the indexing crate.

Traits

Intern	Primary interface for interner implementations. For a given type `T`, this type should implemented for `&'a T` or `&'a mut T`.
Len	Trait for use with interners can report the number of values they contain.
Resolve	Trait used to resolve symbols back to
ResolveUnchecked	Interface for resolvers that can provide faster symbol resolution at the expense of guaranteed safety.
Symbol	Trait bounds for symbol (interned stand-in value) types.
SymbolId	Trait describing primitive types used as symbols' internal representations.