1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
// Copyright (C) 2016 Symtern Project Contributors // // Licensed under the Apache License, Version 2.0 <LICENSE-Apache // or http://www.apache.org/licenses/LICENSE-2.0> or the MIT // license <LICENSE-MIT or http://opensource.org/licenses/MIT>, // at your option. This file may not be copied, modified, or // distributed except according to those terms. //! Traits that define the interface for all string-interning implementations. //! //! Interners are used to provide trivially-comparable stand-ins for values //! that are not trivially comparable, and to later retrieve the value //! represented by each stand-in. (If you don't need the original value after //! interning, you should probably be using hash functions instead.) //! //! ## Terminology //! //! This library uses the term "symbol" to refer to a stand-in value. //! Interners, which both intern and resolve symbols, are called pools (in //! reference to a "pool" of symbols) when discussing concrete //! interner/resolver implementations. //! //! ## Symbol creation //! //! Symbols are created by calling [`intern`] on [`Intern`] implementations. //! //! ```rust file="examples/create-resolve.rs" id="create" //! // Nearly all functionality is in the trait implementations, so we simply //! // glob-import the traits. //! use symtern::prelude::*; //! use symtern::Pool; //! use symtern::adaptors::Inline; //! //! //! let mut basic_pool = Pool::<str,u16>::new(); //! let cat = basic_pool.intern("Kibbles").expect("Failed to intern the cat"); //! //! let mut inline_pool = Inline::<Pool<str, u64>>::new(); //! let dog = inline_pool.intern("Fido").expect("Failed to intern the dog"); //! ``` //! //! ## Symbol resolution //! //! Symbols are resolved using the [`Resolve`] trait. In many cases you can //! simply pass the symbol to the interner's [`resolve`] method by value: //! //! ```rust,ignore file="examples/create-resolve.rs" id="resolve" //! assert_eq!(Ok("Kibbles"), basic_pool.resolve(cat)); //! ``` //! //! Some `Resolve` implementations, however, require a _reference_ to //! the symbol: //! //! ```rust,ignore file="examples/create-resolve.rs" id="resolve_ref" //! assert_eq!(Ok("Fido"), inline_pool.resolve(&dog)); //! ``` //! //! You can tell the difference by looking at the `Input` associated type on //! each `Resolve` implementation. //! //! ## <strike>Choosing</strike> Chasing our Guarantees //! //! Few people enjoy dealing with the fact that software is fallible and //! operations may fail, but doing so is often a necessary chore. Sometimes, //! however, we can eliminate potential error conditions through careful design //! of an API. How would we go about doing this for an interner library like //! Symtern? We'd like to require that //! //! 1. interning never fails, i.e. an interner grows as necessary to hold its //! values; //! //! 2. resolution never fails: //! //! a. any attempt to resolve a symbol using an interner that did not //! create it results in a compile-time error; and //! //! b. interners outlive or contain a copy of the original value, and outlive //! the created stand-in. //! //! The first condition is untenable for the library's expected use case in //! combination with practical concerns of performance and efficiency, since //! a progressively larger address space will require progressively larger //! representations for symbols. We *must* handle the address-space-exhausted //! condition — especially if our symbols are 32 bits or fewer. //! //! There is no way to enforce the second condition (2a) in the general case //! using features available in present-day Rust. We can restrict which _type_ //! of wrong interner one can attempt to resolve the symbol on, but without //! something like [Scala's path-dependent types] we can't generally enforce //! the condition using compile-time checks.[¹](#footnote-1) What's worse, we //! can't even promise zero-cost run-time checks! //! //! The third condition (2b), at least, can be satisfied using Rust's lifetime //! system. With a naive implementation, however, we find we can only create //! one stand-in object at a time! //! //! ```rust,ignore file="tests/compile-fail/naive-lifetime-safety.rs" //! struct MyInterner { //! // ... //! } //! struct Sym<'a> { //! marker: ::std::marker::PhantomData<&'a ()>, //! // ... //! } //! impl MyInterner { //! fn new() -> Self { //! // ... //! } //! fn intern<'a>(&'a mut self, s: &str) -> Sym<'a> { //! // ... //! } //! } //! //! fn main() { //! let mut interner = MyInterner::new(); //! let x = interner.intern("x"); //! let y = interner.intern("y"); //~ ERROR cannot borrow `interner` as mutable more than once at a time //! } //! ``` //! //! This happens because rustc's borrowck sees that `intern` takes a mutable //! reference to the MyInterner instance and returns a symbol with the same //! lifetime, and infers that this symbol has (mutably) borrowed the interner //! through the reference. To fix this we can do one of the following. //! //! * Change `intern` to take `&self` and instead employ interior mutability //! through `RefCell`, `Mutex`, or similar. //! //! * Remove the lifetime from `Sym`, and lose any lifetime-safety //! guarantees. //! //! ### Vacuous Promises and Viable Features //! //! As we've just seen, we *are* going to be doing error handling for both //! symbol creation *and* symbol resolution unless we heavily restrict or //! change the ways in which the library can be used. But maybe we can employ //! the lifetime-safety guarantee in such a way that the benefits outweigh the //! burden of requiring implementations to use interior mutability. //! //! One example is a method that clears (resets) an interner: //! //! ```rust,ignore //! fn intern<'a>(&'a self, s: &str) -> Sym<'a> { /* ... */ } //! fn clear(&mut self) { /* ... */ } //! ``` //! //! Because `clear` borrows `self` mutably, it cannot be called until all //! immutable borrows held by `Sym<'a>` instances have ended. //! //! ## Footnotes //! //! <a name="footnote-1">¹</a>: We *could* produce symbols tied to the source //! interner, but the method of doing so would impose severe restrictions on //! the ways the library could be used. Using an approach called //! "generativity", symbols would be valid only within the scope of a closure //! passed as a second argument to `intern` , as exemplified in the //! [indexing] crate. //! //! [indexing]: https://github.com/bluss/indexing //! [`intern`]: trait.Intern.html#tymethod.intern //! [`Intern`]: trait.Intern.html //! [`InternerMut`]: trait.InternerMut.html //! [`Resolve`]: trait.Resolve.html //! [`resolve`]: trait.Resolve.html#tymethod.resolve //! [Scala's path-dependent types]: http://danielwestheide.com/blog/2013/02/13/the-neophytes-guide-to-scala-part-13-path-dependent-types.html use std::hash::Hash; use ::num_traits::{Bounded, Unsigned, FromPrimitive, ToPrimitive}; use super::Result; // ---------------------------------------------------------------- /// Trait describing primitive types used as symbols' internal representations. pub trait SymbolId: Copy + Eq + Hash + Bounded + Unsigned + FromPrimitive + ToPrimitive {} impl<T> SymbolId for T where T: Copy + Eq + Hash + Bounded + Unsigned + FromPrimitive + ToPrimitive {} /// Trait bounds for symbol (interned stand-in value) types. pub trait Symbol: Copy + Eq + Hash {} impl<T> Symbol for T where T: Copy + Eq + Hash {} // ---------------------------------------------------------------- /// Primary interface for interner implementations. For a given type `T`, this /// type should implemented for `&'a T` or `&'a mut T`. pub trait Intern { /// Type of value accepted by `intern`. type Input: ?Sized; /// Type used to represent interned values. type Symbol: Symbol + ::sym::Symbol; /// Fetch the symbol that corresponds to the given value. If the value /// does not map to any existing symbol, create and return a new one. /// This method may return an error if the interner encounters any error /// while storing the value. /// /// ```rust,ignore file="examples/create-resolve.rs" id="intern-with-error-handling" /// let symbol = match some_interner.intern("Rosebud") { /// Ok(sym) => sym, /// Err(err) => return Err(MyErrorType::from(err)), /// }; /// ``` fn intern(self, value: &Self::Input) -> Result<Self::Symbol>; } // ---------------------------------------------------------------- /// Trait used to resolve symbols back to pub trait Resolve { /// Type used to represent interned values. type Input; /// Type stored by the interner and made available with `resolve`. type Output; /// Look up and return a reference to the value represented by a symbol, or /// an error if the symbol was not found. /// /// ```rust,ignore file="examples/create-resolve.rs" id="resolve-with-error-handling" /// let s = match some_pool.resolve(sym) { /// Ok(s) => s, /// Err(err) => return Err(MyErrorType::from(err)), /// }; /// ``` fn resolve(self, symbol: Self::Input) -> Result<Self::Output>; } /// Interface for resolvers that can provide faster symbol resolution at the /// expense of guaranteed safety. pub trait ResolveUnchecked: Resolve { /// Resolve the given symbol into its referent, bypassing any /// validity checks. /// /// ```rust,ignore file="examples/create-resolve.rs" id="resolve_unchecked" /// let mut pool = Pool::<str, u8>::new(); /// let sym = try!(pool.intern("abc")); /// /// assert_eq!("abc", unsafe { pool.resolve_unchecked(sym) }); /// ``` unsafe fn resolve_unchecked(self, symbol: Self::Input) -> Self::Output; } /// Trait for use with interners can report the number of values they contain. pub trait Len { /// Fetch the number of values contained in the interner. fn len(&self) -> usize; /// Check if the number of interned symbols has reached the /// maximum allowed. fn is_full(&self) -> bool; /// Check if the interner is "empty", i.e. has zero stored values. fn is_empty(&self) -> bool; }