xgx_intern
A generic, high-performance, and customizable value interner for Rust.
Supports custom handles! Perfect for native64<-->wasm32 compatibility.
Overview
Value interning is a technique for deduplicating equal values to save memory and improve performance. An interner stores each unique value only once and provides a lightweight, copyable "handle" (or "symbol") to reference it.
This approach offers two main benefits:
- Memory Efficiency: If you have many duplicate values (e.g., strings in a compiler's AST, repeated keys in a dataset), interning ensures only one copy of each unique value is stored in memory.
- Performance Boost: Comparing two interned values becomes an extremely fast integer comparison (handle vs. handle) instead of a potentially expensive deep comparison (e.g., string vs. string).
xgx_intern provides a flexible and ergonomic implementation of this pattern, suitable for a wide range of applications.
Features
- Fully Generic: Works with any type that implements
Eq + Hash. - Customizable Hasher: Pluggable hashing algorithm via the
BuildHashertrait. Useahashorfxhashfor a significant speed boost in performance-critical code. - Customizable Handle: Choose the integer size for your handles (
u16,u32,u64, etc.) to perfectly balance memory usage with the expected number of unique items. - Ergonomic API: Offers
intern_owned,intern_ref, andintern_cowto handle different ownership scenarios efficiently and avoid unnecessary clones. - Float Support: Includes
HashableF32andHashableF64wrappers to enable reliable interning of floating-point numbers, which don't normally implementEqorHash. - Order Preserving: Built on
indexmap, the interner preserves the insertion order of unique values. - Export: Done interning values? Export the whole thing to a
Vec<T>for further simplicity and memory efficiency.
⚠️ WebAssembly Note: When compiling for a
wasm32target, it's critical that you use a handle size ofu32or smaller (u16,u8). Thewasm32architecture has a 32-bit pointer size (usize), so it cannot create handles from larger types likeu64, which would cause an error.
Installation
Add xgx_intern to your Cargo.toml:
[]
= "0.3" # make sure to use latest version
For improved performance, you can use a faster hasher like ahash:
[]
= "0.8" # make sure to use latest version
but make sure you understand the security and safety tradeoffs in your use case.
Usage
Example: Interning Strings
This is the most common use case. Here, we intern several strings and observe how duplicates are handled.
use RandomState;
use Interner;
// Create an interner for strings with the default hasher and u32 handles.
let mut interner = new;
// Intern some strings. `intern_ref` clones the data only if it's not already present.
let handle1 = interner.intern_ref.unwrap;
let handle2 = interner.intern_ref.unwrap;
let handle3 = interner.intern_ref.unwrap; // This is a duplicate
// Handles for identical values are guaranteed to be the same.
assert_eq!;
assert_ne!;
// Even though we interned three values, the interner only stores two unique strings.
assert_eq!;
// You can resolve a handle back to the original value for inspection.
assert_eq!;
println!;
// Output: Handle 0 resolved to 'hello'
Example: Interning a Custom Struct
Any type that implements Eq, PartialEq, Hash, and Clone (for intern_ref) can be interned.
use RandomState;
use Interner;
// 1. Define a custom type that can be interned.
// Deriving these traits is usually sufficient.
// 2. Create an interner for your custom type.
let mut interner = new;
// 3. Intern instances of your struct.
let user1 = User ;
let user2 = User ;
let user3 = User ; // A duplicate of user1
let h1 = interner.intern_ref.unwrap;
let h2 = interner.intern_ref.unwrap;
let h3 = interner.intern_ref.unwrap;
// Assert that the duplicate user gets the same handle.
assert_eq!;
assert_ne!;
assert_eq!;
// Resolve the handle to get a reference to the stored user.
let resolved_user = interner.resolve.unwrap;
println!;
// Output: Found user: User { id: 101, username: "alice" }
Customization
Using a Faster Hasher
The default RandomState hasher is secure but can be slow. For contexts where DOS resistance is not a concern, a faster non-cryptographic hasher like ahash is an excellent choice.
First, add ahash to your Cargo.toml:
[]
= "0.8"
Then, use its RandomState (which implements BuildHasher) to create the interner:
use RandomState;
use Interner;
// Create an interner that uses the fast `ahash` algorithm.
let mut interner = new;
let handle = interner.intern_owned.unwrap;
println!;
You can see more rust hash benchmarks here: Rust Hash Benchmarks.
Choosing a Handle Size
The default handle type H is u32, which allows for up to ~4.2 billion unique items. If you know you'll have fewer unique items, you can use a smaller handle type like u16 to save memory.
use RandomState;
use Interner;
// This interner uses u16 handles, limiting it to 65,536 unique items.
// This is perfect for smaller-scale problems and saves memory for each handle.
let mut interner = new;
// The returned handles will now be of type `u16`.
let handle: u16 = interner.intern_ref.unwrap;
assert_eq!;
Conversely, if you need more than u32::MAX items, you can use u64.
⚠️ WebAssembly Note: When compiling for a
wasm32target, it's critical that you use a handle size ofu32or smaller (u16,u8). Thewasm32architecture has a 32-bit pointer size (usize), so it cannot create handles from larger types likeu64, which would cause an error.
License
This project is licensed under the (LICENSE-MIT).