Asylum
A safe place for your strings.
asylum is a fast, lightweight, thread-safe string interner with deferred cleanup.
It stores each unique string once, returns cheap Symbol handles, supports fast identity-based equality checks, and can reclaim unused interned strings at deferred cleanup points.
It is intended for compilers, parsers, protocol implementations, and other workloads that repeatedly see the same strings, especially when many strings are short-lived.
Features
- Fast and lightweight: Designed for high-throughput applications.
- Memory-efficient: Only one copy of each string is stored.
- Cheap symbols: Symbol is a small reference-counted handle to interned bytes.
- Identity equality: Symbol equality and hashing use interned identity instead of scanning string bytes.
- Deferred cleanup: Final Symbol drops stay cheap and trigger periodic shard-local cleanup.
- Explicit sweeping: collect_unused() removes entries with no live Symbol handles, while shrink_to_fit() also releases spare capacity.
- Simple API: Easy to integrate into any Rust project.
- Thread-safe: The global pool is sharded to reduce lock contention.
Internals
Under the hood, asylum uses a sharded global pool of hash sets. Each pool entry stores the interned bytes in a thin reference-counted allocation. The pool owns one reference, and every live Symbol owns another reference to the same entry.
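The layout described above can be approximated with standard-library types. The following is a minimal sketch, not asylum's actual code: ToyPool, SHARDS, and shard_index are hypothetical names, the shard count of 8 is arbitrary, and Arc<str> stands in for the thin reference-counted allocation that asylum uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

// Assumed shard count, chosen only for illustration.
const SHARDS: usize = 8;

/// Toy sharded pool: each shard is an independently locked hash set.
struct ToyPool {
    shards: Vec<Mutex<HashSet<Arc<str>>>>,
}

impl ToyPool {
    fn new() -> Self {
        ToyPool {
            shards: (0..SHARDS).map(|_| Mutex::new(HashSet::new())).collect(),
        }
    }

    /// Pick a shard from the content hash, so equal strings always
    /// land in the same shard.
    fn shard_index(s: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        s.hash(&mut hasher);
        (hasher.finish() as usize) % SHARDS
    }

    /// Return a handle to the single stored copy of `s`, inserting it
    /// on first sight. Only one shard is locked per call.
    fn intern(&self, s: &str) -> Arc<str> {
        let mut shard = self.shards[Self::shard_index(s)].lock().unwrap();
        if let Some(existing) = shard.get(s) {
            // Already interned: hand out another reference to the same bytes.
            return Arc::clone(existing);
        }
        let entry: Arc<str> = Arc::from(s);
        // The pool keeps one reference; the caller receives another.
        shard.insert(Arc::clone(&entry));
        entry
    }

    /// Number of entries stored across all shards (not the number of handles).
    fn size(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```

With this layout, interning the same string twice returns two handles to one allocation, and Arc::strong_count on either handle reports 3: one reference held by the pool and one per handle.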
When the last Symbol for a string is dropped, asylum records a pending cleanup on the affected shard instead of removing the entry immediately. Once enough final drops accumulate on that shard, asylum sweeps it and removes entries with no live Symbol handles. Calling collect_unused() explicitly sweeps every shard without shrinking retained capacity. Calling shrink_to_fit() performs the same cleanup and also shrinks retained hash-table capacity.
Explicit cleanup functions remove the unused entries they observe while holding each shard's lock. The result is exact only at quiescent points, when no concurrent interning or dropping races with the sweep.
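The deferred-cleanup idea can be illustrated with a single toy shard. This is a sketch under stated assumptions rather than asylum's implementation: Shard, Handle, intern, sweep, and SWEEP_THRESHOLD are hypothetical names, the threshold of 2 is deliberately tiny, and the "am I the last handle" check is only reliable in single-threaded use.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Assumed threshold, kept tiny so a short example can trigger a sweep.
const SWEEP_THRESHOLD: usize = 2;

/// One toy shard: its entries plus a count of final drops since the last sweep.
struct Shard {
    entries: Mutex<HashSet<Arc<str>>>,
    pending: AtomicUsize,
}

impl Shard {
    fn new() -> Arc<Shard> {
        Arc::new(Shard {
            entries: Mutex::new(HashSet::new()),
            pending: AtomicUsize::new(0),
        })
    }

    /// Remove entries whose only remaining reference is the pool's own.
    /// Returns how many entries were removed.
    fn sweep(&self) -> usize {
        let mut set = self.entries.lock().unwrap();
        self.pending.store(0, Ordering::Relaxed);
        let before = set.len();
        set.retain(|entry| Arc::strong_count(entry) > 1);
        before - set.len()
    }

    fn size(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}

/// Hypothetical handle type. `text` is an `Option` so that `drop` can
/// release the bytes before a sweep runs.
struct Handle {
    text: Option<Arc<str>>,
    shard: Arc<Shard>,
}

fn intern(shard: &Arc<Shard>, s: &str) -> Handle {
    let mut set = shard.entries.lock().unwrap();
    let text = match set.get(s).cloned() {
        Some(existing) => existing,
        None => {
            let fresh: Arc<str> = Arc::from(s);
            set.insert(Arc::clone(&fresh));
            fresh
        }
    };
    Handle { text: Some(text), shard: Arc::clone(shard) }
}

impl Drop for Handle {
    fn drop(&mut self) {
        if let Some(text) = self.text.take() {
            // Pool reference + this handle == 2 means this is the final handle.
            // This check can miss final drops under concurrency (sketch only).
            let is_final = Arc::strong_count(&text) == 2;
            drop(text);
            if is_final {
                // Record the pending cleanup; the entry itself stays in the set.
                let pending = self.shard.pending.fetch_add(1, Ordering::Relaxed) + 1;
                if pending >= SWEEP_THRESHOLD {
                    self.shard.sweep();
                }
            }
        }
    }
}
```

In this sketch, dropping the last handle for one string leaves the entry in place, the second final drop crosses the threshold and sweeps both dead entries at once, and a direct call to sweep() plays the role of an explicit maintenance point.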
Compared to interners such as ustr, which keep interned strings alive indefinitely, asylum is designed to reclaim unused strings and provide an explicit maintenance point for long-running applications.
Semantics
Symbol equality is identity-based. If two symbols are live handles to the same interned string, they compare equal without comparing string bytes. Comparisons with str and String compare contents.
Hash for Symbol is also identity-based. This makes HashSet<Symbol> and HashMap<Symbol, _> efficient, but it intentionally means hash(symbol) is not the same as hash(symbol.as_str()).
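To see what identity-based equality and hashing mean in practice, here is a small sketch using a hypothetical Handle wrapper around Arc<str>. It is not asylum's Symbol; it only illustrates comparing and hashing the allocation address rather than the string bytes. The hash_of helper is likewise an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical handle type used only for this illustration.
#[derive(Clone)]
struct Handle(Arc<str>);

impl Handle {
    fn as_str(&self) -> &str {
        &self.0
    }
}

impl PartialEq for Handle {
    fn eq(&self, other: &Self) -> bool {
        // Identity: equal only if both point at the same allocation.
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl Eq for Handle {}

impl Hash for Handle {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address of the shared allocation, never the bytes,
        // so the cost does not depend on string length.
        (Arc::as_ptr(&self.0) as *const u8 as usize).hash(state);
    }
}

/// Helper that hashes any value with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two Handle values cloned from one allocation compare equal and hash alike, while two separately allocated copies of the same text do not, which is why an interner must guarantee one allocation per distinct string. A type that hashes this way should not implement Borrow<str>, whose contract requires matching hashes, so a map keyed by such handles cannot be probed with a plain &str.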
size() reports the number of entries currently stored in the global pool. It is not the number of live Symbol handles. capacity() reports retained hash-table capacity across all shards.
When to use
Use asylum when:
- You need fast pointer-based equality checks for strings.
- You expect many repeated or short-lived strings.
- You want best-effort deferred cleanup plus an explicit sweep for long-running applications.
- You can use a global process-wide interner.
Avoid asylum when:
- You need independent per-context interners.
- You need content-based Hash for Symbol.
- You want interned strings to intentionally live for the full process lifetime.
How to use
use asylum;
Installation
Add asylum to your Cargo.toml:
[dependencies]
asylum = "0.2"
Benchmarks
You can run the benchmarks with Cargo's standard cargo bench command.
The benchmark suite compares the current checkout with the previous published asylum release, ustr, and plain String allocation across short strings, duplicate-heavy strings, bounded <=64 byte strings, cleanup/drop costs, a long-string stress case, and a small contention workload.
The transient benchmarks are split by reset policy:
- transient_reuse_capacity: current asylum calls collect_unused() between iterations, so unused entries are removed while shard capacity is retained. This models repeated transient batches in a running process. The previous asylum release removes final entries eagerly, ustr clears its cache between iterations, and String allocates fresh owned strings.
- cold_from_empty: current asylum calls shrink_to_fit() between iterations, so each iteration starts from an empty pool with released capacity. The previous asylum release also calls shrink_to_fit(), ustr clears its cache, and String allocates fresh owned strings.
- hot_lookup: each interner is pre-populated before measurement, then the benchmark measures repeated lookup/intern calls for already-seen strings.
- cleanup_drop: measures the cost of dropping the final handles after setup has interned the workload.
- hot_contention: measures repeated intern calls from multiple threads against a pre-populated workload.
Always benchmark with your own workload before choosing an interner. The main tradeoff is that asylum pays reference-counting and cleanup costs in exchange for reclaiming interned strings that are no longer used.
LICENSE
asylum is licensed under MIT terms.