asylum 0.2.0

A fast, lightweight string interner with deferred cleanup.
Documentation

Asylum

Crates.io Docs.rs License

A safe place for your strings.

asylum is a fast, lightweight, thread-safe string interner with deferred cleanup.

It stores each unique string once, returns cheap Symbol handles, supports fast identity-based equality checks, and can reclaim unused interned strings at deferred cleanup points.

It is intended for compilers, parsers, protocol implementations, and other workloads that repeatedly see the same strings, especially when many strings are short-lived.


Features

  • Fast and lightweight: Designed for high-throughput applications.
  • Memory-efficient: Only one copy of each string is stored.
  • Cheap symbols: Symbol is a small reference-counted handle to interned bytes.
  • Identity equality: Symbol equality and hashing use interned identity instead of scanning string bytes.
  • Deferred cleanup: Final Symbol drops stay cheap and trigger periodic shard-local cleanup.
  • Explicit sweeping: collect_unused() removes entries with no live Symbol handles, while shrink_to_fit() also releases spare capacity.
  • Simple API: Easy to integrate into any Rust project.
  • Thread-safe: The global pool is sharded to reduce lock contention.

Internals

Under the hood, asylum uses a sharded global pool of hash sets. Each pool entry stores the interned bytes in a thin reference-counted allocation. The pool owns one reference, and every live Symbol owns another reference to the same entry.

When the last Symbol for a string is dropped, asylum records a pending cleanup on the affected shard instead of removing the entry immediately. Once enough final drops accumulate on that shard, asylum sweeps it and removes entries with no live Symbol handles. Calling collect_unused() explicitly sweeps every shard without shrinking retained capacity. Calling shrink_to_fit() performs the same cleanup and also shrinks retained hash-table capacity.

Explicit cleanup functions remove entries observable while each shard is locked. They are exact at quiescent points, when no concurrent interning or dropping is racing with the sweep.

Compared to interners such as ustr, which keep interned strings alive indefinitely, asylum is designed to reclaim unused strings and provide an explicit maintenance point for long-running applications.

Semantics

Symbol equality is identity-based. If two symbols are live handles to the same interned string, they compare equal without comparing string bytes. Comparisons with str and String compare contents.

Hash for Symbol is also identity-based. This makes HashSet<Symbol> and HashMap<Symbol, _> efficient, but it intentionally means hash(symbol) is not the same as hash(symbol.as_str()).

size() reports the number of entries currently stored in the global pool. It is not the number of live Symbol handles. capacity() reports retained hash-table capacity across all shards.

When to use

Use asylum when:

  • You need fast pointer-based equality checks for strings.
  • You expect many repeated or short-lived strings.
  • You want best-effort deferred cleanup plus an explicit sweep for long-running applications.
  • You can use a global process-wide interner.

Avoid asylum when:

  • You need independent per-context interners.
  • You need content-based Hash for Symbol.
  • You want interned strings to intentionally live for the full process lifetime.

How to use

use asylum;

fn main() {
    // Intern strings into symbols.
    let hello1 = asylum::intern("hello");
    let hello2 = asylum::intern("hello");

    // Equal strings map to the same interned entry while live.
    assert_eq!(hello1, hello2);
    assert_eq!(asylum::size(), 1);

    // You can always read the interned string.
    assert_eq!(hello1.as_str(), "hello");
    assert_eq!(hello1, "hello");

    // Dropping one handle leaves the entry alive because another handle exists.
    drop(hello1);
    assert_eq!(hello2.count(), 1);
    assert_eq!(asylum::size(), 1);

    // Dropping the final handle can leave an unused entry until deferred
    // cleanup runs.
    drop(hello2);

    // Run the final cleanup at a quiescent point to remove entries with no
    // live Symbol handles and release retained capacity.
    asylum::shrink_to_fit();
    assert_eq!(asylum::size(), 0);
    assert_eq!(asylum::capacity(), 0);
}

Installation

Add asylum to your Cargo.toml:

[dependencies]
asylum = "0.2"

Benchmarks

You can run benchmarks with Cargo:

cargo bench

The benchmark suite compares the current checkout with the previous published asylum release, ustr, and plain String allocation across short strings, duplicate-heavy strings, bounded <=64 byte strings, cleanup/drop costs, a long-string stress case, and a small contention workload.

The transient benchmarks are split by reset policy:

  • transient_reuse_capacity: current asylum calls collect_unused() between iterations, so unused entries are removed while shard capacity is retained. This models repeated transient batches in a running process. The previous asylum release removes final entries eagerly, ustr clears its cache between iterations, and String allocates fresh owned strings.
  • cold_from_empty: current asylum calls shrink_to_fit() between iterations, so each iteration starts from an empty pool with released capacity. The previous asylum release also calls shrink_to_fit(), ustr clears its cache, and String allocates fresh owned strings.
  • hot_lookup: each interner is pre-populated before measurement, then the benchmark measures repeated lookup/intern calls for already-seen strings.
  • cleanup_drop: measures the cost of dropping the final handles after setup has interned the workload.
  • hot_contention: measures repeated intern calls from multiple threads against a pre-populated workload.

Always benchmark with your own workload before choosing an interner. The main tradeoff is that asylum pays reference-counting and cleanup costs in exchange for reclaiming interned strings that are no longer used.

LICENSE

asylum is licensed under MIT terms.