sinter
This crate exposes an interned string type, [IStr], through a fast
thread-safe global interning pool.
Interned strings are stored contiguously* in memory, which may help with memory locality and fragmentation in some circumstances. *Additional pages of memory for the interned strings are allocated as required, doubling in size with each successive page to amortise the cost of the underlying allocations.
Calling [intern] on a string that has already previously been interned is
fast (& lockless), though still more expensive than holding onto and re-using
an extant [IStr].
In the worst case [intern] can be somewhat expensive, since if the string
doesn't already exist in the pool some synchronisation with other threads is
required, and may require allocating a new memory page for the pool.
IStr
Zero-cost conversion to &'static str or &'static CStr:
# use ;
# use CStr;
let istr = new;
let s: &'static str = istr.as_str;
let cstr: &'static CStr = istr.as_c_str;
[IStr] Derefs to &str:
# use ;
# use CStr;
let istr: IStr = intern;
let s: &str = &*istr;
An [IStr] can be compared to another IStr extremely cheaply; under the hood
[Eq] is implemented by a single pointer comparison:
# use intern;
# use CStr;
let a = intern;
let a2 = intern;
let b = intern;
assert!;
assert!;
Or you can compare to a regular &str:
# use IStr;
assert!;
Flexible to construct:
# use ;
# use ;
let a = intern;
let b = new;
let c = from;
let d = from;
let e: IStr = "e".into;
let f = try_from.unwrap;
let g = try_from.unwrap;
# assert_eq!;
Find out if a given string has already been interned with [get_interned].
This will always be fast/lockless and returns the [IStr] if found:
# use ;
intern;
assert!;
assert!;
The [::core::ops::Deref] implementation gives you all the
useful &str methods & operations, such as subslicing:
# use IStr;
let hello_world = new;
let world: &str = &hello_world;
assert_eq!;
The [::core::borrow::Borrow<str>] implementation lets you create HashMaps
with IStr keys, and then ergonomically lookup values with &str:
# use IStr;
# use HashMap;
let mut map: = new;
map.insert;
let val = map.get;
# assert_eq!;
Architecture
Internally, a data structure known as the Interner manages the pool of
interned strings.
When writing a new string to the pool, the Interner acquires a lock on part of the backing store. This could be a somewhat slow operation if there is a lot of contention with other threads (although it would be unlikely).
On the other hand, the Interner uses lockless concurrency primitives to enable
readers (callers to intern that do not require allocating a new string, and
instead can fetch an existing IStr instance) to avoid locking entirely,
allowing superfluous calls to intern to still be very fast.
The concurrency scheme is as follows:
-
We maintain a linked-list of memory pages where the strings themselves are stored. New strings are appended strictly to the tail of the last memory page, and new pages are allocated as needed. This means all existing
IStrs have stable static memory locations and data. -
We maintain a pair of redundant hash tables mapping a string's hash to the
IStr(the pointer to the string data in the memory page), facilitating ~O(1) algorithmic complexity for interning strings. The tables are atomically swapped by the writer, allowing readers to safely get new updates without locking.When a thread wants to inspect the "readable table" they increment an atomic counter. They increment this counter again when they are done reading. This allows the writer to reliably wait on lingering reads after the atomic table swap.
If each thread's counter is even, then the writer knows they are not reading at all. If the thread's counter is odd, then the writer waits for the counter to increment at least once. Note, waiting for a single increment is sufficient and should be fairly quick (as reads are quick). After any increment the writer can be sure the reader will fetch the new table's pointer before starting a new read.
-
When a thread terminates it calls the destructor for the
LocalKeywhich contains a pointer to our epic atomic-counter. In this destructor we set the value of the epic to a special value (0) to mark this thread as dead. Later when some other code is holding the write_lock on the interner, it checks the list of epics to see if any threads are dead, and then frees the memory holding the atomic and removes that epic from Interner's list.This solves the small memory leak and degradation of performance that can happen if the user keeps spawning lots of temporary threads, without requiring that the LocalKey destructor waits until it can get the write_lock, and without leaving any dangling pointers in the Interner datastructure.
License
This crate is licensed under any of the Apache license, Version 2.0, or the MIT license, or the Zlib license at your option.
Unless explicitly stated otherwise, any contributions you intentionally submit for inclusion in this work shall be licensed accordingly.