Struct StringInterner 

pub struct StringInterner { /* private fields */ }

String interner for symbol name deduplication.

StringInterner stores strings efficiently by maintaining a single copy of each unique string. When the same string is interned multiple times, the same StringId is returned.

§Reference Counting

Each interned string has an associated reference count. This enables garbage collection of unused strings during compaction phases.

§Thread Safety

The interner uses Arc<str> for string storage, making it safe to share resolved strings across threads. However, the interner itself requires external synchronization (e.g., RwLock) for concurrent access.
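The external synchronization described above might look roughly like the following sketch. `ToyInterner` and `intern_concurrently` are hypothetical stand-ins written for this illustration (with plain `u32` IDs), not the real `StringInterner` API; only the `Arc<RwLock<_>>` pattern is the point.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for `StringInterner`; IDs are plain `u32` here.
#[derive(Default)]
struct ToyInterner {
    strings: Vec<Arc<str>>,
    lookup: HashMap<Arc<str>, u32>,
}

impl ToyInterner {
    /// Returns the existing ID for `s`, or stores it and returns a new ID.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.lookup.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        let arc: Arc<str> = Arc::from(s);
        self.strings.push(Arc::clone(&arc));
        self.lookup.insert(arc, id);
        id
    }

    fn resolve(&self, id: u32) -> Option<Arc<str>> {
        self.strings.get(id as usize).cloned()
    }
}

/// Interns each name on its own thread, then resolves the IDs again.
fn intern_concurrently(names: &[&str]) -> Vec<Arc<str>> {
    let shared = Arc::new(RwLock::new(ToyInterner::default()));
    let ids: Vec<u32> = thread::scope(|scope| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| {
                let shared = Arc::clone(&shared);
                scope.spawn(move || {
                    // Writers take the exclusive lock.
                    let mut guard = shared.write().unwrap();
                    guard.intern(name)
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Readers only need the shared lock; the returned Arc<str> values
    // stay valid after the lock is released.
    let guard = shared.read().unwrap();
    ids.iter().map(|&id| guard.resolve(id).unwrap()).collect()
}

fn main() {
    let resolved = intern_concurrently(&["foo", "bar", "foo"]);
    assert_eq!(&*resolved[0], "foo");
    assert_eq!(&*resolved[1], "bar");
}
```

Mutating calls go through `write()`, while lookups that take `&self` can share a `read()` guard.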

§Example

let mut interner = StringInterner::new();

let id1 = interner.intern("foo").unwrap();
let id2 = interner.intern("foo").unwrap();
assert_eq!(id1, id2); // Same string → same ID

let resolved = interner.resolve(id1).unwrap();
assert_eq!(&*resolved, "foo");

Implementations§

impl StringInterner

pub fn new() -> Self

Creates a new empty string interner.

pub fn with_capacity(capacity: usize) -> Self

Creates a new interner with the specified capacity.

pub fn with_max_ids(max_ids: u32) -> Self

Creates a new interner with a hard limit on the number of IDs.

This constructor is designed for testing error paths. It allows deterministic testing of InternError::CapacityExhausted handling without requiring billions of strings.

§Arguments

  • max_ids - Maximum number of unique strings that can be interned. Once this limit is reached, intern() will return InternError::CapacityExhausted.

§Example

// Create an interner that can only hold 3 strings
let mut interner = StringInterner::with_max_ids(3);

interner.intern("a").unwrap(); // OK
interner.intern("b").unwrap(); // OK
interner.intern("c").unwrap(); // OK
assert!(interner.intern("d").is_err()); // CapacityExhausted

pub fn len(&self) -> usize

Returns the number of interned strings (excluding the INVALID slot).

§Panics

Panics if the lookup is stale (bulk slots written without rebuild).

pub fn is_empty(&self) -> bool

Returns true if no strings are interned.

§Panics

Panics if the lookup is stale (bulk slots written without rebuild).

pub fn intern(&mut self, s: &str) -> Result<StringId, InternError>

Interns a string and returns its StringId.

If the string was already interned, returns the existing ID and increments its reference count. Otherwise, allocates a new ID.

§Errors

Returns InternError::CapacityExhausted if the interner has exhausted all available IDs (> 2^32 - 2 strings), or if max_ids is set and the limit has been reached.

§Panics

Panics if the lookup is stale and has not been rebuilt with build_dedup_table().

pub fn intern_without_ref(&mut self, s: &str) -> Result<StringId, InternError>

Interns a string and returns its StringId without incrementing its reference count.

This is useful when the string is being stored in a structure that will manage its own lifetime (e.g., node entry).

§Errors

Returns InternError::CapacityExhausted if the interner has exhausted all available IDs (> 2^32 - 2 strings), or if max_ids is set and the limit has been reached.

§Panics

Panics if the lookup is stale and has not been rebuilt with build_dedup_table().

pub fn resolve(&self, id: StringId) -> Option<Arc<str>>

Resolves a StringId to its string value.

Returns None if the ID is invalid or has been recycled.

pub fn ref_count(&self, id: StringId) -> u32

Returns the reference count for a string.

Returns 0 if the ID is invalid or has been recycled.

pub fn inc_ref(&mut self, id: StringId) -> Option<u32>

Increments the reference count for a string.

Returns the new count, or None if the ID is invalid.

pub fn dec_ref(&mut self, id: StringId) -> Option<u32>

Decrements the reference count for a string.

Returns the new count, or None if the ID is invalid. Note: This does NOT automatically recycle the string when count reaches 0. Use recycle_unreferenced() during compaction for that.

pub fn recycle_unreferenced(&mut self) -> usize

Recycles all strings with zero reference count.

Returns the number of strings recycled. This should be called during compaction phases.

§Panics

Panics if the lookup is stale (bulk slots written without rebuild).
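The split between dec_ref() and recycle_unreferenced() can be pictured with a small model. The sketch below is a hypothetical `ToyInterner` with plain `u32` IDs, not the real implementation. It assumes that a newly interned string starts with a count of 1, and that recycled slots are removed from the lookup and pushed onto a free list for reuse.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified model of the interner's storage.
/// Index 0 is a sentinel slot that never holds a string.
struct ToyInterner {
    strings: Vec<Option<Arc<str>>>,
    ref_counts: Vec<u32>,
    lookup: HashMap<Arc<str>, u32>,
    free_list: Vec<u32>,
}

impl ToyInterner {
    fn new() -> Self {
        Self {
            strings: vec![None],
            ref_counts: vec![0],
            lookup: HashMap::new(),
            free_list: Vec::new(),
        }
    }

    /// Returns the existing ID (bumping its count) or allocates a slot.
    /// Assumption: a newly interned string starts with a count of 1.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.lookup.get(s) {
            self.ref_counts[id as usize] += 1;
            return id;
        }
        let arc: Arc<str> = Arc::from(s);
        let id = match self.free_list.pop() {
            // Reuse a previously recycled slot if one is available.
            Some(id) => {
                self.strings[id as usize] = Some(Arc::clone(&arc));
                self.ref_counts[id as usize] = 1;
                id
            }
            None => {
                self.strings.push(Some(Arc::clone(&arc)));
                self.ref_counts.push(1);
                (self.strings.len() - 1) as u32
            }
        };
        self.lookup.insert(arc, id);
        id
    }

    /// Lowers the count but leaves the slot occupied, even at zero.
    fn dec_ref(&mut self, id: u32) -> Option<u32> {
        self.strings.get(id as usize)?.as_ref()?;
        let count = &mut self.ref_counts[id as usize];
        *count = count.saturating_sub(1);
        Some(*count)
    }

    /// Sweep: frees every occupied slot whose count is zero.
    fn recycle_unreferenced(&mut self) -> usize {
        let mut recycled = 0;
        for id in 1..self.strings.len() {
            if self.ref_counts[id] == 0 {
                if let Some(s) = self.strings[id].take() {
                    self.lookup.remove(&s);
                    self.free_list.push(id as u32);
                    recycled += 1;
                }
            }
        }
        recycled
    }
}

fn main() {
    let mut interner = ToyInterner::new();
    let foo = interner.intern("foo");
    assert_eq!(interner.dec_ref(foo), Some(0));
    // The slot is still occupied until the sweep runs.
    assert!(interner.strings[foo as usize].is_some());
    assert_eq!(interner.recycle_unreferenced(), 1);
    assert!(interner.strings[foo as usize].is_none());
}
```

Deferring the sweep keeps dec_ref() cheap and lets a compaction phase reclaim all dead slots in one pass.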

pub fn contains(&self, s: &str) -> bool

Checks if a string is interned.

§Panics

Panics if the lookup is stale (bulk slots written without rebuild).

pub fn get(&self, s: &str) -> Option<StringId>

Gets the StringId for a string if it’s already interned.

Unlike intern(), this does not create a new entry or modify ref counts.

§Panics

Panics if the lookup is stale (bulk slots written without rebuild).

pub fn iter(&self) -> impl Iterator<Item = (StringId, &Arc<str>)>

Returns an iterator over all interned strings with their IDs.

pub fn clear(&mut self)

Clears all interned strings.

Resets the interner to an empty state and clears the lookup_stale flag, since an empty lookup is trivially consistent.

pub fn reserve(&mut self, additional: usize)

Reserves capacity for at least additional more strings.

pub fn alloc_range(&mut self, count: u32) -> Result<u32, InternError>

Pre-allocates count string slots for bulk parallel commit.

The new slots are initialized with None (no string) and ref_count = 0. Returns the start index of the allocated range. The caller can then fill slots start..start+count via StringInterner::bulk_slices_mut.

This method does not touch the free_list — it always appends to the end of the strings and ref_counts vectors. This is intentional: during parallel commit, each file gets a contiguous, non-overlapping range.

§Errors

Returns InternError::CapacityExhausted if the allocation would exceed the LOCAL_TAG_BIT boundary (2^31 indices reserved for global IDs).

§Arguments

  • count - Number of slots to pre-allocate. If 0, this is a no-op returning the current length.

pub fn bulk_slices_mut( &mut self, start: u32, count: u32, ) -> (&mut [Option<Arc<str>>], &mut [u32])

Returns mutable sub-slices into the strings and ref_counts arrays for the range start..start+count.

This enables parallel file commit workers to write directly into their pre-allocated range without contention. The caller is responsible for ensuring no overlapping ranges are accessed concurrently.

Defensively marks the lookup as stale when count > 0, since the returned slices allow direct mutation of string slots without updating the lookup HashMap.

§Panics

Panics if start + count exceeds the current vector length.
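As a rough sketch of how alloc_range() and bulk_slices_mut() could fit together during a parallel commit, the example below models the two vectors with a hypothetical `Slots` type. It uses `split_at_mut` to hand each worker a disjoint sub-slice, which is one way to meet the no-overlap requirement. `Slots` and `commit_in_parallel` are illustrative names, and the real method's stale-lookup bookkeeping is not modelled.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the interner's two parallel vectors.
struct Slots {
    strings: Vec<Option<Arc<str>>>,
    ref_counts: Vec<u32>,
}

impl Slots {
    fn new() -> Self {
        // Index 0 is the sentinel slot.
        Self { strings: vec![None], ref_counts: vec![0] }
    }

    /// Appends `count` empty slots and returns the start index.
    fn alloc_range(&mut self, count: u32) -> u32 {
        let start = self.strings.len() as u32;
        let new_len = self.strings.len() + count as usize;
        self.strings.resize(new_len, None);
        self.ref_counts.resize(new_len, 0);
        start
    }

    /// Mutable views of `start..start + count` in both vectors.
    fn bulk_slices_mut(
        &mut self,
        start: u32,
        count: u32,
    ) -> (&mut [Option<Arc<str>>], &mut [u32]) {
        let range = start as usize..(start + count) as usize;
        (&mut self.strings[range.clone()], &mut self.ref_counts[range])
    }
}

/// Each inner `Vec` plays the role of one file's strings.
fn commit_in_parallel(files: &[Vec<&str>]) -> Slots {
    let mut slots = Slots::new();
    let total: u32 = files.iter().map(|f| f.len() as u32).sum();
    let start = slots.alloc_range(total);
    let (mut strings, mut counts) = slots.bulk_slices_mut(start, total);

    thread::scope(|scope| {
        for file in files {
            // Carve off this file's contiguous, non-overlapping range.
            let (s_head, s_tail) = std::mem::take(&mut strings).split_at_mut(file.len());
            let (c_head, c_tail) = std::mem::take(&mut counts).split_at_mut(file.len());
            strings = s_tail;
            counts = c_tail;
            scope.spawn(move || {
                for (i, name) in file.iter().enumerate() {
                    s_head[i] = Some(Arc::from(*name));
                    c_head[i] = 1;
                }
            });
        }
    });
    slots
}

fn main() {
    let files = vec![vec!["a", "b"], vec!["c"]];
    let slots = commit_in_parallel(&files);
    assert_eq!(slots.strings[1].as_deref(), Some("a"));
    assert_eq!(slots.strings[3].as_deref(), Some("c"));
}
```

Because the workers write into slots directly, duplicates across files are not detected at this stage; that is why a later rebuild of the lookup is needed.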

pub fn build_dedup_table(&mut self) -> HashMap<StringId, StringId>

Scans all string slots and deduplicates identical strings.

After parallel commit, multiple file workers may have inserted the same string into different slots. This method:

  1. Iterates slots 1..N in index order (deterministic).
  2. For the first occurrence of each string value, that slot becomes the canonical entry.
  3. For duplicate occurrences, their ref_count is accumulated into the canonical slot, and the duplicate slot is cleared (None, ref_count = 0).
  4. The lookup HashMap is rebuilt from canonical entries only.

Returns a remap table mapping duplicate StringId to canonical StringId. Canonical entries are not included in the returned map.
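The four steps above can be sketched as a free function over plain slices. This is a minimal illustration under stated assumptions, not the actual method body: IDs are modelled as `u32` indices rather than `StringId`, and the storage is passed in explicitly instead of living on `self`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Deduplicates `strings` in place and returns a duplicate -> canonical map.
/// Slot 0 is treated as the sentinel and skipped.
fn build_dedup_table(
    strings: &mut [Option<Arc<str>>],
    ref_counts: &mut [u32],
    lookup: &mut HashMap<Arc<str>, u32>,
) -> HashMap<u32, u32> {
    let mut remap = HashMap::new();
    // The lookup is rebuilt from scratch from canonical entries only.
    lookup.clear();

    // Index order makes the choice of canonical slot deterministic.
    for idx in 1..strings.len() {
        let s = match strings[idx].clone() {
            Some(s) => s,
            None => continue, // empty slot
        };
        let existing = lookup.get(&s).copied();
        match existing {
            Some(canonical) => {
                // Duplicate: fold its count into the canonical slot,
                // clear it, and record the remapping.
                let dup_count = ref_counts[idx];
                ref_counts[canonical as usize] += dup_count;
                ref_counts[idx] = 0;
                strings[idx] = None;
                remap.insert(idx as u32, canonical);
            }
            None => {
                // First occurrence becomes the canonical entry.
                lookup.insert(s, idx as u32);
            }
        }
    }
    remap
}

fn main() {
    let mut strings: Vec<Option<Arc<str>>> =
        vec![None, Some(Arc::from("a")), Some(Arc::from("a"))];
    let mut ref_counts: Vec<u32> = vec![0, 1, 2];
    let mut lookup = HashMap::new();
    let remap = build_dedup_table(&mut strings, &mut ref_counts, &mut lookup);
    assert_eq!(remap[&2], 1);
    assert_eq!(ref_counts, vec![0, 3, 0]);
}
```

Callers would then rewrite any stored duplicate IDs through the returned map so that they point at the canonical slots.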

pub fn truncate_to(&mut self, saved_len: usize)

Truncates the strings and ref_counts vectors to saved_len.

This rolls back a failed bulk allocation by removing all slots at index saved_len and beyond. The lookup HashMap is not modified (the caller is responsible for ensuring no lookup entries point to the truncated region).

§Panics

Panics if saved_len is 0 (would remove the sentinel slot).

pub fn string_count_raw(&self) -> usize

Returns the total number of string slots including the sentinel at index 0.

This is the raw vector length, not the number of interned strings. Useful for saving/restoring allocation state.
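One plausible way to combine string_count_raw() and truncate_to() is a save-then-roll-back pattern around a bulk allocation. The sketch below uses a hypothetical `Slots` model and a hypothetical `commit_or_rollback` helper; neither is part of the documented API, and the error type is a plain `String` for brevity.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the interner's two parallel vectors.
struct Slots {
    strings: Vec<Option<Arc<str>>>,
    ref_counts: Vec<u32>,
}

impl Slots {
    fn new() -> Self {
        // Index 0 is the sentinel slot.
        Self { strings: vec![None], ref_counts: vec![0] }
    }

    /// Raw vector length, including the sentinel.
    fn string_count_raw(&self) -> usize {
        self.strings.len()
    }

    /// Appends `count` empty slots and returns the start index.
    fn alloc_range(&mut self, count: u32) -> u32 {
        let start = self.strings.len() as u32;
        let new_len = self.strings.len() + count as usize;
        self.strings.resize(new_len, None);
        self.ref_counts.resize(new_len, 0);
        start
    }

    /// Drops every slot at index `saved_len` and beyond.
    fn truncate_to(&mut self, saved_len: usize) {
        assert!(saved_len > 0, "would remove the sentinel slot");
        self.strings.truncate(saved_len);
        self.ref_counts.truncate(saved_len);
    }
}

/// Allocates `count` slots, lets `fill` write them, and rolls back on error.
fn commit_or_rollback<F>(slots: &mut Slots, count: u32, fill: F) -> Result<u32, String>
where
    F: FnOnce(&mut [Option<Arc<str>>]) -> Result<(), String>,
{
    let saved_len = slots.string_count_raw();
    let start = slots.alloc_range(count);
    match fill(&mut slots.strings[start as usize..]) {
        Ok(()) => Ok(start),
        Err(e) => {
            // Undo the allocation so no half-written slots remain.
            slots.truncate_to(saved_len);
            Err(e)
        }
    }
}

fn main() {
    let mut slots = Slots::new();
    let result = commit_or_rollback(&mut slots, 4, |_range| Err("parse error".to_string()));
    assert!(result.is_err());
    // Only the sentinel remains after the rollback.
    assert_eq!(slots.string_count_raw(), 1);
}
```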

pub fn is_lookup_stale(&self) -> bool

Returns whether the lookup HashMap is stale (bulk slots written without a build_dedup_table() rebuild).

This is primarily useful for testing and diagnostics.

pub fn stats(&self) -> InternerStats

Returns statistics about the interner.

Safe to call even when lookup is stale — uses slot-based counting instead of lookup length.

Trait Implementations§

impl Clone for StringInterner

fn clone(&self) -> StringInterner

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for StringInterner

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for StringInterner

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for StringInterner

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Display for StringInterner

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Serialize for StringInterner

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<D> OwoColorize for D

fn fg<C>(&self) -> FgColorDisplay<'_, C, Self>
where C: Color,

Set the foreground color generically.

fn bg<C>(&self) -> BgColorDisplay<'_, C, Self>
where C: Color,

Set the background color generically.

fn black(&self) -> FgColorDisplay<'_, Black, Self>

Change the foreground color to black

fn on_black(&self) -> BgColorDisplay<'_, Black, Self>

Change the background color to black

fn red(&self) -> FgColorDisplay<'_, Red, Self>

Change the foreground color to red

fn on_red(&self) -> BgColorDisplay<'_, Red, Self>

Change the background color to red

fn green(&self) -> FgColorDisplay<'_, Green, Self>

Change the foreground color to green

fn on_green(&self) -> BgColorDisplay<'_, Green, Self>

Change the background color to green

fn yellow(&self) -> FgColorDisplay<'_, Yellow, Self>

Change the foreground color to yellow

fn on_yellow(&self) -> BgColorDisplay<'_, Yellow, Self>

Change the background color to yellow

fn blue(&self) -> FgColorDisplay<'_, Blue, Self>

Change the foreground color to blue

fn on_blue(&self) -> BgColorDisplay<'_, Blue, Self>

Change the background color to blue

fn magenta(&self) -> FgColorDisplay<'_, Magenta, Self>

Change the foreground color to magenta

fn on_magenta(&self) -> BgColorDisplay<'_, Magenta, Self>

Change the background color to magenta

fn purple(&self) -> FgColorDisplay<'_, Magenta, Self>

Change the foreground color to purple

fn on_purple(&self) -> BgColorDisplay<'_, Magenta, Self>

Change the background color to purple

fn cyan(&self) -> FgColorDisplay<'_, Cyan, Self>

Change the foreground color to cyan

fn on_cyan(&self) -> BgColorDisplay<'_, Cyan, Self>

Change the background color to cyan

fn white(&self) -> FgColorDisplay<'_, White, Self>

Change the foreground color to white

fn on_white(&self) -> BgColorDisplay<'_, White, Self>

Change the background color to white

fn default_color(&self) -> FgColorDisplay<'_, Default, Self>

Change the foreground color to the terminal default

fn on_default_color(&self) -> BgColorDisplay<'_, Default, Self>

Change the background color to the terminal default

fn bright_black(&self) -> FgColorDisplay<'_, BrightBlack, Self>

Change the foreground color to bright black

fn on_bright_black(&self) -> BgColorDisplay<'_, BrightBlack, Self>

Change the background color to bright black

fn bright_red(&self) -> FgColorDisplay<'_, BrightRed, Self>

Change the foreground color to bright red

fn on_bright_red(&self) -> BgColorDisplay<'_, BrightRed, Self>

Change the background color to bright red

fn bright_green(&self) -> FgColorDisplay<'_, BrightGreen, Self>

Change the foreground color to bright green

fn on_bright_green(&self) -> BgColorDisplay<'_, BrightGreen, Self>

Change the background color to bright green

fn bright_yellow(&self) -> FgColorDisplay<'_, BrightYellow, Self>

Change the foreground color to bright yellow

fn on_bright_yellow(&self) -> BgColorDisplay<'_, BrightYellow, Self>

Change the background color to bright yellow

fn bright_blue(&self) -> FgColorDisplay<'_, BrightBlue, Self>

Change the foreground color to bright blue

fn on_bright_blue(&self) -> BgColorDisplay<'_, BrightBlue, Self>

Change the background color to bright blue

fn bright_magenta(&self) -> FgColorDisplay<'_, BrightMagenta, Self>

Change the foreground color to bright magenta

fn on_bright_magenta(&self) -> BgColorDisplay<'_, BrightMagenta, Self>

Change the background color to bright magenta

fn bright_purple(&self) -> FgColorDisplay<'_, BrightMagenta, Self>

Change the foreground color to bright purple

fn on_bright_purple(&self) -> BgColorDisplay<'_, BrightMagenta, Self>

Change the background color to bright purple

fn bright_cyan(&self) -> FgColorDisplay<'_, BrightCyan, Self>

Change the foreground color to bright cyan

fn on_bright_cyan(&self) -> BgColorDisplay<'_, BrightCyan, Self>

Change the background color to bright cyan

fn bright_white(&self) -> FgColorDisplay<'_, BrightWhite, Self>

Change the foreground color to bright white

fn on_bright_white(&self) -> BgColorDisplay<'_, BrightWhite, Self>

Change the background color to bright white

fn bold(&self) -> BoldDisplay<'_, Self>

Make the text bold

fn dimmed(&self) -> DimDisplay<'_, Self>

Make the text dim

fn italic(&self) -> ItalicDisplay<'_, Self>

Make the text italicized

fn underline(&self) -> UnderlineDisplay<'_, Self>

Make the text underlined

fn blink(&self) -> BlinkDisplay<'_, Self>

Make the text blink

fn blink_fast(&self) -> BlinkFastDisplay<'_, Self>

Make the text blink (but fast!)

fn reversed(&self) -> ReversedDisplay<'_, Self>

Swap the foreground and background colors

fn hidden(&self) -> HiddenDisplay<'_, Self>

Hide the text

fn strikethrough(&self) -> StrikeThroughDisplay<'_, Self>

Cross out the text

fn color<Color>(&self, color: Color) -> FgDynColorDisplay<'_, Color, Self>
where Color: DynColor,

Set the foreground color at runtime. Only use if you do not know which color will be used at compile-time. If the color is constant, use either OwoColorize::fg or a color-specific method, such as OwoColorize::green.

fn on_color<Color>(&self, color: Color) -> BgDynColorDisplay<'_, Color, Self>
where Color: DynColor,

Set the background color at runtime. Only use if you do not know what color to use at compile-time. If the color is constant, use either OwoColorize::bg or a color-specific method, such as OwoColorize::on_yellow.

fn fg_rgb<const R: u8, const G: u8, const B: u8>(&self) -> FgColorDisplay<'_, CustomColor<R, G, B>, Self>

Set the foreground color to a specific RGB value.

fn bg_rgb<const R: u8, const G: u8, const B: u8>(&self) -> BgColorDisplay<'_, CustomColor<R, G, B>, Self>

Set the background color to a specific RGB value.

fn truecolor(&self, r: u8, g: u8, b: u8) -> FgDynColorDisplay<'_, Rgb, Self>

Sets the foreground color to an RGB value.

fn on_truecolor(&self, r: u8, g: u8, b: u8) -> BgDynColorDisplay<'_, Rgb, Self>

Sets the background color to an RGB value.

fn style(&self, style: Style) -> Styled<&Self>

Apply a runtime-determined style

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,