lite-strtab 0.2.0

Crate for storing a lot of strings in a single buffer to save memory.

lite-strtab


lite-strtab is a crate for storing many immutable strings in one buffer with minimal resource usage.

It is a simple, in-memory, build-once data structure:

  • Push strings into a builder
  • Finalize into an immutable table
  • Look strings up by StringId

As simple as that.

Design overview

  • Memory: one UTF-8 byte buffer plus one compact offset table; optional NUL-termination
  • CPU: cheap ID-based lookups (bounds check + two offset reads)
  • Binary size: no panics on insertion, avoiding backtrace overhead

Offset and ID types are configurable to match your workload. The common choice is O = u32 and I = u16.
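
To illustrate what fallible insertion with a narrow offset type can look like, here is a minimal std-only sketch. `MiniBuilder` and `PushError` are hypothetical names invented for this example, not the crate's implementation, and offsets are fixed to u16 so the overflow case is easy to hit.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch (not the crate's own error type).
#[derive(Debug, PartialEq)]
pub enum PushError {
    /// The byte buffer would grow past what a u16 offset can address.
    OffsetOverflow,
}

/// Hypothetical builder: one shared byte buffer plus u16 start offsets.
#[derive(Default)]
pub struct MiniBuilder {
    bytes: Vec<u8>,
    starts: Vec<u16>,
}

impl MiniBuilder {
    /// Appends `s` and returns its index, reporting offset overflow as an
    /// error instead of panicking. Note: `Vec` growth can still abort on
    /// allocation failure; this sketch only covers the overflow case.
    pub fn try_push(&mut self, s: &str) -> Result<usize, PushError> {
        let start = u16::try_from(self.bytes.len()).map_err(|_| PushError::OffsetOverflow)?;
        let end = self
            .bytes
            .len()
            .checked_add(s.len())
            .ok_or(PushError::OffsetOverflow)?;
        // The end offset must also fit, because it becomes the next start
        // (or the final sentinel).
        u16::try_from(end).map_err(|_| PushError::OffsetOverflow)?;
        self.bytes.extend_from_slice(s.as_bytes());
        self.starts.push(start);
        Ok(self.starts.len() - 1)
    }
}
```

A rejected push leaves the builder untouched, so the caller can recover, for example by starting a second table.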

Why this exists

Note: Numbers are for 64-bit machines.

For a companion blog post with additional design insights and real-world context, see "Sometimes I Need to Store a Lot of Strings Efficiently, So I Built lite-strtab".

Types like Box<[String]> and Box<[Box<str>]> keep one handle per element:

  • Box<[String]>: 24 bytes per element (ptr + len + capacity)
  • Box<[Box<str>]>: 16 bytes per element (ptr + len)

This is in addition to allocator overhead per string allocation (metadata + alignment).
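
The handle sizes quoted above can be checked with `size_of`. This is a small sketch; `handle_sizes` is a hypothetical helper name, and the 24/16 figures assume a 64-bit target.

```rust
use std::mem::size_of;

/// Returns (size of a `String` handle, size of a `Box<str>` handle) in bytes.
/// On 64-bit targets this is expected to be (24, 16): three words for
/// ptr + len + capacity, two words for ptr + len.
pub fn handle_sizes() -> (usize, usize) {
    (size_of::<String>(), size_of::<Box<str>>())
}
```

These sizes cover only the handle stored in the slice; each string's heap allocation comes on top.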

In contrast, lite-strtab aims to remove these overheads by storing all strings in a single buffer, with an offset table to define string boundaries.

  • One raw allocation containing all UTF-8 bytes
  • One offset table (len + 1 entries, with a final sentinel)

This removes per-string allocation overhead. Instead of a 16- or 24-byte handle per string (plus allocation overhead), the table stores 4 bytes per string (with u32 offsets) plus one final sentinel offset.
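
The layout described above can be sketched with the standard library alone. `MiniTable` is a hypothetical stand-in written for this example, not the crate's actual types. It assumes the total payload fits in u32.

```rust
/// Hypothetical stand-in for the single-buffer layout (not the crate's types).
pub struct MiniTable {
    /// All string bytes, concatenated.
    bytes: Box<[u8]>,
    /// `len + 1` offsets; the last one is the sentinel (total byte length).
    offsets: Box<[u32]>,
}

impl MiniTable {
    /// Builds the table from a slice of strings.
    /// Sketch assumption: the total payload is below 4 GiB, so `as u32` is lossless.
    pub fn from_strs(strs: &[&str]) -> Self {
        let mut bytes = Vec::new();
        let mut offsets = Vec::with_capacity(strs.len() + 1);
        for s in strs {
            offsets.push(bytes.len() as u32);
            bytes.extend_from_slice(s.as_bytes());
        }
        offsets.push(bytes.len() as u32); // sentinel
        MiniTable {
            bytes: bytes.into_boxed_slice(),
            offsets: offsets.into_boxed_slice(),
        }
    }

    /// Bounds check, then two offset reads delimit the string.
    pub fn get(&self, id: u16) -> Option<&str> {
        let i = usize::from(id);
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i + 1)? as usize;
        std::str::from_utf8(self.bytes.get(start..end)?).ok()
    }
}
```

The sentinel is what lets the last string be delimited the same way as every other one, without a special case.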

Installation

[dependencies]
lite-strtab = "0.2.0"

Feature flags

| Feature | Description |
|---|---|
| std | Enabled by default. The crate still uses #![no_std] + alloc internally. |
| nightly | Uses Rust's unstable allocator API instead of allocator-api2 and requires a nightly compiler (allocator_api). |

Basic usage

use lite_strtab::StringTableBuilder;

let mut builder = StringTableBuilder::new();
let hello = builder.try_push("hello").unwrap();
let world = builder.try_push("world").unwrap();

let table = builder.build();
assert_eq!(table.get(hello), Some("hello"));
assert_eq!(table.get(world), Some("world"));

Choosing O and I

StringTableBuilder<O, I> has two size/capacity knobs:

  • O (Offset) stores byte offsets into the shared UTF-8 buffer.
    • It limits total stored bytes.
    • It costs size_of::<O>() per string inside the StringTable.
  • I (StringIndex, used by StringId) stores string IDs.
    • It limits string count.
    • It costs size_of::<I>() per stored ID field (table index) in your own structs.

Most users should start with O = u32, I = u16:

  • Capacity: about 4 GiB of UTF-8 data and 64Ki entries per table
  • Cost: 2 bytes per StringId (index into table) in your own structs
    • Comparison (64-bit): Box<str> handle is 16 bytes, String is 24 bytes

Capacity quick-reference:

| Setting | Bytes | Max value | Practical meaning in this crate |
|---|---|---|---|
| I = u8 | 1 | 255 | Up to 256 strings per table |
| I = u16 | 2 | 65,535 | Up to 65,536 strings per table |
| I = u32 | 4 | 4,294,967,295 | Up to 4,294,967,296 strings per table |
| O = u16 | 2 | 65,535 | Up to 65,535 UTF-8 bytes total |
| O = u32 | 4 | 4,294,967,295 | Up to about 4 GiB UTF-8 bytes total |
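
The per-string costs and limits above reduce to simple arithmetic. The helpers below are a sketch with hypothetical names (they are not part of the crate), assuming one offset per string plus a sentinel and one ID field per string.

```rust
use std::mem::size_of;

/// Bytes used by the offset table for `n` strings: one offset each plus a sentinel.
pub fn offset_table_bytes<O>(n: usize) -> usize {
    (n + 1) * size_of::<O>()
}

/// Bytes used by `n` ID fields stored in your own structs.
pub fn id_field_bytes<I>(n: usize) -> usize {
    n * size_of::<I>()
}

/// Number of distinct IDs an index type of `bits` bits can represent
/// (valid for `bits` below 64 in this sketch).
pub fn max_ids(bits: u32) -> u64 {
    1u64 << bits
}
```

For 4,650 strings with O = u32 and I = u16 this gives 18,604 bytes of offsets and 9,300 bytes of IDs. The benchmark section reports 18,616 bytes for the offsets buffer because it measures the allocator's usable block size rather than the requested size.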

Custom allocator

#![cfg_attr(feature = "nightly", feature(allocator_api))]
use lite_strtab::{Global, StringTableBuilder};

let mut builder = StringTableBuilder::<u32>::new_in(Global);
let id = builder.try_push("example").unwrap();
let table = builder.build();

assert_eq!(table.get(id), Some("example"));

Custom O and I types

#![cfg_attr(feature = "nightly", feature(allocator_api))]
use lite_strtab::{Global, StringTableBuilder};

let mut builder = StringTableBuilder::<u16, u8>::new_in(Global);
let id = builder.try_push("tiny-id").unwrap();
let table = builder.build();

assert_eq!(id.into_raw(), 0u8);
assert_eq!(table.get(id), Some("tiny-id"));

If you only want to change O, use StringTableBuilder::<u16>::new_in(Global) and I keeps its default (u16).

Null-padded mode

Set NULL_PADDED = true to store strings with a trailing NUL byte:

use lite_strtab::StringTableBuilder;

let mut builder = StringTableBuilder::new_null_padded();
let id = builder.try_push("hello").unwrap();
let table = builder.build();

assert_eq!(table.get(id), Some("hello"));   // NUL trimmed
assert_eq!(table.as_bytes(), b"hello\0");   // raw bytes include NUL
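
To show what the trailing NUL buys, here is a std-only sketch of the same idea. One plausible use is handing the bytes to C APIs; that motivation is an assumption here, since the text above does not state it. The function names are hypothetical and do not mirror the crate's API.

```rust
use std::ffi::CStr;

/// Appends `s` plus a trailing NUL to `buf`; returns the (start, end) byte
/// range, where `end` includes the NUL.
pub fn push_nul_terminated(buf: &mut Vec<u8>, s: &str) -> (usize, usize) {
    let start = buf.len();
    buf.extend_from_slice(s.as_bytes());
    buf.push(0);
    (start, buf.len())
}

/// Rust-side view: drop the trailing NUL to recover the original `&str`.
pub fn as_str(buf: &[u8], range: (usize, usize)) -> Option<&str> {
    let with_nul = buf.get(range.0..range.1)?;
    let (_nul, text) = with_nul.split_last()?;
    std::str::from_utf8(text).ok()
}

/// C-side view: the same bytes, NUL included, as a `&CStr`.
/// Returns `None` if the string contains an interior NUL.
pub fn as_c_str(buf: &[u8], range: (usize, usize)) -> Option<&CStr> {
    CStr::from_bytes_with_nul(buf.get(range.0..range.1)?).ok()
}
```

The cost is one extra byte per string, which matches the small size increase of the null-padded rows in the benchmark tables.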

Scope

This crate focuses on in-memory string storage only.

It does not do:

  • serialization/deserialization
  • compression/decompression
  • sorting/deduplication policies

If you need those, build them in a wrapper around this crate.
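
As one example of such a wrapper, deduplication can sit in front of whatever inserts strings. This is a sketch under assumptions: `Dedup` is a hypothetical type, and `push` is a caller-supplied closure standing in for the real insertion call (for instance one wrapping a builder's try_push).

```rust
use std::collections::HashMap;

/// Hypothetical deduplicating front-end. `push` stands in for whatever
/// inserts a string into the underlying table and returns its ID.
pub struct Dedup<F: FnMut(&str) -> u16> {
    seen: HashMap<String, u16>,
    push: F,
}

impl<F: FnMut(&str) -> u16> Dedup<F> {
    pub fn new(push: F) -> Self {
        Dedup { seen: HashMap::new(), push }
    }

    /// Returns the existing ID for `s`, inserting it only the first time.
    pub fn intern(&mut self, s: &str) -> u16 {
        if let Some(&id) = self.seen.get(s) {
            return id;
        }
        let id = (self.push)(s);
        self.seen.insert(s.to_owned(), id);
        id
    }
}
```

The map holds its own copy of each distinct string, so it is best treated as build-time state and dropped once the table is finalized.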

Benchmarks

Memory usage was measured on Linux with glibc malloc using malloc_usable_size to capture actual allocator block sizes including alignment and metadata overhead.

The figures can be reproduced with cargo run -p lite-strtab --features memory-report --bin memory_report.

How to read these tables:

  • Total = Heap allocations + Distributed fields + One-time metadata
  • Distributed fields = string references distributed across fields/structs (e.g. String, Box<str>, StringId<u16>)
  • In these results, lite-strtab uses StringId<u16>
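
The Total formula and the "vs lite-strtab" column can be worked through in a few lines. `Footprint` is a hypothetical helper type for this sketch, and it assumes the ratio column is each representation's total divided by the lite-strtab total.

```rust
/// Memory breakdown for one representation, in bytes.
pub struct Footprint {
    pub heap: u64,
    pub distributed: u64,
    pub metadata: u64,
}

impl Footprint {
    /// Total = Heap allocations + Distributed fields + One-time metadata.
    pub fn total(&self) -> u64 {
        self.heap + self.distributed + self.metadata
    }

    /// Size relative to `baseline` (the "vs lite-strtab" column).
    pub fn ratio_to(&self, baseline: &Footprint) -> f64 {
        self.total() as f64 / baseline.total() as f64
    }
}
```

Plugging in the YakuzaKiwami figures, lite-strtab is 256,736 + 9,300 + 32 = 266,068 bytes and Vec<String> is 272,640 + 111,600 = 384,240 bytes, a ratio of about 1.44.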

Datasets

Three representative datasets were used:

  • YakuzaKiwami: 4,650 game file paths (238,109 bytes), for example sound/ja/some_file.awb.
  • EnvKeys: 109 environment variable names from an API specification (1,795 bytes).
  • ApiUrls: 90 REST API endpoint URLs (3,970 bytes).

YakuzaKiwami (4650 entries, 238,109 bytes)

Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 266068 (259.83 KiB) | 256736 (250.72 KiB) | 9300 (9.08 KiB) | 1.00x |
| lite-strtab (null-padded) | 270708 (264.36 KiB) | 261376 (255.25 KiB) | 9300 (9.08 KiB) | 1.02x |
| Vec<String> | 384240 (375.23 KiB) | 272640 (266.25 KiB) | 111600 (108.98 KiB) | 1.44x |
| Box<[Box<str>]> | 346928 (338.80 KiB) | 272528 (266.14 KiB) | 74400 (72.66 KiB) | 1.30x |

Heap allocations (tree)

  • lite-strtab: 256736 (250.72 KiB) (96.49%)
    • StringTable<u32, u16> byte buffer: 238120 (232.54 KiB) (92.75% of heap) - concatenated UTF-8 string payload data
    • StringTable<u32, u16> offsets buffer: 18616 (18.18 KiB) (7.25% of heap) - u32 offsets into the shared byte buffer
  • lite-strtab (null-padded): 261376 (255.25 KiB) (96.55%)
    • StringTable<u32, u16, true> byte buffer: 242760 (237.07 KiB) (92.88% of heap) - concatenated UTF-8 string payload data with NUL terminators
    • StringTable<u32, u16, true> offsets buffer: 18616 (18.18 KiB) (7.12% of heap) - u32 offsets into the shared byte buffer
  • Vec<String>: 272640 (266.25 KiB) (70.96%)
    • String payload allocations: 272640 (266.25 KiB) (100.00% of heap) - one UTF-8 allocation per string
  • Box<[Box<str>]>: 272528 (266.14 KiB) (78.55%)
    • Box<str> payload allocations: 272528 (266.14 KiB) (100.00% of heap) - one UTF-8 allocation per string

Distributed fields (per-string handles)

  • lite-strtab: 9300 (9.08 KiB) (3.50%) - StringId<u16>: field per string (2 B each x 4650)
  • Vec<String>: 111600 (108.98 KiB) (29.04%) - String: field per string (24 B each x 4650)
  • Box<[Box<str>]>: 74400 (72.66 KiB) (21.45%) - Box<str>: field per string (16 B each x 4650)

One-time metadata (table object itself)

  • lite-strtab: 32 B (StringTable<u32, u16> struct itself; one per table, not per string)

EnvKeys (109 entries, 1,795 bytes)

Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 2490 (2.43 KiB) | 2240 (2.19 KiB) | 218 B | 1.00x |
| lite-strtab (null-padded) | 2602 (2.54 KiB) | 2352 (2.30 KiB) | 218 B | 1.04x |
| Vec<String> | 5504 (5.38 KiB) | 2888 (2.82 KiB) | 2616 (2.55 KiB) | 2.21x |
| Box<[Box<str>]> | 4472 (4.37 KiB) | 2728 (2.66 KiB) | 1744 (1.70 KiB) | 1.80x |

Heap allocations (tree)

  • lite-strtab: 2240 (2.19 KiB) (89.96%)
    • StringTable<u32, u16> byte buffer: 1800 (1.76 KiB) (80.36% of heap) - concatenated UTF-8 string payload data
    • StringTable<u32, u16> offsets buffer: 440 B (19.64% of heap) - u32 offsets into the shared byte buffer
  • lite-strtab (null-padded): 2352 (2.30 KiB) (90.39%)
    • StringTable<u32, u16, true> byte buffer: 1912 (1.87 KiB) (81.29% of heap) - concatenated UTF-8 string payload data with NUL terminators
    • StringTable<u32, u16, true> offsets buffer: 440 B (18.71% of heap) - u32 offsets into the shared byte buffer
  • Vec<String>: 2888 (2.82 KiB) (52.47%)
    • String payload allocations: 2888 (2.82 KiB) (100.00% of heap) - one UTF-8 allocation per string
  • Box<[Box<str>]>: 2728 (2.66 KiB) (61.00%)
    • Box<str> payload allocations: 2728 (2.66 KiB) (100.00% of heap) - one UTF-8 allocation per string

Distributed fields (per-string handles)

  • lite-strtab: 218 B (8.76%) - StringId<u16>: field per string (2 B each x 109)
  • Vec<String>: 2616 (2.55 KiB) (47.53%) - String: field per string (24 B each x 109)
  • Box<[Box<str>]>: 1744 (1.70 KiB) (39.00%) - Box<str>: field per string (16 B each x 109)

One-time metadata (table object itself)

  • lite-strtab: 32 B (StringTable<u32, u16> struct itself; one per table, not per string)

ApiUrls (90 entries, 3,970 bytes)

Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 4564 (4.46 KiB) | 4352 (4.25 KiB) | 180 B | 1.00x |
| lite-strtab (null-padded) | 4660 (4.55 KiB) | 4448 (4.34 KiB) | 180 B | 1.02x |
| Vec<String> | 6896 (6.73 KiB) | 4736 (4.62 KiB) | 2160 (2.11 KiB) | 1.51x |
| Box<[Box<str>]> | 6112 (5.97 KiB) | 4672 (4.56 KiB) | 1440 (1.41 KiB) | 1.34x |

Heap allocations (tree)

  • lite-strtab: 4352 (4.25 KiB) (95.35%)
    • StringTable<u32, u16> byte buffer: 3976 (3.88 KiB) (91.36% of heap) - concatenated UTF-8 string payload data
    • StringTable<u32, u16> offsets buffer: 376 B (8.64% of heap) - u32 offsets into the shared byte buffer
  • lite-strtab (null-padded): 4448 (4.34 KiB) (95.45%)
    • StringTable<u32, u16, true> byte buffer: 4072 (3.98 KiB) (91.55% of heap) - concatenated UTF-8 string payload data with NUL terminators
    • StringTable<u32, u16, true> offsets buffer: 376 B (8.45% of heap) - u32 offsets into the shared byte buffer
  • Vec<String>: 4736 (4.62 KiB) (68.68%)
    • String payload allocations: 4736 (4.62 KiB) (100.00% of heap) - one UTF-8 allocation per string
  • Box<[Box<str>]>: 4672 (4.56 KiB) (76.44%)
    • Box<str> payload allocations: 4672 (4.56 KiB) (100.00% of heap) - one UTF-8 allocation per string

Distributed fields (per-string handles)

  • lite-strtab: 180 B (3.94%) - StringId<u16>: field per string (2 B each x 90)
  • Vec<String>: 2160 (2.11 KiB) (31.32%) - String: field per string (24 B each x 90)
  • Box<[Box<str>]>: 1440 (1.41 KiB) (23.56%) - Box<str>: field per string (16 B each x 90)

One-time metadata (table object itself)

  • lite-strtab: 32 B (StringTable<u32, u16> struct itself; one per table, not per string)

Read performance (YakuzaKiwami)

In this benchmark we sequentially read all of the 4,650 strings (238,109 bytes).

Each iteration:

  • Gets the &str with get / get_unchecked
  • Reads the &str data to compute a value (e.g. a hash).
    • This factors in other hidden costs, such as memory alignment.

AHash payload read (get / get_unchecked)

Hashing the data with AHash, a realistic real-world workload.

| Access | Representation | Avg time (µs) | Avg throughput (GiB/s) |
|---|---|---|---|
| get | Vec<String> | 13.561 | 16.352 |
| get | Box<[Box<str>]> | 13.002 | 17.056 |
| get | lite-strtab | 13.368 | 16.589 |
| get | lite-strtab (null-padded) | 13.714 | 16.171 |
| get_unchecked | Vec<String> | 13.448 | 16.490 |
| get_unchecked | Box<[Box<str>]> | 12.812 | 17.308 |
| get_unchecked | lite-strtab | 13.207 | 16.790 |
| get_unchecked | lite-strtab (null-padded) | 13.828 | 16.037 |
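
The throughput column appears to be the payload size divided by the average time. That formula is an assumption (it is not spelled out here), but it reproduces the reported figures. `throughput_gib_s` is a hypothetical helper for this sketch.

```rust
/// Converts a payload size and an average time in microseconds to GiB/s.
pub fn throughput_gib_s(bytes: u64, avg_micros: f64) -> f64 {
    let bytes_per_sec = bytes as f64 / (avg_micros * 1e-6);
    bytes_per_sec / (1u64 << 30) as f64
}
```

With the 238,109-byte dataset and 13.561 µs this gives roughly 16.35 GiB/s, in line with the Vec<String> get row.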

Byte-by-byte read (get_u8 / get_u8_unchecked)

Summing bytes one at a time.

| Access | Representation | Avg time (µs) | Avg throughput (GiB/s) |
|---|---|---|---|
| get_u8 | Vec<String> | 18.979 | 11.684 |
| get_u8 | Box<[Box<str>]> | 18.778 | 11.809 |
| get_u8 | lite-strtab | 23.245 | 9.540 |
| get_u8_unchecked | Vec<String> | 18.928 | 11.716 |
| get_u8_unchecked | Box<[Box<str>]> | 18.861 | 11.758 |
| get_u8_unchecked | lite-strtab | 19.008 | 11.666 |

Chunked read (get_usize / get_usize_unchecked)

Reading data in usize chunks; then u8 for the remainder.

| Access | Representation | Avg time (µs) | Avg throughput (GiB/s) |
|---|---|---|---|
| get_usize | Vec<String> | 8.219 | 26.982 |
| get_usize | Box<[Box<str>]> | 8.234 | 26.932 |
| get_usize | lite-strtab | 8.038 | 27.590 |
| get_usize_unchecked | Vec<String> | 8.167 | 27.154 |
| get_usize_unchecked | Box<[Box<str>]> | 8.402 | 26.393 |
| get_usize_unchecked | lite-strtab | 8.042 | 27.575 |
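
For readers unfamiliar with the chunked pattern, the sketch below sums a byte slice in usize-sized chunks and then handles the remainder byte by byte, next to a plain byte-by-byte baseline. It is not the benchmark's actual code, and `sum_chunked` and `sum_bytes` are hypothetical names.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Sums `data` in `usize`-sized chunks, then byte by byte for the remainder.
pub fn sum_chunked(data: &[u8]) -> usize {
    let mut chunks = data.chunks_exact(size_of::<usize>());
    let mut acc = 0usize;
    for chunk in &mut chunks {
        // `chunks_exact` guarantees the length, so the conversion cannot fail.
        let word = usize::from_ne_bytes(chunk.try_into().unwrap());
        acc = acc.wrapping_add(word);
    }
    for &b in chunks.remainder() {
        acc = acc.wrapping_add(usize::from(b));
    }
    acc
}

/// Byte-by-byte baseline: adds every byte individually.
pub fn sum_bytes(data: &[u8]) -> usize {
    data.iter()
        .fold(0usize, |acc, &b| acc.wrapping_add(usize::from(b)))
}
```

The two functions compute different values (whole words versus individual bytes); what matters for the benchmark is that both touch every byte of the payload, with the chunked version doing so in fewer, wider loads.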

Iterator (iter / iter_u8 / iter_usize)

Using native iterators where available.

| Style | Representation | Avg time (µs) | Avg throughput (GiB/s) |
|---|---|---|---|
| ahash | Vec<String> | 12.387 | 17.902 |
| ahash | Box<[Box<str>]> | 12.145 | 18.259 |
| ahash | lite-strtab | 12.897 | 17.195 |
| ahash | lite-strtab (null-padded) | 14.774 | 15.010 |
| u8 | Vec<String> | 17.998 | 12.321 |
| u8 | Box<[Box<str>]> | 17.916 | 12.378 |
| u8 | lite-strtab | 17.617 | 12.588 |
| usize | Vec<String> | 7.751 | 28.610 |
| usize | Box<[Box<str>]> | 7.845 | 28.268 |
| usize | lite-strtab | 7.588 | 29.226 |

Reproduce with cargo bench --bench my_benchmark. Linux glibc. cargo 1.95.0-nightly (fe2f314ae 2026-01-30).

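In summary, read performance on real data is within margin of error across these tables, with checked get_u8 on lite-strtab (23.245 µs vs. about 19 µs for the other representations) as the one visible outlier.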

The overhead of looking up a string by ID is negligible. Any difference you see is mostly due to run-to-run variation.

I've experimented with data alignment too, but saw no notable difference in practice after aligning to usize boundaries to avoid reads across word boundaries. There may be some in random access patterns; I've only benched sequential here.

Assembly comparison

Instruction count to get &str, x86_64, release mode:

| Method | Instructions | Access pattern |
|---|---|---|
| lite-strtab::get | ~12 | bounds check → load 2 offsets → compute range → add base |
| lite-strtab::get_unchecked | ~7 | load 2 offsets → compute range → add base |
| Vec<String>::get | ~8 | bounds check → load ptr from heap → deref for (ptr, len) |
| Vec<String>::get_unchecked | ~5 | load ptr from heap → deref for (ptr, len) |
| Box<[Box<str>]>::get | ~7 | bounds check → load ptr → deref for (ptr, len) |
| Box<[Box<str>]>::get_unchecked | ~4 | load ptr → deref for (ptr, len) |

The cost of processing the data dominates, so the difference here is negligible.

Note: RUSTFLAGS="-C target-cpu=native" cargo bench is ~80% faster on a 9950X3D; relative differences are unchanged.

License

MIT
