# lite-strtab

lite-strtab is a crate for storing many immutable strings in one buffer with minimal resource usage.

It is a simple, in-memory, build-once data structure:

- Push strings into a builder
- Finalize into an immutable table
- Look strings up by [StringId]

As simple as that.
## Design overview

- Memory: one UTF-8 byte buffer plus one compact offset table; optional NUL-termination
- CPU: cheap ID-based lookups (bounds check + two offset reads)
- Binary size: no panics on insertion, avoiding backtrace overhead

Offset and ID types are configurable to match your workload.
The common choice is `O = u32` and `I = u16`.
## Why this exists

Note: the numbers below are for 64-bit machines.

For a companion blog post with additional design insights and real-world context, see *Sometimes I Need to Store a Lot of Strings Efficiently, So I Built lite-strtab*.
Types like [Box<[String]>] and [Box<[Box<str>]>] keep one handle per element:

- [Box<[String]>]: 24 bytes (ptr + len + capacity)
- [Box<[Box<str>]>]: 16 bytes (ptr + len)

This is in addition to allocator overhead per string allocation (metadata + alignment).

In contrast, lite-strtab aims to remove these overheads by storing all strings
in a single buffer, with an offset table to define string boundaries:

- One raw allocation containing all UTF-8 bytes
- One offset table (`len + 1` entries, with a final sentinel)

This removes per-string allocation overhead.
Rather than storing 16/24 bytes per string (+ allocation overhead), we store just
4 bytes per string (for [u32] offsets) + one final sentinel offset.
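The layout described above can be illustrated with a minimal self-contained sketch (not the crate's actual code): all bytes go into one buffer, and string `i` spans `offsets[i]..offsets[i + 1]`, which is why one extra sentinel offset is needed.

```rust
// Minimal sketch of the "one byte buffer + offset table" layout.
// This is illustrative only, not lite-strtab's implementation.
struct MiniTable {
    bytes: Vec<u8>,    // all strings concatenated, back to back
    offsets: Vec<u32>, // len + 1 entries; string i is offsets[i]..offsets[i + 1]
}

impl MiniTable {
    fn build(strings: &[&str]) -> MiniTable {
        let mut bytes = Vec::new();
        let mut offsets = vec![0u32];
        for s in strings {
            bytes.extend_from_slice(s.as_bytes());
            offsets.push(bytes.len() as u32); // last push ends up as the sentinel
        }
        MiniTable { bytes, offsets }
    }

    // Lookup: bounds check + two offset reads, then slice the shared buffer.
    fn get(&self, i: usize) -> Option<&str> {
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i + 1)? as usize;
        std::str::from_utf8(&self.bytes[start..end]).ok()
    }
}

fn main() {
    let t = MiniTable::build(&["hello", "world"]);
    assert_eq!(t.get(0), Some("hello"));
    assert_eq!(t.get(1), Some("world"));
    assert_eq!(t.get(2), None);
    assert_eq!(t.offsets.len(), 3); // 2 strings -> len + 1 = 3 offsets
}
```

Note that no per-string pointer is stored anywhere: a string's start is simply the previous string's end.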
Installation
[]
= "0.1.0"
## Feature flags

| Feature | Description |
|---|---|
| `std` | Enabled by default. The crate still uses `#![no_std]` + `alloc` internally. |
| `nightly` | Uses Rust's unstable allocator API instead of `allocator-api2` and requires a nightly compiler (`allocator_api`). |
## Basic usage

```rust
use lite_strtab::StringTableBuilder;

// Argument values here are illustrative.
let mut builder = StringTableBuilder::new();
let hello = builder.try_push("hello").unwrap();
let world = builder.try_push("world").unwrap();
let table = builder.build();

assert_eq!(table.get(hello), Some("hello"));
assert_eq!(table.get(world), Some("world"));
```
## Choosing O and I

`StringTableBuilder<O, I>` has two size/capacity knobs:

- `O` ([Offset]) stores byte offsets into the shared UTF-8 buffer.
  - It limits total stored bytes.
  - It costs `size_of::<O>()` per string inside the [StringTable].
- `I` ([StringIndex], used by [StringId]) stores string IDs.
  - It limits string count.
  - It costs `size_of::<I>()` per stored ID field (table index) in your own structs.

Most users should start with `O = u32`, `I = u16`:

- Meaning: about 4 GiB of UTF-8 data and 64Ki entries per table
- Meaning: 2 bytes per `StringId` (index into table) in your own structs
- Comparison (64-bit): a `Box<str>` handle is 16 bytes, a `String` is 24 bytes
Capacity quick-reference:

| Setting | Bytes | Max value | Practical meaning in this crate |
|---|---|---|---|
| `I = u8` | 1 | 255 | Up to 256 strings per table |
| `I = u16` | 2 | 65,535 | Up to 65,536 strings per table |
| `I = u32` | 4 | 4,294,967,295 | Up to 4,294,967,296 strings per table |
| `O = u16` | 2 | 65,535 | Up to 65,535 UTF-8 bytes total |
| `O = u32` | 4 | 4,294,967,295 | Up to about 4 GiB of UTF-8 bytes total |
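As a sanity check on these limits, here are hypothetical helpers (not part of the crate) that test whether a dataset fits a given `O`/`I` choice. Note the asymmetry: `I = u16` covers 65,536 strings (indices 0..=65,535), while `O = u16` caps the total byte count at 65,535, since the final sentinel offset must equal the total length.

```rust
// Hypothetical pre-flight checks; lite-strtab itself reports such
// limits through try_push failures rather than helpers like these.
fn fits_o_u16(total_bytes: usize) -> bool {
    // Every offset, including the final sentinel (= total length),
    // must be representable in O.
    u16::try_from(total_bytes).is_ok()
}

fn fits_i_u16(count: usize) -> bool {
    // Indices 0..=u16::MAX are representable: up to 65_536 strings.
    count <= u16::MAX as usize + 1
}

fn main() {
    // The YakuzaKiwami dataset from the benchmarks below:
    // 4_650 strings, 238_109 bytes in total.
    assert!(fits_i_u16(4_650));                   // I = u16 is enough
    assert!(!fits_o_u16(238_109));                // O = u16 is too small
    assert!(u32::try_from(238_109usize).is_ok()); // O = u32 is plenty
}
```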
## Custom allocator

```rust
use lite_strtab::StringTableBuilder;
use allocator_api2::alloc::Global; // or std's Global with the `nightly` feature

let mut builder = StringTableBuilder::new_in(Global);
let id = builder.try_push("hello").unwrap();
let table = builder.build();

assert_eq!(table.get(id), Some("hello"));
```
## Custom O and I types

```rust
use lite_strtab::StringTableBuilder;
use allocator_api2::alloc::Global;

// Illustrative O/I choice: u16 offsets, u8 indices.
let mut builder = StringTableBuilder::<u16, u8>::new_in(Global);
let id = builder.try_push("hello").unwrap();
let table = builder.build();

assert_eq!(table.get(id), Some("hello"));
```

If you only want to change `O`, use `StringTableBuilder::<u16>::new_in(Global)`;
`I` keeps its default (`u16`).
## Null-padded mode

Set `NULL_PADDED = true` to store strings with a trailing NUL byte:

```rust
use lite_strtab::StringTableBuilder;

let mut builder = StringTableBuilder::new_null_padded();
let id = builder.try_push("hello").unwrap();
let table = builder.build();

assert_eq!(table.get(id), Some("hello")); // NUL trimmed
// The entry's raw bytes in the table include the trailing NUL.
```
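One place NUL-padding can pay off, for example, is C interop: a NUL-terminated byte slice can be viewed as a `CStr` without copying. A small self-contained illustration using the standard library (the literal below mimics the shape of a null-padded entry's raw bytes; it is not produced by the crate here):

```rust
use std::ffi::CStr;

fn main() {
    // Shape of a null-padded entry's raw bytes: payload + trailing NUL.
    let raw: &[u8] = b"hello\0";

    // Zero-copy view as a C string; fails if the NUL is missing or interior.
    let c = CStr::from_bytes_with_nul(raw).expect("exactly one trailing NUL");
    assert_eq!(c.to_str().unwrap(), "hello");
}
```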
## Scope
This crate focuses on in-memory string storage only.
It does not do:
- serialization/deserialization
- compression/decompression
- sorting/deduplication policies
If you need those, build them in a wrapper around this crate.
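For instance, a deduplication policy fits naturally in such a wrapper. A minimal sketch, with a plain `Vec<String>` standing in for the real builder so the example stays self-contained (the real wrapper would forward to `try_push` instead):

```rust
use std::collections::HashMap;

// Sketch of a deduplicating wrapper. `inner` stands in for a
// StringTableBuilder; `seen` maps each string to its issued id.
struct DedupBuilder {
    inner: Vec<String>,
    seen: HashMap<String, usize>,
}

impl DedupBuilder {
    fn new() -> Self {
        DedupBuilder { inner: Vec::new(), seen: HashMap::new() }
    }

    fn push(&mut self, s: &str) -> usize {
        if let Some(&id) = self.seen.get(s) {
            return id; // duplicate: reuse the id, store nothing new
        }
        let id = self.inner.len(); // real code: self.inner.try_push(s)?
        self.inner.push(s.to_owned());
        self.seen.insert(s.to_owned(), id);
        id
    }
}

fn main() {
    let mut b = DedupBuilder::new();
    let a = b.push("hello");
    let c = b.push("world");
    let d = b.push("hello"); // duplicate
    assert_eq!(a, d);
    assert_ne!(a, c);
    assert_eq!(b.inner.len(), 2); // only two distinct strings stored
}
```

Keeping the policy outside the table keeps the table itself small; the `HashMap` can be dropped once building is done.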
## Benchmarks

Memory usage was measured on Linux with glibc malloc, using `malloc_usable_size`
to capture actual allocator block sizes, including alignment and metadata overhead.
The numbers can be reproduced with `cargo run -p lite-strtab --features memory-report --bin memory_report`.

How to read these tables:

- Total = Heap allocations + Distributed fields + One-time metadata
- Distributed fields = string references distributed across fields/structs (e.g. `String`, `Box<str>`, `StringId<u16>`); in these results, lite-strtab uses `StringId<u16>`
### Datasets

Three representative datasets were used:

- YakuzaKiwami: 4,650 game file paths (238,109 bytes), for example `sound/ja/some_file.awb`.
- EnvKeys: 109 environment variable names from an API specification (1,795 bytes).
- ApiUrls: 90 REST API endpoint URLs (3,970 bytes).
### YakuzaKiwami (4,650 entries, 238,109 bytes)

#### Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 266068 (259.83 KiB) | 256736 (250.72 KiB) | 9300 (9.08 KiB) | 1.00x |
| lite-strtab (null-padded) | 270708 (264.36 KiB) | 261376 (255.25 KiB) | 9300 (9.08 KiB) | 1.02x |
| `Vec<String>` | 384240 (375.23 KiB) | 272640 (266.25 KiB) | 111600 (108.98 KiB) | 1.44x |
| `Box<[Box<str>]>` | 346928 (338.80 KiB) | 272528 (266.14 KiB) | 74400 (72.66 KiB) | 1.30x |
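The totals decompose exactly as described under "How to read these tables". As a quick arithmetic check on the lite-strtab row:

```rust
fn main() {
    // YakuzaKiwami, lite-strtab row:
    let heap = 256_736u64;      // byte buffer + offsets buffer
    let distributed = 9_300u64; // StringId<u16> fields: 2 B each x 4650 strings
    let metadata = 32u64;       // the StringTable struct itself (one per table)

    assert_eq!(2 * 4_650, distributed);
    assert_eq!(heap + distributed + metadata, 266_068); // the Total column
}
```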
#### Heap allocations (tree)

- lite-strtab: 256736 (250.72 KiB) (96.49%)
  - `StringTable<u32, u16>` byte buffer: 238120 (232.54 KiB) (92.75% of heap) - concatenated UTF-8 string payload data
  - `StringTable<u32, u16>` offsets buffer: 18616 (18.18 KiB) (7.25% of heap) - `u32` offsets into the shared byte buffer
- lite-strtab (null-padded): 261376 (255.25 KiB) (96.55%)
  - `StringTable<u32, u16, true>` byte buffer: 242760 (237.07 KiB) (92.88% of heap) - concatenated UTF-8 string payload data with NUL terminators
  - `StringTable<u32, u16, true>` offsets buffer: 18616 (18.18 KiB) (7.12% of heap) - `u32` offsets into the shared byte buffer
- `Vec<String>`: 272640 (266.25 KiB) (70.96%)
  - `String` payload allocations: 272640 (266.25 KiB) (100.00% of heap) - one UTF-8 allocation per string
- `Box<[Box<str>]>`: 272528 (266.14 KiB) (78.55%)
  - `Box<str>` payload allocations: 272528 (266.14 KiB) (100.00% of heap) - one UTF-8 allocation per string
#### Distributed fields (per-string handles)

- lite-strtab: 9300 (9.08 KiB) (3.50%) - `StringId<u16>`: one field per string (2 B each x 4650)
- `Vec<String>`: 111600 (108.98 KiB) (29.04%) - `String`: one field per string (24 B each x 4650)
- `Box<[Box<str>]>`: 74400 (72.66 KiB) (21.45%) - `Box<str>`: one field per string (16 B each x 4650)

#### One-time metadata (table object itself)

- lite-strtab: 32 B (`StringTable<u32, u16>` struct itself; one per table, not per string)
### EnvKeys (109 entries, 1,795 bytes)

#### Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 2490 (2.43 KiB) | 2240 (2.19 KiB) | 218 B | 1.00x |
| lite-strtab (null-padded) | 2602 (2.54 KiB) | 2352 (2.30 KiB) | 218 B | 1.04x |
| `Vec<String>` | 5504 (5.38 KiB) | 2888 (2.82 KiB) | 2616 (2.55 KiB) | 2.21x |
| `Box<[Box<str>]>` | 4472 (4.37 KiB) | 2728 (2.66 KiB) | 1744 (1.70 KiB) | 1.80x |
#### Heap allocations (tree)

- lite-strtab: 2240 (2.19 KiB) (89.96%)
  - `StringTable<u32, u16>` byte buffer: 1800 (1.76 KiB) (80.36% of heap) - concatenated UTF-8 string payload data
  - `StringTable<u32, u16>` offsets buffer: 440 B (19.64% of heap) - `u32` offsets into the shared byte buffer
- lite-strtab (null-padded): 2352 (2.30 KiB) (90.39%)
  - `StringTable<u32, u16, true>` byte buffer: 1912 (1.87 KiB) (81.29% of heap) - concatenated UTF-8 string payload data with NUL terminators
  - `StringTable<u32, u16, true>` offsets buffer: 440 B (18.71% of heap) - `u32` offsets into the shared byte buffer
- `Vec<String>`: 2888 (2.82 KiB) (52.47%)
  - `String` payload allocations: 2888 (2.82 KiB) (100.00% of heap) - one UTF-8 allocation per string
- `Box<[Box<str>]>`: 2728 (2.66 KiB) (61.00%)
  - `Box<str>` payload allocations: 2728 (2.66 KiB) (100.00% of heap) - one UTF-8 allocation per string

#### Distributed fields (per-string handles)

- lite-strtab: 218 B (8.76%) - `StringId<u16>`: one field per string (2 B each x 109)
- `Vec<String>`: 2616 (2.55 KiB) (47.53%) - `String`: one field per string (24 B each x 109)
- `Box<[Box<str>]>`: 1744 (1.70 KiB) (39.00%) - `Box<str>`: one field per string (16 B each x 109)

#### One-time metadata (table object itself)

- lite-strtab: 32 B (`StringTable<u32, u16>` struct itself; one per table, not per string)
### ApiUrls (90 entries, 3,970 bytes)

#### Summary

| Representation | Total | Heap allocations | Distributed fields | vs lite-strtab |
|---|---|---|---|---|
| lite-strtab | 4564 (4.46 KiB) | 4352 (4.25 KiB) | 180 B | 1.00x |
| lite-strtab (null-padded) | 4660 (4.55 KiB) | 4448 (4.34 KiB) | 180 B | 1.02x |
| `Vec<String>` | 6896 (6.73 KiB) | 4736 (4.62 KiB) | 2160 (2.11 KiB) | 1.51x |
| `Box<[Box<str>]>` | 6112 (5.97 KiB) | 4672 (4.56 KiB) | 1440 (1.41 KiB) | 1.34x |
#### Heap allocations (tree)

- lite-strtab: 4352 (4.25 KiB) (95.35%)
  - `StringTable<u32, u16>` byte buffer: 3976 (3.88 KiB) (91.36% of heap) - concatenated UTF-8 string payload data
  - `StringTable<u32, u16>` offsets buffer: 376 B (8.64% of heap) - `u32` offsets into the shared byte buffer
- lite-strtab (null-padded): 4448 (4.34 KiB) (95.45%)
  - `StringTable<u32, u16, true>` byte buffer: 4072 (3.98 KiB) (91.55% of heap) - concatenated UTF-8 string payload data with NUL terminators
  - `StringTable<u32, u16, true>` offsets buffer: 376 B (8.45% of heap) - `u32` offsets into the shared byte buffer
- `Vec<String>`: 4736 (4.62 KiB) (68.68%)
  - `String` payload allocations: 4736 (4.62 KiB) (100.00% of heap) - one UTF-8 allocation per string
- `Box<[Box<str>]>`: 4672 (4.56 KiB) (76.44%)
  - `Box<str>` payload allocations: 4672 (4.56 KiB) (100.00% of heap) - one UTF-8 allocation per string

#### Distributed fields (per-string handles)

- lite-strtab: 180 B (3.94%) - `StringId<u16>`: one field per string (2 B each x 90)
- `Vec<String>`: 2160 (2.11 KiB) (31.32%) - `String`: one field per string (24 B each x 90)
- `Box<[Box<str>]>`: 1440 (1.41 KiB) (23.56%) - `Box<str>`: one field per string (16 B each x 90)

#### One-time metadata (table object itself)

- lite-strtab: 32 B (`StringTable<u32, u16>` struct itself; one per table, not per string)
### Read performance (YakuzaKiwami)

In this benchmark we sequentially read all 4,650 strings (238,109 bytes) by:

- Getting the `&str` with `get`/`get_unchecked`
- Reading the `&str` data to compute a value (e.g. hashing)
  - This factors in other hidden costs such as memory alignment
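The shape of the benchmark loop can be sketched as follows. This is not the actual harness; the standard library's `DefaultHasher` stands in for AHash so the example has no dependencies, and a plain slice stands in for the table:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Look up every string by index and hash its bytes, accumulating the
// result so the reads cannot be optimized away.
fn hash_all(strings: &[&str]) -> u64 {
    let mut acc = 0u64;
    for i in 0..strings.len() {
        let s = strings[i]; // real benchmark: table.get(id) / get_unchecked(id)
        let mut h = DefaultHasher::new();
        s.as_bytes().hash(&mut h);
        acc = acc.wrapping_add(h.finish());
    }
    acc
}

fn main() {
    let data = ["sound/ja/some_file.awb", "hello", "world"];
    // DefaultHasher::new() uses fixed keys, so the sum is deterministic.
    assert_eq!(hash_all(&data), hash_all(&data));
}
```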
#### AHash payload read (get / get_unchecked)

Hashing the data with AHash, a realistic real-world workload.

| Access | Representation | avg time (µs) | avg thrpt (GiB/s) |
|---|---|---|---|
| get | `Vec<String>` | 13.561 | 16.352 |
| get | `Box<[Box<str>]>` | 13.002 | 17.056 |
| get | lite-strtab | 13.368 | 16.589 |
| get | lite-strtab (null-padded) | 13.714 | 16.171 |
| get_unchecked | `Vec<String>` | 13.448 | 16.490 |
| get_unchecked | `Box<[Box<str>]>` | 12.812 | 17.308 |
| get_unchecked | lite-strtab | 13.207 | 16.790 |
| get_unchecked | lite-strtab (null-padded) | 13.828 | 16.037 |
#### Byte-by-byte read (get_u8 / get_u8_unchecked)

Summing bytes one at a time.

| Access | Representation | avg time (µs) | avg thrpt (GiB/s) |
|---|---|---|---|
| get_u8 | `Vec<String>` | 18.979 | 11.684 |
| get_u8 | `Box<[Box<str>]>` | 18.778 | 11.809 |
| get_u8 | lite-strtab | 23.245 | 9.540 |
| get_u8_unchecked | `Vec<String>` | 18.928 | 11.716 |
| get_u8_unchecked | `Box<[Box<str>]>` | 18.861 | 11.758 |
| get_u8_unchecked | lite-strtab | 19.008 | 11.666 |
#### Chunked read (get_usize / get_usize_unchecked)

Reading data in `usize` chunks, then `u8` for the remainder.

| Access | Representation | avg time (µs) | avg thrpt (GiB/s) |
|---|---|---|---|
| get_usize | `Vec<String>` | 8.219 | 26.982 |
| get_usize | `Box<[Box<str>]>` | 8.234 | 26.932 |
| get_usize | lite-strtab | 8.038 | 27.590 |
| get_usize_unchecked | `Vec<String>` | 8.167 | 27.154 |
| get_usize_unchecked | `Box<[Box<str>]>` | 8.402 | 26.393 |
| get_usize_unchecked | lite-strtab | 8.042 | 27.575 |
#### Iterator (iter / iter_u8 / iter_usize)

Using native iterators where available.

| Style | Representation | avg time (µs) | avg thrpt (GiB/s) |
|---|---|---|---|
| ahash | `Vec<String>` | 12.387 | 17.902 |
| ahash | `Box<[Box<str>]>` | 12.145 | 18.259 |
| ahash | lite-strtab | 12.897 | 17.195 |
| ahash | lite-strtab (null-padded) | 14.774 | 15.010 |
| u8 | `Vec<String>` | 17.998 | 12.321 |
| u8 | `Box<[Box<str>]>` | 17.916 | 12.378 |
| u8 | lite-strtab | 17.617 | 12.588 |
| usize | `Vec<String>` | 7.751 | 28.610 |
| usize | `Box<[Box<str>]>` | 7.845 | 28.268 |
| usize | lite-strtab | 7.588 | 29.226 |
Reproduce with `cargo bench --bench my_benchmark` (Linux, glibc; cargo 1.95.0-nightly (fe2f314ae 2026-01-30)).[^1]

In summary, actual read performance on real data is within the margin of error.
The overhead of looking up a string by ID is negligible; any difference you see is mostly run-to-run variation.
I experimented with data alignment too, but saw no notable difference in practice
after aligning to `usize` boundaries to avoid reads across word boundaries.
There may be a difference for random access patterns; only sequential reads are benchmarked here.
### Assembly comparison

Instruction counts to get a `&str`, on x86_64 in release mode:

| Method | Instructions | Access pattern |
|---|---|---|
| `lite-strtab::get` | ~12 | bounds check → load 2 offsets → compute range → add base |
| `lite-strtab::get_unchecked` | ~7 | load 2 offsets → compute range → add base |
| `Vec<String>::get` | ~8 | bounds check → load ptr from heap → deref for (ptr, len) |
| `Vec<String>::get_unchecked` | ~5 | load ptr from heap → deref for (ptr, len) |
| `Box<[Box<str>]>::get` | ~7 | bounds check → load ptr → deref for (ptr, len) |
| `Box<[Box<str>]>::get_unchecked` | ~4 | load ptr → deref for (ptr, len) |

The overhead of processing the data dominates, so the difference here is negligible.
[^1]: RUSTFLAGS="-C target-cpu=native" cargo bench is ~80% faster on 9950X3D; relative differences unchanged.
## License

MIT
[Box<[Box<str>]>]: alloc::boxed::Box
[Box<[String]>]: alloc::boxed::Box
[Offset]: crate::Offset
[StringId]: crate::StringId
[StringIndex]: crate::StringIndex
[StringTable]: crate::StringTable
[u16]: prim@u16
[u32]: prim@u32