Crate cold_string


§cold-string

A one-word (8 bytes on 64-bit targets) representation of immutable UTF-8 strings that inlines strings of up to 8 bytes. Optimized for memory usage and struct packing.

§Overview

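ColdString minimizes per-string overhead for both short and long strings:

  • Strings ≤ 8 bytes: 8 bytes total, with no heap allocation
  • Longer strings: ~9–10 bytes of overhead (the 8-byte pointer plus a 1–2 byte length prefix on the heap), versus 24 bytes per value for String and several other string libraries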
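The overhead figures above come down to simple arithmetic. This sketch makes several assumptions. It targets a 64-bit system. The length prefix is treated as a varint with 7 payload bits per byte, which matches the stated 0–127 / 128–16383 ranges. Allocator rounding and bookkeeping are ignored. The helpers `varint_len`, `cold_string_footprint` and `string_footprint` are hypothetical and are not part of the crate's API.

```rust
// Assumption: 64-bit target (8-byte word, 24-byte String header).
const WORD: usize = 8;
const STRING_HEADER: usize = 24;

/// Bytes needed to store `n` as a varint with 7 payload bits per byte
/// (1 byte for 0-127, 2 bytes for 128-16383, and so on).
fn varint_len(mut n: usize) -> usize {
    let mut bytes = 1;
    while n >= 0x80 {
        n >>= 7;
        bytes += 1;
    }
    bytes
}

/// Hypothetical estimate of the bytes one ColdString of `len` bytes uses,
/// ignoring allocator rounding and bookkeeping.
fn cold_string_footprint(len: usize) -> usize {
    if len <= WORD {
        WORD // inlined in the word itself: no heap allocation
    } else {
        WORD + varint_len(len) + len // pointer + length prefix + UTF-8 bytes
    }
}

/// The same estimate for String, assuming capacity == length.
fn string_footprint(len: usize) -> usize {
    STRING_HEADER + len
}
```

For a 100-byte string this gives 8 + 1 + 100 = 109 bytes, which is 9 bytes of overhead. The same string as a String takes 24 + 100 = 124 bytes.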

This leads to substantial memory savings over both String and other short-string crates (see Memory Comparison (System RSS)):

  • 35% – 67% smaller than String in HashSet
  • 35% – 64% smaller than other short-string crates in HashSet
  • 30% – 75% smaller than String in BTreeSet
  • 13% – 63% smaller than other short-string crates in BTreeSet

§Portability

ColdString’s MSRV is Rust 1.60. It is no_std compatible and is a drop-in replacement for immutable Strings.

§Usage

Use it like a String:

```rust
use cold_string::ColdString;

let s = ColdString::new("qwerty");
assert_eq!(s.as_str(), "qwerty");
```

Packs well with other types:

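```rust
use cold_string::ColdString;
use std::mem::{align_of, size_of};

// One pointer-sized word with alignment 1 (from #[repr(packed)]).
assert_eq!(size_of::<ColdString>(), size_of::<usize>());
assert_eq!(align_of::<ColdString>(), 1);

// An adjacent byte, or Option's discriminant, costs exactly one extra byte.
assert_eq!(size_of::<(ColdString, u8)>(), size_of::<usize>() + 1);
assert_eq!(size_of::<Option<ColdString>>(), size_of::<usize>() + 1);
```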

§How It Works

ColdString is an 8-byte tagged pointer (4 bytes on 32-bit machines):

```rust
#[repr(packed)]
pub struct ColdString {
    // Pointer-sized word; its first byte also acts as the representation tag.
    encoded: *mut u8,
}
```
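The packing results in the usage section follow from this declaration alone. `repr(packed)` lowers the struct's alignment to 1. A neighboring byte or an `Option` discriminant therefore adds exactly one byte, where it would otherwise round up to a whole padded word. A raw pointer has no niche, so `Option` needs that extra byte in the first place. The standalone check below uses a hypothetical stand-in type, `PackedPtr`, and not the crate itself.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in mirroring the declaration above: a single raw
/// pointer in a `#[repr(packed)]` struct. (Not the crate's type.)
#[allow(dead_code)]
#[repr(packed)]
struct PackedPtr {
    encoded: *mut u8,
}

/// The same field without `repr(packed)`, for comparison.
#[allow(dead_code)]
struct PlainPtr {
    encoded: *mut u8,
}

/// Returns (size, align, size of (T, u8), size of Option<T>).
fn layout_of<T>() -> (usize, usize, usize, usize) {
    (
        size_of::<T>(),
        align_of::<T>(),
        size_of::<(T, u8)>(),
        size_of::<Option<T>>(),
    )
}
```

Without packing, that one extra byte rounds the tuple and the `Option` up to two words (16 bytes on 64-bit targets).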

The 8 bytes encode one of three representations, selected by the first byte:

  • 10xxxxxx: encoded holds a tagged heap pointer. Heap allocations use 4-byte alignment, so the two least-significant bits of the address are always 00. The address is rotated so that those bits land in the tag position, where they are set to 10. To decode the address, clear the tag bits (10 → 00) and rotate them back into the least-significant position. On the heap, the UTF-8 bytes are preceded by a variable-length encoding of the length. That encoding takes 1 byte for lengths 0–127, 2 bytes for 128–16383, and so on.
  • 11111xxx: xxx is the length (0–7), and that many of the remaining 7 bytes hold the UTF-8 data.
  • xxxxxxxx (any other first byte): all 8 bytes are UTF-8, meaning the string is exactly 8 bytes long.

These tags are unambiguous because neither pattern can start a valid UTF-8 string. 10xxxxxx is a continuation byte, and 11111xxx never appears in UTF-8 at all.
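Here is a sketch of dispatching on the first byte, assuming a 64-bit layout. It also assumes that a short string's bytes sit immediately after the tag byte, in bytes 1..=len. The docs give only the length bits, not the exact placement. The names `Repr`, `classify`, `encode_inline` and `decode_inline` are hypothetical.

```rust
/// Which representation the first byte selects.
#[derive(Debug, PartialEq, Eq)]
enum Repr {
    /// 10xxxxxx: the word is a tagged heap pointer.
    Heap,
    /// 11111xxx: inline string; the low 3 bits are its length.
    Inline(usize),
    /// Any other first byte: all 8 bytes are UTF-8.
    Full,
}

fn classify(first: u8) -> Repr {
    if first & 0b1100_0000 == 0b1000_0000 {
        Repr::Heap
    } else if first & 0b1111_1000 == 0b1111_1000 {
        Repr::Inline((first & 0b0000_0111) as usize)
    } else {
        Repr::Full
    }
}

/// Build the inline word for strings of at most 8 bytes (layout assumed).
fn encode_inline(s: &str) -> Option<[u8; 8]> {
    let bytes = s.as_bytes();
    let mut word = [0u8; 8];
    match bytes.len() {
        // A valid UTF-8 first byte never matches either tag pattern.
        8 => word.copy_from_slice(bytes),
        n if n < 8 => {
            word[0] = 0b1111_1000 | n as u8;
            word[1..1 + n].copy_from_slice(bytes);
        }
        _ => return None, // longer strings go to the heap
    }
    Some(word)
}

/// Decode an inline word; heap words are out of scope for this sketch.
fn decode_inline(word: &[u8; 8]) -> Option<&str> {
    match classify(word[0]) {
        Repr::Inline(len) => core::str::from_utf8(&word[1..1 + len]).ok(),
        Repr::Full => core::str::from_utf8(word).ok(),
        Repr::Heap => None,
    }
}
```

A 6-byte string such as "qwerty" gets the tag byte 0b1111_1110. An 8-byte string is stored verbatim, because its first byte is always an ASCII byte or a valid multi-byte lead byte.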

§Why “Cold”?

The heap representation stores the length on the heap, not inline in the struct. This saves memory in the struct itself but makes len() slightly more expensive, since it requires a heap read. In practice, len() is only marginally slower than with an inline length. That cost is typically negligible compared to the memory savings, the improved cache density, and the 3x faster operations on inlined strings.
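Reading a heap string's length means decoding its length prefix before any of the string bytes can be used. This sketch assumes an unsigned LEB128-style encoding: low 7 bits first, with the high bit as a continuation flag. It also assumes a heap block laid out as [length prefix][UTF-8 bytes]. The crate's exact byte format is not specified here. `write_len`, `read_len`, `heap_block` and `heap_str` are hypothetical names.

```rust
/// Append `n` as a varint: 7 bits per byte, low bits first, high bit set on
/// every byte except the last.
fn write_len(buf: &mut Vec<u8>, mut n: usize) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            buf.push(byte);
            return;
        }
        buf.push(byte | 0x80);
    }
}

/// Decode the prefix, returning (length, prefix size in bytes).
/// This is the extra work a heap-backed len() has to do.
/// Sketch only: no guard against over-long or malicious prefixes.
fn read_len(block: &[u8]) -> (usize, usize) {
    let mut n = 0usize;
    let mut shift = 0;
    for (i, &byte) in block.iter().enumerate() {
        n |= ((byte & 0x7F) as usize) << shift;
        if byte & 0x80 == 0 {
            return (n, i + 1);
        }
        shift += 7;
    }
    panic!("truncated length prefix");
}

/// Build a hypothetical heap block: [length prefix][UTF-8 bytes].
fn heap_block(s: &str) -> Vec<u8> {
    let mut buf = Vec::with_capacity(s.len() + 3);
    write_len(&mut buf, s.len());
    buf.extend_from_slice(s.as_bytes());
    buf
}

/// Recover the string by reading the prefix first.
fn heap_str(block: &[u8]) -> &str {
    let (len, prefix) = read_len(block);
    core::str::from_utf8(&block[prefix..prefix + len]).expect("valid UTF-8")
}
```

In this sketch a 32-byte string needs a 1-byte prefix and a 200-byte string needs 2 bytes. Either way, len() costs one or two byte reads behind the pointer.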

§Safety

ColdString uses unsafe to implement its packed representation and pointer tagging. Usage of unsafe is narrowly scoped to where layout control is required, and each instance is documented with a // SAFETY: <invariant> comment. To further ensure soundness, ColdString is written using Rust’s strict provenance API, handles unaligned access internally, maintains explicit heap alignment guarantees, and is validated with property testing and Miri.
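The standard library's `addr` and `map_addr` methods on raw pointers support a strict-provenance style of pointer tagging. Both methods change or read a pointer's address without discarding its provenance. This sketch requires Rust 1.84 or newer; how the crate achieves the same on its 1.60 MSRV is not covered here. It assumes the tag sits in the two most significant bits of the word, and it sidesteps the question of which byte that is in memory. `tag`, `untag`, `is_heap`, `alloc_tagged` and `read_and_free` are hypothetical helpers.

```rust
use std::alloc::{alloc, dealloc, Layout};

// Assumption: the tag is the two most significant bits of the word.
const TAG_MASK: usize = 0b11 << (usize::BITS - 2);
const HEAP_TAG: usize = 0b10 << (usize::BITS - 2);

/// Rotate the always-zero low bits (guaranteed by 4-byte alignment) to the
/// top and set them to 10. `map_addr` keeps the pointer's provenance.
fn tag(ptr: *mut u8) -> *mut u8 {
    debug_assert_eq!(ptr.addr() % 4, 0, "heap block must be 4-byte aligned");
    ptr.map_addr(|a| a.rotate_right(2) | HEAP_TAG)
}

/// Clear the tag (10 -> 00), then rotate the zero bits back to the bottom.
fn untag(tagged: *mut u8) -> *mut u8 {
    tagged.map_addr(|a| (a & !TAG_MASK).rotate_left(2))
}

fn is_heap(word: *mut u8) -> bool {
    (word.addr() & TAG_MASK) == HEAP_TAG
}

/// Copy `bytes` into a fresh 4-byte-aligned allocation and return it tagged.
fn alloc_tagged(bytes: &[u8]) -> *mut u8 {
    let layout = Layout::from_size_align(bytes.len(), 4).expect("valid layout");
    assert!(layout.size() > 0, "zero-sized allocations are not allowed");
    // SAFETY: the layout has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    // SAFETY: `ptr` is valid for `bytes.len()` writes and cannot overlap `bytes`.
    unsafe { std::ptr::copy_nonoverlapping(bytes.as_ptr(), ptr, bytes.len()) };
    tag(ptr)
}

/// Read back and free a block created by `alloc_tagged`.
///
/// # Safety
/// `tagged` must come from `alloc_tagged` with exactly `len` bytes and must
/// not be used again afterwards.
unsafe fn read_and_free(tagged: *mut u8, len: usize) -> Vec<u8> {
    let ptr = untag(tagged);
    // SAFETY: guaranteed by the caller; `ptr` carries its original provenance.
    let out = unsafe { std::slice::from_raw_parts(ptr, len) }.to_vec();
    // SAFETY: same pointer and layout as the original allocation.
    unsafe { dealloc(ptr, Layout::from_size_align(len, 4).expect("valid layout")) };
    out
}
```

The tagged pointer is never dereferenced. Only the untagged one is, and it still carries the allocation's provenance, which is the property that tools like Miri check.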

§Benchmarks

§Memory Comparison (Allocator)

Memory usage per string, measured by tracking the memory requested by the allocator:

[Chart: string_memory, memory usage per string (image not reproduced)]
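A common way to track the memory requested by the allocator is to wrap the system allocator in a counting global allocator. This is a sketch of that general technique, not the crate's actual benchmark harness. `Counting`, `LIVE_BYTES` and `bytes_per_item` are hypothetical names. The sketch assumes single-threaded measurement, since concurrent allocations would skew the counts.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and tracks the bytes currently requested.
struct Counting;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: forwarded unchanged to the system allocator.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` and `layout` come from a matching `alloc` call.
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default methods, which
    // go through `alloc`/`dealloc` above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Average bytes requested per item while the value built by `build` is alive.
fn bytes_per_item<T, F: FnOnce() -> T>(n: usize, build: F) -> f64 {
    let before = LIVE_BYTES.load(Ordering::Relaxed);
    let value = build();
    let after = LIVE_BYTES.load(Ordering::Relaxed);
    drop(value);
    after.saturating_sub(before) as f64 / n as f64
}
```

This measures what was requested, not what the allocator actually reserved, which is one reason allocator-based and RSS-based figures differ.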

§Memory Comparison (System RSS)

Resident set size (RSS), in bytes per inserted string, for three collection types. Inserted strings have random lengths in 0..=N, with N given by each column header:

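| Vec | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 8.0 | 8.0 | 23.2 | 33.7 | 53.4 |
| compact_str | 24.0 | 24.0 | 24.0 | 34.6 | 60.6 |
| compact_string | 22.9 | 24.9 | 31.6 | 39.7 | 55.7 |
| smallstr | 24.0 | 24.0 | 38.0 | 50.3 | 68.4 |
| smartstring | 24.0 | 24.0 | 24.0 | 40.4 | 65.4 |
| smol_str | 24.0 | 24.0 | 24.0 | 39.9 | 71.2 |
| std | 35.8 | 37.4 | 45.8 | 54.2 | 70.5 |

| HashSet | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 18.9 | 18.9 | 34.5 | 45.5 | 64.0 |
| compact_str | 52.4 | 52.4 | 52.4 | 62.2 | 88.9 |
| compact_string | 23.2 | 30.0 | 39.6 | 49.1 | 65.9 |
| smallstr | 52.4 | 52.4 | 66.5 | 78.6 | 96.9 |
| smartstring | 52.4 | 52.4 | 52.4 | 68.2 | 94.0 |
| smol_str | 52.4 | 52.4 | 52.4 | 68.3 | 99.4 |
| std | 56.8 | 61.9 | 72.2 | 81.7 | 98.5 |

| BTreeSet | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 10.1 | 18.9 | 49.3 | 79.1 | 117.2 |
| compact_str | 24.8 | 48.4 | 61.5 | 90.5 | 145.7 |
| compact_string | 19.7 | 43.7 | 67.0 | 88.3 | 122.4 |
| smallstr | 24.8 | 48.1 | 89.7 | 121.9 | 162.0 |
| smartstring | 24.5 | 48.6 | 61.1 | 102.3 | 155.8 |
| smol_str | 25.0 | 48.3 | 61.6 | 100.7 | 166.7 |
| std | 35.8 | 70.4 | 102.9 | 128.9 | 165.5 |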

§Speed

§Construction: Variable Length (0..=N) [ns/op]
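
| Crate | 0..=4 | 0..=8 | 0..=16 | 0..=32 | 0..=64 |
|---|---|---|---|---|---|
| cold-string | 10.0 | 9.2 | 25.3 | 30.0 | 37.2 |
| compact_str | 8.8 | 10.1 | 10.0 | 14.4 | 49.4 |
| compact_string | 34.5 | 34.8 | 37.5 | 34.9 | 38.3 |
| smallstr | 8.9 | 9.4 | 23.1 | 44.9 | 32.7 |
| smartstring | 14.8 | 15.1 | 15.0 | 26.9 | 49.5 |
| smol_str | 19.2 | 19.8 | 20.1 | 23.4 | 33.7 |
| std | 28.6 | 31.4 | 34.9 | 32.0 | 33.1 |
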
§Construction: Fixed Length (N..=N) [ns/op]
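
| Crate | 4..=4 | 8..=8 | 16..=16 | 32..=32 | 64..=64 |
|---|---|---|---|---|---|
| cold-string | 6.5 | 4.2 | 34.2 | 34.3 | 36.2 |
| compact_str | 7.5 | 7.5 | 7.6 | 31.0 | 32.4 |
| compact_string | 29.2 | 28.9 | 29.2 | 29.9 | 32.3 |
| smallstr | 4.5 | 2.6 | 28.7 | 28.5 | 29.9 |
| smartstring | 14.7 | 14.8 | 8.6 | 61.6 | 63.4 |
| smol_str | 15.2 | 12.8 | 15.7 | 41.7 | 42.0 |
| std | 28.2 | 27.6 | 28.6 | 29.3 | 30.4 |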

§License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

§Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

§Structs

ColdString
Compact representation of immutable UTF-8 strings. Optimized for memory usage and struct packing.