StringTape

A tape class for string arrays, compatible with Apache Arrow.

Memory-efficient collection classes for variable-length strings, co-located on contiguous "tapes".

  • Convertible to Apache Arrow String/LargeString & Binary/LargeBinary arrays
  • Compatible with UTF-8 & binary strings in Rust via CharsTape and BytesTape
  • Usable in no_std and with custom allocators for GPU & embedded use cases
  • Sliceable into zero-copy borrow-checked views with [i..n] range syntax

Why not use Vec<String>? The first reason is RAM! On top of the 3 pointers stored in every String instance, your underlying memory allocator may keep another 2-4 pointers of metadata per heap entry. Assuming most English words are under 8 bytes long, carrying an extra ~6 pointers per word on a 64-bit machine amounts to roughly 7x memory amplification:

use std::borrow::Cow;
use std::fs;
use stringtape::{BytesCowsAuto, CharsTapeAuto};

let doc = fs::read_to_string("enwik9.txt")?;    // 1.0 GB
let words = || doc.split_whitespace();          // ~ 160 M words
let buffers = || words().map(str::as_bytes);

let _ = Vec::<String>::from_iter(words().map(String::from)); // + 7.1 GB copied ❌
let _ = CharsTapeAuto::from_iter(words());                   // + 1.3 GB copied ✅
let _ = Vec::<&[u8]>::from_iter(buffers());                  // + 1.9 GB copy-less ⚠️
let _ = BytesCowsAuto::from_iter_and_data(                   // + 0.7 GB copy-less ✅
    buffers(),
    Cow::Borrowed(doc.as_bytes()),
);

"Tape" classes copy data into contiguous buffers for cache-friendly iteration. "Cows" classes reference existing data without copies.

The second reason is NUMA & GPUs! When designing high-performance systems, memory affinity is crucial. On 2-socket servers, developers should avoid accessing memory allocated on the other socket. On GPU-equipped servers, data should be transferred to GPU memory in large chunks to hide PCIe and software latencies. With individual String instances scattered around the heap, this is impossible. With tapes and cows it is easy; just don't forget to construct them with a custom allocator.
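
For example, a minimal sketch reusing the allocator hook shown in the Memory Management section below; Global is only a stand-in for a NUMA-pinned or GPU-pinned allocator, which this crate does not provide:

use allocator_api2::alloc::Global;
use stringtape::CharsTape;

// Swap Global for any allocator implementing allocator_api2's Allocator trait,
// e.g. one bound to a specific NUMA node or to CUDA-pinned host memory.
let tape = CharsTape::new_in(Global);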

Quick Start

use std::borrow::Cow;
use stringtape::{CharsCowsAuto, CharsTapeAuto, CharsTapeI32, StringTapeError};

// Create a new CharsTape with 32-bit offsets
let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;

assert_eq!(tape.len(), 2);
assert_eq!(&tape[0], "hello");
assert_eq!(tape.get(1), Some("world"));

// Iterate over strings
for s in &tape {
    println!("{}", s);
}

// Build from iterator with auto type selection
let tape2 = CharsTapeAuto::from_iter(["a", "b", "c"].iter().copied());
assert_eq!(tape2.len(), 3);

// Zero-copy slices referencing existing data
let data = "hello world";
let cows = CharsCowsAuto::from_iter_and_data(
    data.split_whitespace(),
    Cow::Borrowed(data.as_bytes()),
)?;

# Ok::<(), StringTapeError>(())

Memory Layout

CharsTape and BytesTape use the same memory layout as Apache Arrow string and binary arrays:

Data buffer:    [h,e,l,l,o,w,o,r,l,d]
Offset buffer:  [0, 5, 10]
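
As a concrete check, the two-string tape from the Quick Start reproduces exactly these buffers; this sketch assumes arrow_slices() (introduced in the Arrow interop section) returns the raw data bytes and the offsets:

use stringtape::CharsTapeI32;

let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;

// The data buffer is the concatenated bytes; offsets delimit each string.
let (data, offsets) = tape.arrow_slices();
assert_eq!(data, b"helloworld");
assert_eq!(offsets, &[0, 5, 10]);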

CharsCows and BytesCows use packed (offset, length) entries to reference substrings in shared data:

Shared data:    [h,e,l,l,o, ,w,o,r,l,d, ,f,o,o]
Entries:        [(0, 5), (6, 5), (12, 3)]  // "hello", "world", "foo"

The #[repr(C, packed(1))] layout eliminates padding between the offset and length fields. As with SoA vs. AoS layouts, this is a trade-off: in extremely high-throughput applications, such as hashing billions of very short strings, runtime may be dominated by unaligned memory accesses and the extra level of indirection. Consider switching to CharsTape or BytesTape in such cases.
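
For illustration, a hypothetical packed entry with a 32-bit offset and a 16-bit length occupies exactly 6 bytes; the field names and widths below are illustrative, since the crate picks them automatically (see Auto Type Selection):

// Hypothetical sketch of a packed cow entry: #[repr(C, packed(1))] removes the
// padding that would otherwise round the struct up to 8 bytes.
#[repr(C, packed(1))]
#[derive(Clone, Copy)]
struct CowEntry {
    offset: u32, // start of the substring within the shared data buffer
    length: u16, // substring length in bytes
}

assert_eq!(core::mem::size_of::<CowEntry>(), 6); // 4 + 2, no padding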

API Overview

Basic Operations

use stringtape::CharsTapeI32;

let mut tape = CharsTapeI32::new();
tape.push("hello")?;                    // Append one string
tape.extend(["world", "foo"])?;         // Append an array
assert_eq!(&tape[0], "hello");          // Direct indexing
assert_eq!(tape.get(1), Some("world")); // Safe access
assert!(tape.contains("hello"));        // Membership test

for s in &tape {                        // Forward iteration
    println!("{}", s);
}
for s in tape.iter().rev() {            // Reverse iteration
    println!("{}", s);
}

// Construct from iterator
let tape2: CharsTapeI32 = ["a", "b", "c"].into_iter().collect();

// Sort & compare in-place
tape.sort();
tape.sort_by(|a, b| a.len().cmp(&b.len()));
assert!(tape > tape2);                  // Lexicographic comparison

// Utility methods
tape.first();                           // First element
tape.last();                            // Last element
tape.pop();                             // Remove last
tape.shrink_to_fit()?;                  // Reduce memory usage

BytesTape and CharsCowsAuto/BytesCowsAuto provide the same interface.
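
For instance, a rough BytesTape sketch with arbitrary binary payloads, assuming push accepts &[u8] per the shared interface:

use stringtape::BytesTapeI32;

let mut bytes = BytesTapeI32::new();
bytes.push(b"\x00\x01\x02")?;            // non-UTF-8 payloads are fine
bytes.push("hello".as_bytes())?;
assert_eq!(&bytes[0], b"\x00\x01\x02");  // indexing yields a byte slice
assert_eq!(bytes.len(), 2);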

Views and Slicing

use stringtape::CharsTapeI32;

let mut tape = CharsTapeI32::new();
tape.extend(["hello", "world", "foo"])?;

let view = tape.view();              // View of the entire tape
let subview = tape.subview(1, 3)?;   // Items [1, 3)
let nested = subview.subview(0, 1)?; // Nested subviews
let raw_bytes = &tape.view()[1..3];  // Raw byte slice

// Views have the same API as tapes
assert_eq!(subview.len(), 2);
assert_eq!(&subview[0], "world");

Memory Management

// Pre-allocate capacity
let mut tape = CharsTapeI32::with_capacity(1024, 100)?; // 1 KB of data, 100 strings

// Monitor usage
println!("Items: {}, Data: {} bytes", tape.len(), tape.data_len());

// Modify
tape.clear();           // Remove all items
tape.truncate(5);       // Keep first 5 items

// Custom allocators
use allocator_api2::alloc::Global;
let tape = CharsTape::new_in(Global);

Apache Arrow Interop

True zero-copy conversion to/from Arrow arrays:

use arrow::array::StringArray;
use arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};
use stringtape::CharsTapeViewI32;

// CharsTape → Arrow (zero-copy); `tape` is any CharsTapeI32 built as above
let (data_slice, offsets_slice) = tape.arrow_slices();
let data_buffer = Buffer::from_slice_ref(data_slice);
let offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(
    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len()
));
let arrow_array = StringArray::new(offsets_buffer, data_buffer, None);

// Arrow → CharsTapeView (zero-copy)
let view = unsafe {
    CharsTapeViewI32::from_raw_parts(
        arrow_array.values(),
        arrow_array.offsets().as_ref(),
    )
};

BytesTape works the same way with Arrow BinaryArray/LargeBinaryArray types.
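
A rough sketch of the binary path, assuming BytesTape exposes the same arrow_slices() accessor; BinaryArray::new mirrors StringArray::new in the arrow crate:

use arrow::array::BinaryArray;
use arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};

// BytesTape → Arrow BinaryArray (zero-copy); `bytes_tape` is any BytesTapeI32
let (data_slice, offsets_slice) = bytes_tape.arrow_slices();
let data_buffer = Buffer::from_slice_ref(data_slice);
let offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(
    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len(),
));
let binary_array = BinaryArray::new(offsets_buffer, data_buffer, None);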

Auto Type Selection

Auto variants automatically select the most memory-efficient types based on data size:

// CharsTapeAuto: selects I32/U32/U64 offset based on total data size
let tape = CharsTapeAuto::from_iter(strings);

// CharsCowsAuto: selects offset (U32/U64) and length (U8/U16/U32) types
let cows = CharsCowsAuto::from_iter_and_data(strings, data)?;

Available: CharsTapeAuto, BytesTapeAuto, CharsCowsAuto, BytesCowsAuto.

Unsigned Offsets

Unsigned offsets (u32/u64) are available via CharsTapeU32, CharsTapeU64, BytesTapeU16, BytesTapeU32, BytesTapeU64 and corresponding views. These cannot be converted to/from Arrow arrays.
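
A small sketch, assuming the unsigned aliases share the interface shown above:

use stringtape::CharsTapeU64;

// Same API as CharsTapeI32, but offsets are u64 and there is no
// zero-copy Arrow conversion for unsigned offset types.
let mut tape = CharsTapeU64::new();
tape.push("hello")?;
assert_eq!(&tape[0], "hello");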

Standard Traits

All tape and view types implement standard Rust traits: Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Default, FromIterator, Extend, IntoIterator, Iterator, ExactSizeIterator, DoubleEndedIterator, Send, Sync.

use std::collections::{HashMap, HashSet};
use stringtape::CharsTapeI32;

let mut tape1 = CharsTapeI32::new();
tape1.push("hello")?;

let mut tape2 = tape1.clone();          // Clone
assert_eq!(tape1, tape2);               // PartialEq, Eq
assert!(tape1 <= tape2);                // PartialOrd, Ord

// Use in HashMap/HashSet
let mut map = HashMap::new();
map.insert(tape1, 42);                  // Hash

// Reverse iteration
for s in tape2.iter().rev() {           // DoubleEndedIterator
    println!("{}", s);
}

no_std Support

StringTape can be used in no_std environments:

[dependencies]
stringtape = { version = "2", default-features = false }

In no_std mode:

  • All functionality is preserved
  • Requires alloc for dynamic allocation
  • Error types implement Display but not std::error::Error
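
A minimal sketch of a no_std crate using StringTape, assuming only the alloc crate is available:

#![no_std]
extern crate alloc;

use stringtape::CharsTapeI32;

fn collect_words() -> CharsTapeI32 {
    let mut tape = CharsTapeI32::new();
    // Errors implement Display (not std::error::Error), so handle or drop them explicitly.
    let _ = tape.push("hello");
    tape
}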

Testing

Run tests for both std and no_std configurations:

cargo test                          # Test with std (default)
cargo test --doc                    # Test documentation examples
cargo test --no-default-features    # Test without std
cargo test --all-features           # Test with all features enabled
cargo clippy --lib -- -D warnings   # Lint the library code
cargo fmt --all -- --check          # Check code formatting

To reproduce the memory-usage numbers above, pull the enwik9 dataset from the "Large Text Compression Benchmark" and run:

/usr/bin/time -f "Vec<String>: %M KB | %E" cargo run --release --quiet --bin bench_vec_string -- enwik9.txt
/usr/bin/time -f "Vec<&[u8]>: %M KB | %E" cargo run --release --quiet --bin bench_vec_slice -- enwik9.txt
/usr/bin/time -f "CharsCowsAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_slices -- enwik9.txt
/usr/bin/time -f "CharsTapeAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_tape -- enwik9.txt