# StringTape

Memory-efficient collection classes for variable-length strings, co-located on contiguous "tapes".
- Convertible to __[Apache Arrow](https://arrow.apache.org/)__ `String`/`LargeString` & `Binary`/`LargeBinary` arrays
- Compatible with __UTF-8 & binary__ strings in Rust via `CharsTape` and `BytesTape`
- Usable in `no_std` and with custom __allocators__ for GPU & embedded use cases
- Sliceable into __zero-copy__ borrow-checked views with `[i..n]` range syntax

__Why not use `Vec<String>`?__
The first reason is about __RAM__!
In addition to storing 3 pointers for every `String` instance, your underlying memory allocator may store 2-4 pointers per heap entry.
Assuming most English words are under 8 bytes long, those extra ~6 pointer-sized fields add roughly 48 bytes of overhead per word on a 64-bit machine, a ~7x memory usage amplification:
```rust
let doc = fs::read_to_string("enwik9.txt")?; // 1.0 GB
let words = doc.split_whitespace(); // ~ 160 M words
let buffers = words.clone().map(str::as_bytes);
let _ = Vec::<String>::from_iter(words.clone().map(String::from)); // + 7.1 GB copied ❌
let _ = CharsTapeAuto::from_iter(words); // + 1.3 GB copied ✅
let _ = Vec::<&[u8]>::from_iter(buffers.clone()); // + 1.9 GB copy-less ⚠️
let _ = BytesCowsAuto::from_iter_and_data( // + 0.7 GB copy-less ✅
    buffers,
    Cow::Borrowed(doc.as_bytes()),
);
```
> "Tape" classes copy data into contiguous buffers for cache-friendly iteration.
> "Cows" classes reference existing data without copies.

The second reason is about __NUMA & GPUs__!
When designing high-performance systems, memory affinity is crucial.
On 2-socket servers, developers should avoid accessing memory allocated on the other socket.
On GPU-equipped servers, data should be transferred to GPU memory in large chunks to hide PCIe and software latency.
With individual `String` instances scattered around the heap, this is impossible.
With tapes and cows it is easy; just don't forget to use a custom allocator (sketched after this list), such as:
- the NUMA-aware `PinnedAllocator` from [Fork Union](https://github.com/ashvardanian/fork_union/),
- the GPU-friendly `UnifiedAlloc` from [StringZilla](https://github.com/ashvardanian/StringZilla),
- or any other allocator implementing the [allocator_api2](https://crates.io/crates/allocator_api2) traits.
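Switching allocators is only a constructor change. A minimal sketch, using `allocator_api2::alloc::Global` as a stand-in for a NUMA-pinned or GPU-unified allocator (the real ones live in the crates linked above, and the generic parameters of `CharsTape` are assumed to have usable defaults):
```rust
use allocator_api2::alloc::Global;
use stringtape::CharsTape;

// `Global` stands in for any allocator implementing the `allocator_api2` traits.
let mut tape = CharsTape::new_in(Global);
tape.push("numa-local words")?;
```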
## Quick Start
```rust
use stringtape::{BytesTapeI32, CharsCowsAuto, CharsTapeAuto, CharsTapeI32, StringTapeError};
use std::borrow::Cow;
// Create a new CharsTape with 32-bit offsets
let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;
assert_eq!(tape.len(), 2);
assert_eq!(&tape[0], "hello");
assert_eq!(tape.get(1), Some("world"));
// Iterate over strings
for s in &tape {
    println!("{}", s);
}
// Build from iterator with auto type selection
let tape2 = CharsTapeAuto::from_iter(["a", "b", "c"].iter().copied());
assert_eq!(tape2.len(), 3);
// Zero-copy slices referencing existing data
let data = "hello world";
let cows = CharsCowsAuto::from_iter_and_data(
    data.split_whitespace(),
    Cow::Borrowed(data.as_bytes()),
)?;
# Ok::<(), StringTapeError>(())
```
## Memory Layout
`CharsTape` and `BytesTape` use the same memory layout as Apache Arrow string and binary arrays:
```text
Data buffer: [h,e,l,l,o,w,o,r,l,d]
Offset buffer: [0, 5, 10]
```
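For example, pushing two words reproduces the layout above. A minimal sketch, assuming `arrow_slices()` exposes the two buffers directly, as in the Arrow interop section below:
```rust
use stringtape::CharsTapeI32;

let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;

// One contiguous data buffer, plus one offset per string boundary.
let (data, offsets) = tape.arrow_slices();
assert_eq!(data, b"helloworld");
assert_eq!(offsets, &[0, 5, 10]);
```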
`CharsCows` and `BytesCows` use packed `(offset, length)` entries to reference substrings in shared data:
```text
Shared data: [h,e,l,l,o, ,w,o,r,l,d, ,f,o,o]
Entries: [(0, 5), (6, 5), (12, 3)] // "hello", "world", "foo"
```
The `#[repr(C, packed(1))]` layout eliminates padding between offset and length fields.
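Conceptually, each entry is a packed pair like the hypothetical struct below; the crate's actual field names and concrete integer types are selected automatically and may differ:
```rust
// Hypothetical illustration only: with `packed(1)`, a (u32 offset, u8 length)
// entry occupies exactly 5 bytes instead of the padded 8.
#[repr(C, packed(1))]
struct PackedEntry<Offset, Length> {
    offset: Offset, // start of the slice within the shared data buffer
    length: Length, // slice length in bytes
}

assert_eq!(core::mem::size_of::<PackedEntry<u32, u8>>(), 5);
```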
As with [SoA vs AoS](https://en.wikipedia.org/wiki/AoS_and_SoA) layouts, this packing is a trade-off: it saves memory, but in extremely high-throughput applications, like hashing billions of very short strings, the runtime may be dominated by unaligned memory accesses and the additional level of indirection.
Consider switching to `CharsTape` or `BytesTape` in such cases.
## API Overview
### Basic Operations
```rust
use stringtape::CharsTapeI32;
let mut tape = CharsTapeI32::new();
tape.push("hello")?; // Append one string
tape.extend(["world", "foo"])?; // Append an array
assert_eq!(&tape[0], "hello"); // Direct indexing
assert_eq!(tape.get(1), Some("world")); // Safe access
assert!(tape.contains("hello")); // Membership test
for s in &tape { // Forward iteration
    println!("{}", s);
}
for s in tape.iter().rev() { // Reverse iteration
    println!("{}", s);
}
// Construct from iterator
let tape2: CharsTapeI32 = ["a", "b", "c"].into_iter().collect();
// Sort & compare in-place
tape.sort();
tape.sort_by(|a, b| a.len().cmp(&b.len()));
assert!(tape > tape2); // Lexicographic comparison: ["foo", ...] > ["a", ...]
// Utility methods
tape.first(); // First element
tape.last(); // Last element
tape.pop(); // Remove last
tape.shrink_to_fit()?; // Reduce memory usage
```
`BytesTape` and `CharsCowsAuto`/`BytesCowsAuto` provide the same interface.
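For example, the byte-oriented tape mirrors the calls above; a sketch, assuming `BytesTapeI32::push` accepts a byte slice and indexing yields `&[u8]`:
```rust
use stringtape::BytesTapeI32;

let mut blobs = BytesTapeI32::new();
blobs.push(b"\xDE\xAD\xBE\xEF")?;
blobs.push(b"\x00\x01")?;
assert_eq!(blobs.len(), 2);
assert_eq!(&blobs[0], b"\xDE\xAD\xBE\xEF");
```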
### Views and Slicing
```rust
let view = tape.view(); // View entire tape
let subview = tape.subview(1, 3)?; // Items [1, 3)
let nested = subview.subview(0, 1)?; // Nested subviews
let raw_bytes = &view[1..3]; // Raw byte slice
// Views have same API as tapes
assert_eq!(subview.len(), 2);
assert_eq!(&subview[0], "world");
```
### Memory Management
```rust
// Pre-allocate capacity
let mut tape = CharsTapeI32::with_capacity(1024, 100)?; // 1 KB data, 100 strings
// Monitor usage
println!("Items: {}, Data: {} bytes", tape.len(), tape.data_len());
// Modify
tape.clear(); // Remove all items
tape.truncate(5); // Keep first 5 items
// Custom allocators
use allocator_api2::alloc::Global;
let tape = CharsTape::new_in(Global);
```
### Apache Arrow Interop
True zero-copy conversion to/from Arrow arrays:
```rust
use arrow::array::StringArray;
use arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};
use stringtape::CharsTapeViewI32;

// CharsTape → Arrow (zero-copy), with `tape` being any CharsTapeI32
let (data_slice, offsets_slice) = tape.arrow_slices();
let data_buffer = Buffer::from_slice_ref(data_slice);
let offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(
    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len()
));
let arrow_array = StringArray::new(offsets_buffer, data_buffer, None);
// Arrow → CharsTapeView (zero-copy)
let view = unsafe {
    CharsTapeViewI32::from_raw_parts(
        arrow_array.values(),
        arrow_array.offsets().as_ref(),
    )
};
```
`BytesTape` works the same way with Arrow `BinaryArray`/`LargeBinaryArray` types.
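For example, mirroring the string snippet above; a sketch, assuming `BytesTapeI32` exposes the same `arrow_slices()` accessor:
```rust
use arrow::array::BinaryArray;
use arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};
use stringtape::BytesTapeI32;

let mut tape = BytesTapeI32::new();
tape.push(b"\x01\x02\x03")?;

let (data_slice, offsets_slice) = tape.arrow_slices();
let offsets = OffsetBuffer::new(ScalarBuffer::new(
    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len(),
));
let binary = BinaryArray::new(offsets, Buffer::from_slice_ref(data_slice), None);
assert_eq!(binary.value(0), b"\x01\x02\x03");
```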
### Auto Type Selection
Auto variants automatically select the most memory-efficient types based on data size:
```rust
// CharsTapeAuto: selects I32/U32/U64 offset based on total data size
let tape = CharsTapeAuto::from_iter(strings); // `strings`: any iterator of &str
// CharsCowsAuto: selects offset (U32/U64) and length (U8/U16/U32) types
let cows = CharsCowsAuto::from_iter_and_data(strings, data)?; // slices are expected to point into `data`
```
Available: `CharsTapeAuto`, `BytesTapeAuto`, `CharsCowsAuto`, `BytesCowsAuto`.
### Unsigned Offsets
Unsigned offsets (`u32`/`u64`) are available via `CharsTapeU32`, `CharsTapeU64`, `BytesTapeU16`, `BytesTapeU32`, `BytesTapeU64` and corresponding views.
These cannot be converted to/from Arrow arrays, since Arrow's variable-length layouts require signed 32-bit or 64-bit offsets.
### Standard Traits
All tape and view types implement standard Rust traits: `Debug`, `Clone`, `PartialEq`, `Eq`, `PartialOrd`, `Ord`, `Hash`, `Default`, `FromIterator`, `Extend`, `IntoIterator`, `Iterator`, `ExactSizeIterator`, `DoubleEndedIterator`, `Send`, `Sync`.
```rust
use std::collections::{HashMap, HashSet};
use stringtape::CharsTapeI32;
let mut tape1 = CharsTapeI32::new();
tape1.push("hello")?;
let tape2 = tape1.clone(); // Clone
assert_eq!(tape1, tape2); // PartialEq, Eq
assert!(tape1 <= tape2); // PartialOrd, Ord
// Use in HashMap/HashSet
let mut map = HashMap::new();
map.insert(tape1, 42); // Hash
// Reverse iteration
for s in tape2.iter().rev() { // DoubleEndedIterator
    println!("{}", s);
}
```
## `no_std` Support
StringTape can be used in `no_std` environments:
```toml
[dependencies]
stringtape = { version = "2", default-features = false }
```
In `no_std` mode (see the sketch after this list):
- All functionality is preserved
- Requires `alloc` for dynamic allocation
- Error types implement `Display` but not `std::error::Error`
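For example, a consumer crate only needs `alloc`. A hypothetical sketch, assuming the concrete aliases default to the global allocator:
```rust
#![no_std]
extern crate alloc;

use stringtape::{CharsTapeU32, StringTapeError};

/// Collects whitespace-separated tokens into a tape without touching `std`.
fn collect_tokens(input: &str) -> Result<CharsTapeU32, StringTapeError> {
    let mut tape = CharsTapeU32::new();
    for token in input.split_whitespace() {
        tape.push(token)?;
    }
    Ok(tape)
}
```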
## Testing
Run tests for both `std` and `no_std` configurations:
```bash
cargo test # Test with std (default)
cargo test --doc # Test documentation examples
cargo test --no-default-features # Test without std
cargo test --all-features # Test with all features enabled
cargo clippy --lib -- -D warnings # Lint the library code
cargo fmt --all -- --check # Check code formatting
```
To reproduce memory usage numbers mentioned above, pull the dataset for the ["Large Text Compression Benchmark"](https://mattmahoney.net/dc/textdata.html), and run:
```bash
/usr/bin/time -f "CharsCowsAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_slices -- enwik9.txt
/usr/bin/time -f "CharsTapeAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_tape -- enwik9.txt
```