stringtape 2.4.1

# StringTape

![StringTape banner](https://github.com/ashvardanian/ashvardanian/blob/master/repositories/StringTape.png?raw=true)

Memory-efficient collection classes for variable-length strings, co-located on contiguous "tapes".

- Convertible to __[Apache Arrow](https://arrow.apache.org/)__ `String`/`LargeString` & `Binary`/`LargeBinary` arrays
- Compatible with __UTF-8 & binary__ strings in Rust via `CharsTape` and `BytesTape`
- Usable in `no_std` and with custom __allocators__ for GPU & embedded use cases
- Sliceable into __zero-copy__ borrow-checked views with `[i..n]` range syntax

__Why not use `Vec<String>`?__
The first reason is about __RAM__!
In addition to storing 3 pointers for every `String` instance, your underlying memory allocator may store 2-4 pointers per heap entry.
Assuming most English words are under 8 bytes long, storing an extra 6 pointers on a 64-bit machine results in a 7x memory usage amplification:

```rust
let doc = fs::read_to_string("enwik9.txt")?;    // 1.0 GB
let words = doc.split_whitespace();             // ~ 160 M words
let buffers = words.map(str::as_bytes);

let _ = Vec::<String>::from_iter(words);        // + 7.1 GB copied ❌
let _ = CharsTapeAuto::from_iter(words);        // + 1.3 GB copied ✅
let _ = Vec::<&[u8]>::from_iter(buffers);       // + 1.9 GB copy-less ⚠️
let _ = BytesCowsAuto::from_iter_and_data(      // + 0.7 GB copy-less ✅
    buffers,
    Cow::Borrowed(doc.as_bytes()),
);
```

> "Tape" classes copy data into contiguous buffers for cache-friendly iteration.
> "Cows" classes reference existing data without copies.

The second reason is about __NUMA & GPUs__!
When designing high-performance systems, memory affinity is crucial.
On 2-socket servers, developers should avoid accessing memory allocated on the other socket.
On GPU-equipped servers, data should be transferred to GPU memory in large chunks to hide PCIe and software latency.
With individual `String` instances scattered around the heap, this is impossible.
With tapes and cows, this is easy, just don't forget to use custom allocators, like:

- the NUMA-aware `PinnedAllocator` from [Fork Union](https://github.com/ashvardanian/fork_union/),
- the GPU-friendly `UnifiedAlloc` from [StringZilla](https://github.com/ashvardanian/StringZilla),
- or any other allocator implementing the [allocator_api2](https://crates.io/crates/allocator_api2) traits.

## Quick Start

```rust
use stringtape::{CharsTapeI32, BytesTapeI32, StringTapeError};

// Create a new CharsTape with 32-bit offsets
let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;

assert_eq!(tape.len(), 2);
assert_eq!(&tape[0], "hello");
assert_eq!(tape.get(1), Some("world"));

// Iterate over strings
for s in &tape {
    println!("{}", s);
}

// Build from iterator with auto type selection
let tape2 = CharsTapeAuto::from_iter(["a", "b", "c"].iter().copied());
assert_eq!(tape2.len(), 3);

// Zero-copy slices referencing existing data
let data = "hello world";
let cows = CharsCowsAuto::from_iter_and_data(
    data.split_whitespace(),
    Cow::Borrowed(data.as_bytes()),
)?;

# Ok::<(), StringTapeError>(())
```

## Memory Layout

`CharsTape` and `BytesTape` use the same memory layout as Apache Arrow string and binary arrays:

```text
Data buffer:    [h,e,l,l,o,w,o,r,l,d]
Offset buffer:  [0, 5, 10]
```

`CharsCows` and `BytesCows` use packed `(offset, length)` entries to reference substrings in shared data:

```text
Shared data:    [h,e,l,l,o, ,w,o,r,l,d, ,f,o,o]
Entries:        [(0, 5), (6, 5), (12, 3)]  // "hello", "world", "foo"
```

The `#[repr(C, packed(1))]` layout eliminates padding between offset and length fields.
Similar to [SoA vs AoS](https://en.wikipedia.org/wiki/AoS_and_SoA), there is a trade-off.
That said, in extremely high-throughput applications, like hashing billions of very short strings, the runtime may be dominated by unaligned memory accesses and the additional level of indirection.
Consider switching to `CharsTape` or `BytesTape` in such cases.

## API Overview

### Basic Operations

```rust
use stringtape::CharsTapeI32;

let mut tape = CharsTapeI32::new();
tape.push("hello")?;                    // Append one string
tape.extend(["world", "foo"])?;         // Append an array
assert_eq!(&tape[0], "hello");          // Direct indexing
assert_eq!(tape.get(1), Some("world")); // Safe access
assert!(tape.contains("hello"));        // Membership test

for s in &tape {                        // Forward iteration
    println!("{}", s);
}
for s in tape.iter().rev() {            // Reverse iteration
    println!("{}", s);
}

// Construct from iterator
let tape2: CharsTapeI32 = ["a", "b", "c"].into_iter().collect();

// Sort & compare in-place
tape.sort();
tape.sort_by(|a, b| a.len().cmp(&b.len()));
assert!(tape < tape2);                  // Lexicographic comparison

// Utility methods
tape.first();                           // First element
tape.last();                            // Last element
tape.pop();                             // Remove last
tape.shrink_to_fit()?;                  // Reduce memory usage
```

`BytesTape` and `CharsCowsAuto`/`BytesCowsAuto` provide the same interface.

### Views and Slicing

```rust
let view = tape.view();              // View entire tape
let subview = tape.subview(1, 3)?;   // Items [1, 3)
let nested = subview.subview(0, 1)?; // Nested subviews
let raw_bytes = &tape.view()[1..3];  // Raw byte slice

// Views have same API as tapes
assert_eq!(subview.len(), 2);
assert_eq!(&subview[0], "world");
```

### Memory Management

```rust
// Pre-allocate capacity
let tape = CharsTapeI32::with_capacity(1024, 100)?; // 1KB data, 100 strings

// Monitor usage
println!("Items: {}, Data: {} bytes", tape.len(), tape.data_len());

// Modify
tape.clear();           // Remove all items
tape.truncate(5);       // Keep first 5 items

// Custom allocators
use allocator_api2::alloc::Global;
let tape = CharsTape::new_in(Global);
```

### Apache Arrow Interop

True zero-copy conversion to/from Arrow arrays:

```rust
// CharsTape → Arrow (zero-copy)
let (data_slice, offsets_slice) = tape.arrow_slices();
let data_buffer = Buffer::from_slice_ref(data_slice);
let offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(
    Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len()
));
let arrow_array = StringArray::new(offsets_buffer, data_buffer, None);

// Arrow → CharsTapeView (zero-copy)
let view = unsafe {
    CharsTapeViewI32::from_raw_parts(
        arrow_array.values(),
        arrow_array.offsets().as_ref(),
    )
};
```

`BytesTape` works the same way with Arrow `BinaryArray`/`LargeBinaryArray` types.

### Auto Type Selection

Auto variants automatically select the most memory-efficient types based on data size:

```rust
// CharsTapeAuto: selects I32/U32/U64 offset based on total data size
let tape = CharsTapeAuto::from_iter(strings);

// CharsCowsAuto: selects offset (U32/U64) and length (U8/U16/U32) types
let cows = CharsCowsAuto::from_iter_and_data(strings, data)?;
```

Available: `CharsTapeAuto`, `BytesTapeAuto`, `CharsCowsAuto`, `BytesCowsAuto`.

### Unsigned Offsets

Unsigned offsets (`u32`/`u64`) are available via `CharsTapeU32`, `CharsTapeU64`, `BytesTapeU16`, `BytesTapeU32`, `BytesTapeU64` and corresponding views.
These cannot be converted to/from Arrow arrays.

### Standard Traits

All tape and view types implement standard Rust traits: `Debug`, `Clone`, `PartialEq`, `Eq`, `PartialOrd`, `Ord`, `Hash`, `Default`, `FromIterator`, `Extend`, `IntoIterator`, `Iterator`, `ExactSizeIterator`, `DoubleEndedIterator`, `Send`, `Sync`.

```rust
use std::collections::{HashMap, HashSet};

let mut tape1 = CharsTapeI32::new();
tape1.push("hello")?;

let mut tape2 = tape1.clone();          // Clone
assert_eq!(tape1, tape2);               // PartialEq, Eq
assert!(tape1 <= tape2);                // PartialOrd, Ord

// Use in HashMap/HashSet
let mut map = HashMap::new();
map.insert(tape1, 42);                  // Hash

// Reverse iteration
for s in tape2.iter().rev() {           // DoubleEndedIterator
    println!("{}", s);
}
```

## `no_std` Support

StringTape can be used in `no_std` environments:

```toml
[dependencies]
stringtape = { version = "2", default-features = false }
```

In `no_std` mode:

- All functionality is preserved
- Requires `alloc` for dynamic allocation
- Error types implement `Display` but not `std::error::Error`

## Testing

Run tests for both `std` and `no_std` configurations:

```bash
cargo test                          # Test with std (default)
cargo test --doc                    # Test documentation examples
cargo test --no-default-features    # Test without std
cargo test --all-features           # Test with all features enabled
cargo clippy --lib -- -D warnings   # Lint the library code
cargo fmt --all -- --check          # Check code formatting
```

To reproduce memory usage numbers mentioned above, pull the dataset for the ["Large Text Compression Benchmark"](https://mattmahoney.net/dc/textdata.html), and run:

```bash
/usr/bin/time -f "Vec<String>: %M KB | %E" cargo run --release --quiet --bin bench_vec_string -- enwik9.txt
/usr/bin/time -f "Vec<&[u8]>: %M KB | %E" cargo run --release --quiet --bin bench_vec_slice -- enwik9.txt
/usr/bin/time -f "CharsCowsAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_slices -- enwik9.txt
/usr/bin/time -f "CharsTapeAuto: %M KB | %E" cargo run --release --quiet --bin bench_chars_tape -- enwik9.txt
```