StringTape
A memory-efficient collection for variable-length strings, co-located on a contiguous "tape".
- Convertible to Apache Arrow String/LargeString & Binary/LargeBinary arrays
- Compatible with UTF-8 & binary strings in Rust via CharsTape and BytesTape
- Usable in no_std and with custom allocators for GPU & embedded use cases
- Sliceable into zero-copy borrow-checked views with [i..n] range syntax
Quick Start
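use stringtape::{BytesTapeI32, CharsTapeI32, StringTapeError};
// Create a new CharsTape with 32-bit offsets
let mut tape = CharsTapeI32::new();
tape.push("hello")?;
tape.push("world")?;
assert_eq!(tape.len(), 2);
assert_eq!(&tape[0], "hello");
assert_eq!(tape.get(1), Some("world"));
// Iterate over strings
for s in &tape {
    println!("{}", s);
}
// Build from iterator
let tape2: CharsTapeI32 = ["a", "b", "c"].into_iter().collect();
assert_eq!(tape2.len(), 3);
// Binary data with BytesTape
let mut bytes = BytesTapeI32::new();
bytes.push(&[0xde, 0xad, 0xbe, 0xef])?;
assert_eq!(&bytes[0], &[0xde, 0xad, 0xbe, 0xef]);
# Ok::<(), StringTapeError>(())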
Memory Layout
CharsTape and BytesTape use the same memory layout as Apache Arrow string and binary arrays:
Data buffer: [h,e,l,l,o,w,o,r,l,d]
Offset buffer: [0, 5, 10]
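To make the layout concrete, here is a minimal standalone sketch of the same idea using only the standard library. `MiniTape` and its methods are hypothetical names chosen for illustration; this is not the crate's implementation, only a model of one data buffer plus an offset buffer with N + 1 entries.

```rust
/// Hypothetical stand-in for a tape: one byte buffer plus N + 1 offsets.
pub struct MiniTape {
    pub data: Vec<u8>,     // all strings stored back to back
    pub offsets: Vec<i32>, // offsets[i]..offsets[i + 1] delimits item i
}

impl MiniTape {
    pub fn new() -> Self {
        // The offset buffer always starts with a single 0.
        MiniTape { data: Vec::new(), offsets: vec![0] }
    }

    /// Appends a string; returns None if the data would outgrow i32 offsets.
    pub fn push(&mut self, s: &str) -> Option<()> {
        let end = i32::try_from(self.data.len() + s.len()).ok()?;
        self.data.extend_from_slice(s.as_bytes());
        self.offsets.push(end);
        Some(())
    }

    /// Number of stored items (one less than the number of offsets).
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns item `i`, or None if `i` is out of range.
    pub fn get(&self, i: usize) -> Option<&str> {
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i + 1)? as usize;
        // Only whole, valid UTF-8 strings are ever pushed, so this succeeds.
        std::str::from_utf8(&self.data[start..end]).ok()
    }
}
```

Pushing "hello" and then "world" into this model leaves the two buffers in exactly the state shown above: ten data bytes and the offsets 0, 5, 10.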
API Overview
Basic Operations
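use stringtape::CharsTapeI32;
let mut tape = CharsTapeI32::new();
tape.push("hello")?; // Append one string
tape.extend(["world", "foo"])?; // Append an array
assert_eq!(&tape[0], "hello"); // Direct indexing
assert_eq!(tape.get(1), Some("world")); // Safe access
for s in &tape {
    println!("{}", s);
}
// Construct from an iterator
let tape2: CharsTapeI32 = ["a", "b", "c"].into_iter().collect();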
BytesTape provides the same interface for arbitrary byte slices.
Views and Slicing
let view = tape.view(); // View entire tape
let subview = tape.subview(1, 3)?; // Items [1, 3)
let nested = subview.subview(0, 1)?; // Nested subviews
let raw_bytes = &tape.view()[1..3]; // Raw byte slice
// Views have same API as tapes
assert_eq!(subview.len(), 2);
assert_eq!(&subview[0], "world");
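One way to see why views and nested subviews can be zero-copy: a view only needs to borrow the data buffer and a window of the offset buffer. The sketch below is a standalone model under that assumption, using only the standard library; `MiniView` is a hypothetical name and not the crate's view type.

```rust
/// Hypothetical borrowed view over tape-style buffers.
#[derive(Clone, Copy)]
pub struct MiniView<'a> {
    data: &'a [u8],     // the whole data buffer, never copied
    offsets: &'a [i32], // a window of the offsets: len() + 1 entries
}

impl<'a> MiniView<'a> {
    pub fn new(data: &'a [u8], offsets: &'a [i32]) -> Self {
        MiniView { data, offsets }
    }

    /// Number of items visible through this view.
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Returns item `i` of the view, or None if out of range or not UTF-8.
    pub fn get(&self, i: usize) -> Option<&'a str> {
        let start = *self.offsets.get(i)? as usize;
        let end = *self.offsets.get(i + 1)? as usize;
        std::str::from_utf8(self.data.get(start..end)?).ok()
    }

    /// Items [start, end) of this view; only the offsets window shrinks.
    pub fn subview(&self, start: usize, end: usize) -> Option<MiniView<'a>> {
        if start > end || end > self.len() {
            return None;
        }
        let offsets = self.offsets.get(start..=end)?;
        Some(MiniView { data: self.data, offsets })
    }
}
```

Because a subview keeps the original data slice and narrows only the offsets, nesting subviews never moves or duplicates string bytes, and the borrow checker ties every view to the lifetime of the underlying buffers.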
Memory Management
// Pre-allocate capacity
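let mut tape = CharsTapeI32::with_capacity(1024, 100)?; // 1KB data, 100 strings
// Monitor usage
println!("Items: {}, Data: {} bytes", tape.len(), tape.data_len());
// Modify
tape.clear(); // Remove all items
tape.truncate(5); // Keep first 5 items
// Custom allocators
use allocator_api2::alloc::Global;
let tape = CharsTapeI32::new_in(Global);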
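As a rough illustration of what pre-allocation and truncation mean for the two-buffer layout, the following standalone sketch operates on a plain `(Vec<u8>, Vec<i32>)` pair. The helper names `tape_with_capacity` and `truncate_tape` are hypothetical and do not come from the crate; the sketch assumes that N items need N + 1 offsets.

```rust
/// Reserves room for `data_bytes` of string data and `items` strings.
/// N items need N + 1 offsets, and the first offset is always 0.
pub fn tape_with_capacity(data_bytes: usize, items: usize) -> (Vec<u8>, Vec<i32>) {
    let data = Vec::with_capacity(data_bytes);
    let mut offsets = Vec::with_capacity(items + 1);
    offsets.push(0);
    (data, offsets)
}

/// Keeps the first `n` items; does nothing if there are `n` or fewer.
pub fn truncate_tape(data: &mut Vec<u8>, offsets: &mut Vec<i32>, n: usize) {
    if n + 1 >= offsets.len() {
        return;
    }
    offsets.truncate(n + 1);
    // The last remaining offset marks the new end of the data buffer.
    data.truncate(offsets[n] as usize);
}
```

In this model, truncating to zero items is equivalent to clearing: the offsets collapse to a single 0 and the data buffer becomes empty, while the reserved capacity stays available for reuse.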
Apache Arrow Interop
True zero-copy conversion to/from Arrow arrays:
// CharsTape → Arrow (zero-copy)
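use arrow::array::StringArray;
use arrow::buffer::{Buffer, OffsetBuffer, ScalarBuffer};
let (data_slice, offsets_slice) = tape.arrow_slices();
let data_buffer = Buffer::from_slice_ref(data_slice);
let offsets_buffer = OffsetBuffer::new(ScalarBuffer::new(Buffer::from_slice_ref(offsets_slice), 0, offsets_slice.len()));
let arrow_array = StringArray::new(offsets_buffer, data_buffer, None);
// Arrow → CharsTapeView (zero-copy)
let view = unsafe { CharsTapeViewI32::from_raw_parts(arrow_array.values(), arrow_array.offsets()) };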
BytesTape works the same way with Arrow BinaryArray/LargeBinaryArray types.
Unsigned Offsets
In addition to the signed offsets (i32/i64 via CharsTapeI32/CharsTapeI64), the library also supports unsigned offsets (u32/u64) when you prefer non-negative indexing:
- CharsTapeU32, CharsTapeU64
- BytesTapeU32, BytesTapeU64
- CharsTapeViewU32<'_>, CharsTapeViewU64<'_>
- BytesTapeViewU32<'_>, BytesTapeViewU64<'_>
Note that unsigned offsets cannot be converted to/from Arrow arrays.
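The Arrow restriction follows from Arrow's string and binary layouts using signed 32-bit or 64-bit offsets. The standalone sketch below compares how many bytes each 32-bit width can address and shows that moving unsigned offsets into a signed buffer needs a checked copy. The constant and function names are hypothetical, chosen for illustration only.

```rust
/// Largest data-buffer length, in bytes, addressable by each offset width.
pub const MAX_I32_BYTES: u64 = i32::MAX as u64; // 2 GiB - 1
pub const MAX_U32_BYTES: u64 = u32::MAX as u64; // 4 GiB - 1

/// Copies unsigned offsets into Arrow-style signed ones.
/// Returns None as soon as one offset does not fit in an i32.
pub fn to_signed(offsets: &[u32]) -> Option<Vec<i32>> {
    offsets.iter().map(|&o| i32::try_from(o).ok()).collect()
}
```

A u32 tape can therefore hold roughly twice the data of an i32 tape with the same offset-buffer size, at the cost of no longer being reinterpretable as Arrow offsets without a checked copy like the one above.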
no_std Support
StringTape can be used in no_std environments:
[dependencies]
stringtape = { version = "2", default-features = false }
In no_std mode:
- All functionality is preserved
- Requires alloc for dynamic allocation
- Error types implement Display but not std::error::Error
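The last point reflects a common pattern: `Display` lives in `core`, so it is available without `std`, while the `Error` impl can be gated behind a Cargo feature. The sketch below shows that pattern with a hypothetical `TapeError` enum and a hypothetical `std` feature name; the crate's actual error type and variants may differ.

```rust
use core::fmt;

/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum TapeError {
    OffsetOverflow,
    AllocationFailed,
}

// `Display` comes from `core`, so this impl works in no_std builds.
impl fmt::Display for TapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TapeError::OffsetOverflow => f.write_str("offset overflow"),
            TapeError::AllocationFailed => f.write_str("allocation failed"),
        }
    }
}

// Only compiled when a (hypothetical) `std` Cargo feature is enabled.
#[cfg(feature = "std")]
impl std::error::Error for TapeError {}
```

In a no_std build, callers can still format such an error with `{}`, but cannot pass it where a `std::error::Error` trait object is expected.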
Testing
Run tests for both std and no_std configurations: