ferray-strings
Vectorized string operations on arrays of strings — a Rust port of numpy.strings / numpy.char.
Part of the ferray workspace, a Rust-native, drop-in NumPy replacement.
Overview
The primary type is StringArray<D>, a specialized N-dimensional string array
backed by Vec<String> (with StringArray1 / StringArray2 aliases).
Because String does not implement ferray_core::Element, it is a separate
type from NdArray<T, D>. Every operation returns FerrayResult<T> and
threads the input dimension D through to the output.
Operations are grouped into families, re-exported under a flat namespace
mirroring numpy.strings.upper, numpy.strings.find, …:
- Construction:
array, plusStringArray::from_vec/from_rows. - Case (
case):upper,lower,title,capitalize;swapcase(instr_ops). - Alignment (
align):center,ljust,ljust_with,rjust,rjust_with,zfill. - Stripping (
strip):strip,lstrip,rstrip. - Search (
search):find,rfind,index,rindex,count,startswith,endswith,replace. - Split / join (
split_join):split,rsplit,splitlines,split_ragged,join,join_array. - Concat / repeat (
concat):add,add_same,multiply. - Regex (
regex_ops):match_,match_compiled,extract,extract_compiled(withregex::Regexre-exported). - Predicates (
classify):isalpha,isdigit,isdecimal,isnumeric,isalnum,isspace,isupper,islower,istitle. - Comparison (
str_ops):equal,not_equal,greater,greater_equal,less,less_equal,str_len. - Extras (
extras):partition,rpartition,slice,translate,expandtabs,mod_,encode,decode.
Unicode correctness: the classify predicates do not delegate to Rust's
std Unicode tables. Instead they binary-search sorted codepoint-range tables
derived live from the oracle CPython 3.13 / Unicode 15.1.0 build that backs the
verified numpy 2.4 reference — so isdigit, isnumeric, isalpha, … match
the NumPy oracle exactly rather than a newer Unicode revision (which would
classify ~9000 post-15.1.0 codepoints differently).
NumPy correspondence
| ferray-strings | numpy.strings / numpy.char |
|---|---|
upper, lower, title, capitalize, swapcase |
upper, lower, title, capitalize, swapcase |
strip, lstrip, rstrip |
strip, lstrip, rstrip |
find, rfind, index, rindex, count |
find, rfind, index, rindex, count |
startswith, endswith, replace |
startswith, endswith, replace |
split, rsplit, splitlines, join, partition, rpartition |
split, rsplit, splitlines, join, partition, rpartition |
center, ljust, rjust, zfill |
center, ljust, rjust, zfill |
add, multiply |
add, multiply |
isalpha, isdigit, isdecimal, isnumeric, isalnum, isspace, isupper, islower, istitle |
isalpha, isdigit, isdecimal, isnumeric, isalnum, isspace, isupper, islower, istitle |
equal, not_equal, greater, greater_equal, less, less_equal |
equal, not_equal, greater, greater_equal, less, less_equal |
str_len, encode, decode, expandtabs, translate, mod_, slice |
str_len, encode, decode, expandtabs, translate, mod, slice |
Feature flags
| Feature | Default | Description |
|---|---|---|
compact-storage |
off | Enables the compact module: an Arrow-style offsets+values backend (CompactStringArray, CompactStringIter, estimated_string_array_bytes) for memory-dense, cache-friendly string arrays. |
Example
use *;
// Build a 1-D StringArray and run a small pipeline.
let raw = array?;
let cleaned = strip?; // ["Hello", "World"]
let shouted = upper?; // ["HELLO", "WORLD"]
let suffix = array?;
let out = add?; // broadcasts: ["HELLO!", "WORLD!"]
assert_eq!;
// Search returns NumPy-style i64 indices (-1 when absent).
let idx = find?;
assert_eq!;
// Regex match with capture-group extraction.
let a = array?;
assert_eq!;
assert_eq!;
# Ok::
MSRV & edition
- Edition 2024, MSRV 1.88.
- Licensed under MIT OR Apache-2.0.