The fastest semantic text chunking library — up to 1TB/s chunking throughput.
§Example
use memchunk::chunk;
let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";
// With defaults (4KB chunks, split at \n . ?)
let chunks: Vec<&[u8]> = chunk(text).collect();
// With custom size and delimiters
let chunks: Vec<&[u8]> = chunk(text).size(1024).delimiters(b"\n.?!").collect();
// With multi-byte pattern (e.g., metaspace for SentencePiece tokenizers)
let metaspace = "▁".as_bytes(); // [0xE2, 0x96, 0x81]
let chunks: Vec<&[u8]> = chunk(b"Hello\xE2\x96\x81World").pattern(metaspace).collect();
Structs§
- Chunker - Chunker splits text at delimiter boundaries.
- OwnedChunker - Owned chunker for FFI bindings (Python, WASM).
Constants§
- DEFAULT_DELIMITERS - Default delimiters: newline, period, question mark.
- DEFAULT_TARGET_SIZE - Default chunk target size (4KB).
Functions§
- chunk - Chunk text at delimiter boundaries.
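To illustrate what "chunking at delimiter boundaries" with a target size can mean, here is a minimal standard-library sketch. It is not memchunk's implementation and makes no claim about the crate's exact splitting rules or performance techniques. The function name `naive_chunks` and its policy are assumptions made for illustration: take up to `target` bytes, then cut just after the last delimiter found in that window, falling back to a hard cut when the window contains no delimiter.

```rust
/// Hypothetical helper (not part of memchunk): split `text` into chunks of at
/// most `target` bytes, ending each chunk just after the last delimiter byte
/// found inside the current window.
fn naive_chunks<'a>(text: &'a [u8], target: usize, delimiters: &[u8]) -> Vec<&'a [u8]> {
    assert!(target > 0, "target size must be non-zero");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        // The window covers at most `target` bytes from `start`.
        let end = (start + target).min(text.len());
        let cut = if end == text.len() {
            // The remainder fits in one chunk; no delimiter search is needed.
            end
        } else {
            // Search backward for the last delimiter in the window.
            match text[start..end].iter().rposition(|b| delimiters.contains(b)) {
                Some(i) => start + i + 1, // keep the delimiter with its chunk
                None => end,              // no delimiter: hard cut at the window end
            }
        };
        chunks.push(&text[start..cut]);
        start = cut;
    }
    chunks
}

fn main() {
    let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";
    // A small target size so this short input yields several chunks.
    for chunk in naive_chunks(text, 20, b"\n.?") {
        println!("{:?}", String::from_utf8_lossy(chunk));
    }
}
```

Each chunk is a borrowed sub-slice of the input, so nothing is copied, and concatenating the chunks reproduces the original bytes.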