Crate memchunk

The fastest semantic text chunking library, with up to 1 TB/s chunking throughput.

Example

use memchunk::chunk;

let text = b"Hello world. How are you? I'm fine.\nThanks for asking.";

// With defaults (4KB chunks, split at \n . ?)
let chunks: Vec<&[u8]> = chunk(text).collect();

// With custom size and delimiters
let chunks: Vec<&[u8]> = chunk(text).size(1024).delimiters(b"\n.?!").collect();

// With multi-byte pattern (e.g., metaspace for SentencePiece tokenizers)
let metaspace = "▁".as_bytes(); // [0xE2, 0x96, 0x81]
let chunks: Vec<&[u8]> = chunk(b"Hello\xE2\x96\x81World").pattern(metaspace).collect();

Structs

Chunker
Chunker splits text at delimiter boundaries.
OwnedChunker
Owned chunker for FFI bindings (Python, WASM).

Constants

DEFAULT_DELIMITERS
Default delimiters: newline, period, question mark.
DEFAULT_TARGET_SIZE
Default chunk target size (4KB).

Functions

chunk
Chunk text at delimiter boundaries.
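
To make "chunk at delimiter boundaries" concrete, here is a minimal standard-library sketch of one plausible strategy: take a window of at most the target size, then cut just after the last delimiter byte inside it. This is not memchunk's implementation. The helper name `sketch_chunks`, its signature, and the fallback to a hard cut when no delimiter is found are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of memchunk): greedy, delimiter-aware chunking.
/// Each chunk is at most `target` bytes and, where possible, ends right after
/// the last delimiter byte found inside the current window.
fn sketch_chunks<'a>(text: &'a [u8], target: usize, delims: &[u8]) -> Vec<&'a [u8]> {
    assert!(target > 0, "target size must be non-zero");
    let mut chunks = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        // The window never extends past the end of the input.
        let end = usize::min(pos + target, text.len());
        let split = if end == text.len() {
            // The remainder fits in one window: emit it whole.
            end
        } else {
            // Search backward for the last delimiter in the window and cut
            // just after it. Assumption: with no delimiter, cut at the window end.
            text[pos..end]
                .iter()
                .rposition(|b| delims.contains(b))
                .map(|i| pos + i + 1)
                .unwrap_or(end)
        };
        chunks.push(&text[pos..split]);
        pos = split;
    }
    chunks
}
```

With the example text from the top of this page, a target of 20 bytes, and delimiters `b"\n.?"`, this sketch returns four slices that borrow from the input, each ending at a sentence boundary. Concatenating them reproduces the original bytes.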