
Function split_into_chunks_by_token_offsets 

pub fn split_into_chunks_by_token_offsets(
    body: &str,
    token_offsets: &[(usize, usize)],
) -> Vec<Chunk>

Splits body into Chunks using pre-computed token byte offsets.

Each element of token_offsets is a (start, end) byte range in body for one token. The number of tokens per chunk and the overlap between consecutive chunks are governed by the CHUNK_SIZE_TOKENS and CHUNK_OVERLAP_TOKENS constants. A short body (at most CHUNK_SIZE_TOKENS tokens) is returned as a single chunk.
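The description suggests a sliding window over the token ranges. Below is a minimal sketch under several assumptions. The overlap is taken to be counted in tokens. Each chunk is sliced from the start of its first token to the end of its last token. The Chunk struct, the constant values, split_sketch, and the whitespace tokenizer are all hypothetical stand-ins, not this crate's actual definitions.

```rust
// Hypothetical values: the crate's real CHUNK_SIZE_TOKENS and
// CHUNK_OVERLAP_TOKENS constants are defined elsewhere and may differ.
const CHUNK_SIZE_TOKENS: usize = 4;
const CHUNK_OVERLAP_TOKENS: usize = 1;

/// Hypothetical stand-in for the crate's Chunk type.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    text: String,
    start: usize, // byte offset of the chunk's first token in body
    end: usize,   // byte offset just past the chunk's last token
}

/// Sliding-window chunking over pre-computed (start, end) token byte ranges.
fn split_sketch(body: &str, token_offsets: &[(usize, usize)]) -> Vec<Chunk> {
    // Short bodies (including an empty offset list) become a single chunk.
    if token_offsets.len() <= CHUNK_SIZE_TOKENS {
        return vec![Chunk { text: body.to_string(), start: 0, end: body.len() }];
    }
    // Distance between window starts; .max(1) guards against overlap >= size.
    let step = CHUNK_SIZE_TOKENS.saturating_sub(CHUNK_OVERLAP_TOKENS).max(1);
    let mut chunks = Vec::new();
    let mut first = 0;
    loop {
        // Exclusive index of the window's last token.
        let last = (first + CHUNK_SIZE_TOKENS).min(token_offsets.len());
        let start = token_offsets[first].0;
        let end = token_offsets[last - 1].1;
        chunks.push(Chunk { text: body[start..end].to_string(), start, end });
        if last == token_offsets.len() {
            break; // the window reached the final token
        }
        first += step;
    }
    chunks
}

/// Hypothetical tokenizer: whitespace-separated words with byte offsets.
fn whitespace_offsets(body: &str) -> Vec<(usize, usize)> {
    let mut offsets = Vec::new();
    let mut start = None;
    for (i, c) in body.char_indices() {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                offsets.push((s, i));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        offsets.push((s, body.len()));
    }
    offsets
}

fn main() {
    let body = "one two three four five six seven";
    let offsets = whitespace_offsets(body);
    for chunk in split_sketch(body, &offsets) {
        println!("{}..{} {:?}", chunk.start, chunk.end, chunk.text);
    }
}
```

With these sample values, the seven-word body yields two chunks, "one two three four" and "four five six seven", which share one overlapping token. Because the offsets come from char_indices, every slice falls on a UTF-8 character boundary. Real token offsets from a subword tokenizer would need the same guarantee for body[start..end] not to panic.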