Trait ChunkSizer 

pub trait ChunkSizer: Send + Sync {
    // Required method
    fn size(&self, text: &str) -> usize;
}

Measures the size of a chunk for size-budget comparisons.

CodeChunker uses a ChunkSizer to decide whether a node fits within max_chunk_size and whether to merge atomic chunks. The default is ByteSizer, which measures byte length. To size chunks in tokens, plug in a tokenizer-backed sizer; this lets you match your embedding model’s actual context limit instead of approximating it with bytes.

max_chunk_size is interpreted in whatever unit the sizer returns: bytes for the default ByteSizer, tokens for a tokenizer-backed sizer.
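To make the unit dependence concrete, here is a minimal sketch in which the same text is measured by two sizers, so the same max_chunk_size accepts it under one and rejects it under the other. The trait is redeclared locally (copied from the signature above) so the snippet compiles on its own. `ByteLenSizer`, `WhitespaceSizer`, and `fits` are hypothetical names invented for illustration, not items from this crate. The `<=` comparison is an assumption about how a chunker might apply the budget, and splitting on whitespace is only a crude stand-in for a real tokenizer.

```rust
// Local copy of the trait signature documented on this page,
// redeclared so the sketch is self-contained.
pub trait ChunkSizer: Send + Sync {
    fn size(&self, text: &str) -> usize;
}

// Hypothetical stand-in for the default byte-length behavior.
pub struct ByteLenSizer;

impl ChunkSizer for ByteLenSizer {
    fn size(&self, text: &str) -> usize {
        // str::len returns the length in bytes, not characters.
        text.len()
    }
}

// Hypothetical sizer that counts whitespace-separated words.
// A real tokenizer-backed sizer would call into a tokenizer instead.
pub struct WhitespaceSizer;

impl ChunkSizer for WhitespaceSizer {
    fn size(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

// Hypothetical budget check (assumption: "fits" means size <= budget).
// The budget is in whatever unit the given sizer returns.
pub fn fits(sizer: &dyn ChunkSizer, text: &str, max_chunk_size: usize) -> bool {
    sizer.size(text) <= max_chunk_size
}
```

With a budget of 8, the text `let x = 1;` is rejected by the byte-based sizer (10 bytes) but accepted by the word-based one (4 words), which is why max_chunk_size should be chosen together with the sizer.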

Required Methods§

fn size(&self, text: &str) -> usize

Return the size of text in whatever unit this sizer measures.
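As an illustration of what implementing `size` can look like, and of what the `Send + Sync` supertrait bound permits, the sketch below shares one sizer across threads behind an `Arc`. The trait is again redeclared locally so the snippet stands alone. `CharSizer` and `sizes_in_parallel` are hypothetical names, and nothing here is meant to describe how CodeChunker itself schedules work.

```rust
use std::sync::Arc;
use std::thread;

// Local copy of the trait signature documented on this page.
pub trait ChunkSizer: Send + Sync {
    fn size(&self, text: &str) -> usize;
}

// Hypothetical sizer that counts Unicode scalar values instead of bytes.
pub struct CharSizer;

impl ChunkSizer for CharSizer {
    fn size(&self, text: &str) -> usize {
        text.chars().count()
    }
}

// Hypothetical helper: measures each text on its own thread using one
// shared sizer. This compiles because `dyn ChunkSizer` is Send + Sync
// through the supertrait bound. Results come back in input order.
pub fn sizes_in_parallel(sizer: Arc<dyn ChunkSizer>, texts: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            let sizer = Arc::clone(&sizer);
            thread::spawn(move || sizer.size(&text))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("sizer thread panicked"))
        .collect()
}
```

Because `size` takes `&self`, a sizer that needs internal mutable state (a cache, for example) would have to use interior mutability that is itself thread-safe in order to satisfy the bound.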

Implementors§