Module chunking

Structs

ChunkingConfig
Chunking configuration
TextChunk
A text chunk with metadata
Tokenizer
Tokenizer wrapper for counting tokens

Functions

chunk_text
Chunk text into pieces with overlap
chunk_text_semantic
Chunk text using semantic boundaries (paragraphs, sentences)
estimate_token_count
Estimate token count without full tokenization (faster but less accurate)
merge_small_chunks
Merge small chunks to reduce overhead
truncate_to_tokens
Truncate text to fit within token budget
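The index above lists names and one-line summaries only, so the real signatures are not shown here. As a rough illustration of what "chunking with overlap" and "estimating a token count" can look like, here is a minimal sketch. The function names `chunk_words_with_overlap` and `rough_token_estimate`, their parameters, the use of whitespace-separated words in place of tokens, and the four-characters-per-token heuristic are all assumptions made for this example. They are not this module's API and may differ from how `chunk_text` and `estimate_token_count` actually behave.

```rust
/// Hypothetical sketch (not this module's API): split `text` into windows of
/// at most `chunk_size` whitespace-separated words, where consecutive windows
/// share `overlap` words. Words stand in for tokens here; a real tokenizer
/// would count differently.
pub fn chunk_words_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new window starts `step` words after the previous one, so the
    // last `overlap` words of one chunk reappear at the start of the next.
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Hypothetical heuristic (an assumption, not this module's method): about
/// four characters per token, rounded up. Cheap, but only a rough guide.
pub fn rough_token_estimate(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}
```

With a chunk size of 3 and an overlap of 1, the input `"a b c d e f g"` yields three chunks, each sharing one word with its neighbour. A larger overlap preserves more context across chunk boundaries at the cost of more chunks overall.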