Structs
- ChunkingConfig - Chunking configuration
- TextChunk - A text chunk with metadata
- Tokenizer - Tokenizer wrapper for counting tokens
Functions
- chunk_text - Chunk text into pieces with overlap
- chunk_text_semantic - Chunk text using semantic boundaries (paragraphs, sentences)
- estimate_token_count - Estimate token count without full tokenization (faster but less accurate)
- merge_small_chunks - Merge small chunks to reduce overhead
- truncate_to_tokens - Truncate text to fit within token budget
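
Example (illustrative sketch)

This index does not show the signatures of the items above, so the following is not this module's API. It is a minimal, self-contained sketch of the ideas the function summaries describe: overlapping chunks, a cheap token estimate, and truncation to a budget. All function names and parameters here are hypothetical, whitespace-separated words stand in for real tokens, and the four-characters-per-token ratio is an assumed rule of thumb rather than anything this module is known to use.

```rust
/// Hypothetical helper: split `text` into chunks of at most `chunk_size`
/// words, where consecutive chunks share `overlap` words.
/// Words are an assumed stand-in for tokens.
fn sketch_chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts `step` words after the previous one.
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = usize::min(start + chunk_size, words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last chunk reached the end of the input
        }
        start += step;
    }
    chunks
}

/// Hypothetical helper: estimate the token count without tokenizing,
/// assuming roughly four characters per token (rounded up).
fn sketch_estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical helper: keep at most `max_words` words of `text`.
fn sketch_truncate_to_words(text: &str, max_words: usize) -> String {
    text.split_whitespace()
        .take(max_words)
        .collect::<Vec<_>>()
        .join(" ")
}
```

With a chunk size of 3 and an overlap of 1, the input "a b c d e f g" yields the chunks "a b c", "c d e", and "e f g": each chunk repeats the last word of the previous one, which is the usual reason for overlap (context is preserved across chunk boundaries). The estimate trades accuracy for speed in the same way the estimate_token_count summary describes; a real tokenizer would be needed for exact budgets.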