Token-aware semantic chunking utilities for embedding inputs whose bodies exceed the embedding window (Markdown-aware, 512-token limit).
Splits bodies using text_splitter::MarkdownSplitter with overlap so
multi-chunk memories preserve context across chunk boundaries.
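
The overlap idea can be sketched without the real splitter. The example below is a minimal illustration, not this module's implementation: `chunk_with_overlap` is a hypothetical helper, and it approximates tokens with whitespace-separated words instead of using `text_splitter::MarkdownSplitter` and a real tokenizer.

```rust
/// Hypothetical helper (not part of this module): splits `body` into chunks
/// of at most `max_tokens` whitespace-separated words. The last `overlap`
/// words of each chunk are repeated at the start of the next chunk so that
/// context carries across the boundary.
fn chunk_with_overlap(body: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(max_tokens > 0, "max_tokens must be positive");
    assert!(overlap < max_tokens, "overlap must be smaller than max_tokens");

    // Assumption: one word stands in for one token.
    let words: Vec<&str> = body.split_whitespace().collect();
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final chunk already reaches the end of the body
        }
        start += step;
    }
    chunks
}
```

With `max_tokens = 4` and `overlap = 1`, the body `"a b c d e f g"` yields `"a b c d"` and `"d e f g"`: the word `d` appears in both chunks. A real tokenizer would produce different boundaries, since tokens rarely line up with words.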
Structs§
Constants§
Functions§
- aggregate_embeddings
- chunk_text
- needs_chunking
- split_into_chunks
- split_into_chunks_by_token_offsets
- split_into_chunks_hierarchical - Splits body into chunks using MarkdownSplitter with a real tokenizer. Respects Markdown semantic boundaries (H1-H6, paragraphs, blocks). For plain text without Markdown markers, falls back to paragraph and sentence breaks.
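
The plain-text fallback (paragraph breaks first, sentence breaks second) can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the crate's code: `approx_tokens` and `split_plain_text` are hypothetical names, word counts stand in for real token counts, and sentences are detected naively by `.`, `!`, and `?`.

```rust
/// Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical helper: splits plain text at blank lines. A paragraph that
/// exceeds `max_tokens` is split further at sentence ends, and consecutive
/// sentences are packed greedily into chunks that stay within the limit.
/// A single sentence longer than `max_tokens` is kept whole in this sketch.
fn split_plain_text(body: &str, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();

    for paragraph in body.split("\n\n") {
        let paragraph = paragraph.trim();
        if paragraph.is_empty() {
            continue;
        }
        if approx_tokens(paragraph) <= max_tokens {
            chunks.push(paragraph.to_string());
            continue;
        }

        // Paragraph is too large: fall back to sentence boundaries.
        let mut current = String::new();
        for sentence in paragraph.split_inclusive(|c: char| matches!(c, '.' | '!' | '?')) {
            let sentence = sentence.trim();
            if sentence.is_empty() {
                continue;
            }
            let would_overflow =
                approx_tokens(&current) + approx_tokens(sentence) > max_tokens;
            if !current.is_empty() && would_overflow {
                chunks.push(std::mem::take(&mut current));
            }
            if !current.is_empty() {
                current.push(' ');
            }
            current.push_str(sentence);
        }
        if !current.is_empty() {
            chunks.push(current);
        }
    }
    chunks
}
```

Trying paragraphs before sentences keeps related sentences together whenever they fit, which is the same coarse-to-fine ordering the description above attributes to the Markdown-aware splitter.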