Function chunk_document 

pub fn chunk_document(
    doc: &Document,
    config: &ChunkConfig,
) -> Vec<DocumentChunk>
Split a document into token-budgeted chunks.

The algorithm:

  1. Flatten the document’s section tree into candidates.
  2. Greedily pack candidates into chunks without exceeding max_tokens.
  3. Never split a table or list across chunks.
  4. Prefer to break at section boundaries (headings).
  5. For overlap, include the last section title from the previous chunk as context.
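
The steps above can be illustrated with a minimal, self-contained sketch. It is not the crate's implementation: the definitions of Document, ChunkConfig, and DocumentChunk are not shown on this page, so Section, Candidate, Chunk, Kind, count_tokens, flatten, pack, and example_chunks are all hypothetical names. The sketch further assumes that tokens are whitespace-separated words, that the context title does not count toward the budget, and that "prefer section boundaries" means a heading is never left dangling at the end of a chunk.

```rust
// Hypothetical stand-ins: the real Document, ChunkConfig, and DocumentChunk
// definitions are not shown here, so every type below is an assumption.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Kind { Heading, Paragraph, Table, List }

#[derive(Debug, Clone)]
struct Candidate { kind: Kind, text: String, tokens: usize }

struct Section { title: String, blocks: Vec<Candidate>, children: Vec<Section> }

#[derive(Debug, Default)]
struct Chunk { context_title: Option<String>, parts: Vec<Candidate>, tokens: usize }

// Assumed token counter: one token per whitespace-separated word.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

fn block(kind: Kind, text: &str) -> Candidate {
    Candidate { kind, text: text.to_string(), tokens: count_tokens(text) }
}

// Step 1: depth-first walk that turns the section tree into a flat list.
// Each table or list stays a single candidate, so it can never be split.
fn flatten(section: &Section, out: &mut Vec<Candidate>) {
    out.push(block(Kind::Heading, &section.title));
    out.extend(section.blocks.iter().cloned());
    for child in &section.children {
        flatten(child, out);
    }
}

// Most recent heading seen in a chunk, falling back to its inherited context.
fn last_title(chunk: &Chunk) -> Option<String> {
    chunk.parts.iter().rev()
        .find(|p| p.kind == Kind::Heading)
        .map(|p| p.text.clone())
        .or_else(|| chunk.context_title.clone())
}

// Steps 2 to 5: greedy packing under a token budget.
// A single candidate larger than max_tokens still gets its own chunk.
fn pack(candidates: &[Candidate], max_tokens: usize) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut current = Chunk::default();
    for c in candidates {
        if !current.parts.is_empty() && current.tokens + c.tokens > max_tokens {
            // Prefer a section boundary: a trailing heading moves to the next chunk.
            let ends_with_heading = current.parts.len() > 1
                && current.parts.last().map_or(false, |p| p.kind == Kind::Heading);
            let carried = if ends_with_heading { current.parts.pop() } else { None };
            if let Some(h) = &carried {
                current.tokens -= h.tokens;
            }
            // Overlap: the new chunk remembers the previous chunk's last title.
            let mut next = Chunk { context_title: last_title(&current), ..Chunk::default() };
            if let Some(h) = carried {
                next.tokens = h.tokens;
                next.parts.push(h);
            }
            chunks.push(std::mem::replace(&mut current, next));
        }
        current.tokens += c.tokens;
        current.parts.push(c.clone());
    }
    if !current.parts.is_empty() {
        chunks.push(current);
    }
    chunks
}

// Usage: a two-level document packed with a budget of 10 tokens.
fn example_chunks() -> Vec<Chunk> {
    let doc = Section {
        title: "Intro".to_string(),
        blocks: vec![block(Kind::Paragraph, "one two three four five")],
        children: vec![Section {
            title: "Usage".to_string(),
            blocks: vec![
                block(Kind::Table, "a b c d e f"),
                block(Kind::Paragraph, "w x y z"),
            ],
            children: vec![],
        }],
    };
    let mut flat = Vec::new();
    flatten(&doc, &mut flat);
    pack(&flat, 10)
}
```

With this input the sketch produces three chunks. The "Usage" heading would fit at the end of the first chunk, but the table that follows it would not, so the heading is carried forward and the break lands on the section boundary. The second chunk records "Intro" as its context title and the third records "Usage".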