
Crate markdown_chunk



markdown-chunk

Heading-aware Markdown chunker for RAG ingestion.

Rules:

  • A new chunk starts at every ATX heading (#, ##, …).
  • Fenced code blocks (```) are never split mid-block, so a # line inside a fence never starts a new chunk.
  • A heading whose body is empty is merged into the chunk that follows it.
  • Chunks are soft-capped at max_chars: an oversize section is returned whole rather than split (a single 30,000-character chapter becomes one chunk).
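The rules above can be sketched in a few dozen lines of Rust. This is a hypothetical illustration, not the crate's implementation: `sketch_chunk`, `is_atx_heading`, `is_fence` and `has_body` are names invented here, headings must start at column 0, and only backtick fences are recognised. The soft cap is left out because the only cap behaviour the rules spell out is that oversize sections are never split, which this sketch already guarantees by never splitting a section.

```rust
/// Hypothetical helper: true if `line` is an ATX heading, i.e. 1 to 6 '#'
/// characters followed by a space, a tab, or the end of the line.
/// Simplification: leading indentation is not accepted.
fn is_atx_heading(line: &str) -> bool {
    let line = line.trim_end();
    let hashes = line.bytes().take_while(|&b| b == b'#').count();
    let rest = &line[hashes..];
    (1..=6).contains(&hashes)
        && (rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t'))
}

/// Hypothetical helper: true if `line` opens or closes a backtick fence.
fn is_fence(line: &str) -> bool {
    line.trim_start().starts_with("```")
}

/// True if a section contains any non-blank line that is not a heading.
fn has_body(section: &str) -> bool {
    section
        .lines()
        .any(|l| !l.trim().is_empty() && !is_atx_heading(l))
}

/// Hypothetical sketch of the chunking rules; not the crate's `chunk`.
pub fn sketch_chunk(md: &str) -> Vec<String> {
    // Rules 1 and 2: cut before every heading that is outside a fence.
    let mut sections: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut in_fence = false;
    for line in md.split_inclusive('\n') {
        if !in_fence && is_atx_heading(line) && !current.is_empty() {
            sections.push(std::mem::take(&mut current));
        }
        if is_fence(line) {
            in_fence = !in_fence;
        }
        current.push_str(line);
    }
    if !current.is_empty() {
        sections.push(current);
    }

    // Rule 3: carry heading-only sections forward into the next section.
    // Rule 4 holds trivially: no section is ever split, whatever its size.
    let mut chunks = Vec::new();
    let mut pending = String::new();
    for section in sections {
        pending.push_str(&section);
        if has_body(&section) {
            chunks.push(std::mem::take(&mut pending));
        }
    }
    // Heading-only sections at the very end have nothing to merge into.
    if !pending.is_empty() {
        chunks.push(pending);
    }
    chunks
}
```

On the input from the Example section, this sketch yields two chunks: the empty `# Title` section is merged into Section A, and Section B stands alone.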

Each chunk carries its inherited heading trail (the chain of enclosing headings, outermost first), so retrieval results can show where a snippet came from.
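The page does not describe how the trail is built. A common approach, sketched here as an assumption rather than the crate's code, keeps a stack of open headings: each new heading pops every entry at the same or a deeper level, then pushes itself, so the stack always holds the current trail. `heading_trails` and `parse_atx` are hypothetical names; closing `#` sequences and indented headings are not handled.

```rust
/// Hypothetical helper: parse an ATX heading into (level, text).
fn parse_atx(line: &str) -> Option<(usize, String)> {
    let line = line.trim_end();
    let level = line.bytes().take_while(|&b| b == b'#').count();
    let rest = &line[level..];
    let separated = rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t');
    if !(1..=6).contains(&level) || !separated {
        return None;
    }
    Some((level, rest.trim().to_string()))
}

/// For each heading outside a backtick fence, return its trail:
/// the texts of all enclosing headings, outermost first, ending with itself.
fn heading_trails(md: &str) -> Vec<Vec<String>> {
    let mut stack: Vec<(usize, String)> = Vec::new();
    let mut trails: Vec<Vec<String>> = Vec::new();
    let mut in_fence = false;
    for line in md.lines() {
        // Fence markers toggle fence state; nothing inside a fence is a heading.
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        if let Some((level, text)) = parse_atx(line) {
            // A heading closes every open heading at the same or a deeper level.
            while stack.last().map_or(false, |(l, _)| *l >= level) {
                stack.pop();
            }
            stack.push((level, text));
            trails.push(stack.iter().map(|(_, t)| t.clone()).collect());
        }
    }
    trails
}
```

For `# Guide`, `## Install`, `### Linux`, `## Usage`, the trail for Linux is Guide, Install, Linux, and the trail for Usage drops back to Guide, Usage.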

Example

```rust
use markdown_chunk::chunk;

let md = "# Title\n\n## Section A\nbody A\n## Section B\nbody B\n";
// Cap below total size forces a split at the next heading.
let chunks = chunk(md, 20);
assert!(chunks.len() >= 2);
```

Functions

chunk
Split md into chunks at heading boundaries.