Expand description
§code-chunk
Split source code into RAG-friendly chunks that respect function and class boundaries — without parsing the language.
Strategy: split on top-level “block boundaries”:
- A brace-language line ending in
{opens a block; the matching}at the same indent closes it. - A Python/YAML-style top-level line followed by an indented block forms an indent block.
When a function/class is larger than max_chars, it is split
conservatively on blank-line boundaries inside the block so context
is preserved.
Best for second-stage chunking after snipsplit-style sentence
chunking has handled prose.
§Example
use code_chunk::chunk;
let src = "fn a() {\n 1\n}\nfn b() {\n 2\n}\n";
// Small cap forces a split at the function boundary.
let chunks = chunk(src, 10);
assert_eq!(chunks.len(), 2);
assert!(chunks[0].contains("fn a()"));
assert!(chunks[1].contains("fn b()"));Functions§
- chunk
- Split
srcinto chunks no larger thanmax_chars.