Skip to main content

Crate code_chunk

Crate code_chunk 

Source
Expand description

§code-chunk

Split source code into RAG-friendly chunks that respect function and class boundaries — without parsing the language.

Strategy: split on top-level “block boundaries”:

  • A brace-language line ending in { opens a block; the matching } at the same indent closes it.
  • A Python/YAML-style top-level line followed by an indented block forms an indent block.

When a function/class is larger than max_chars, it is split conservatively on blank-line boundaries inside the block so context is preserved.

Best for second-stage chunking after snipsplit-style sentence chunking has handled prose.

§Example

use code_chunk::chunk;
let src = "fn a() {\n    1\n}\nfn b() {\n    2\n}\n";
// Small cap forces a split at the function boundary.
let chunks = chunk(src, 10);
assert_eq!(chunks.len(), 2);
assert!(chunks[0].contains("fn a()"));
assert!(chunks[1].contains("fn b()"));

Functions§

chunk
Split src into chunks no larger than max_chars.