Module chunker

CST-aware code chunking.

Splits source files into semantically meaningful chunks using the concrete syntax tree (CST) produced by ast-grep/tree-sitter. The algorithm:

  1. If a CST node fits within max_chunk_size (non-whitespace chars) -> emit it as a chunk.
  2. If too large -> recurse into named children.
  3. Adjacent small siblings are merged greedily until the merged size would exceed max_chunk_size.
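
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: `Node`, `non_ws_len`, and `chunk_node` are hypothetical names, and a plain struct stands in for the ast-grep/tree-sitter node type. It also assumes that an oversized node with no named children is emitted whole, and it ignores any text that sits between named children.

```rust
/// Hypothetical stand-in for a CST node (the real chunker walks
/// ast-grep/tree-sitter nodes). `children` holds named children only.
#[derive(Debug, Clone)]
struct Node {
    text: String,
    children: Vec<Node>,
}

impl Node {
    /// Convenience constructor for a node without named children.
    fn leaf(text: &str) -> Node {
        Node { text: text.to_string(), children: Vec::new() }
    }
}

/// Size metric from the description: number of non-whitespace chars.
fn non_ws_len(text: &str) -> usize {
    text.chars().filter(|c| !c.is_whitespace()).count()
}

/// Returns the chunk texts for `node` under the given size limit.
fn chunk_node(node: &Node, max_chunk_size: usize) -> Vec<String> {
    // Step 1: the node fits, so it becomes a chunk on its own.
    // Assumption: an oversized node without named children is also emitted whole.
    if non_ws_len(&node.text) <= max_chunk_size || node.children.is_empty() {
        return vec![node.text.clone()];
    }

    let mut chunks = Vec::new();
    let mut pending = String::new(); // run of merged small siblings

    for child in &node.children {
        let child_len = non_ws_len(&child.text);
        if child_len > max_chunk_size {
            // Step 2: too large, so flush the current run and recurse.
            if !pending.is_empty() {
                chunks.push(std::mem::take(&mut pending));
            }
            chunks.extend(chunk_node(child, max_chunk_size));
        } else if pending.is_empty() {
            pending = child.text.clone();
        } else if non_ws_len(&pending) + child_len <= max_chunk_size {
            // Step 3: greedy merge while the combined size still fits.
            pending.push('\n');
            pending.push_str(&child.text);
        } else {
            // Merging would exceed the limit: close the run, start a new one.
            chunks.push(std::mem::replace(&mut pending, child.text.clone()));
        }
    }
    if !pending.is_empty() {
        chunks.push(pending);
    }
    chunks
}
```

In this sketch, a root with the children `fn a() {}`, `fn b() {}`, and an oversized `impl` block holding `fn x() {}` and `fn y() {}` yields two chunks at a limit of 15: the two free functions merged, and the two methods merged.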

Each chunk records its parent symbol (resolved by line-range containment).
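
Resolving a parent symbol by line-range containment could look something like the sketch below. The `Symbol` struct, its field names, and `parent_symbol` are assumptions made for illustration, as is the choice to prefer the innermost (shortest) enclosing symbol when several contain the chunk.

```rust
/// Hypothetical symbol record; the field names are assumptions.
struct Symbol {
    name: String,
    start_line: usize, // inclusive
    end_line: usize,   // inclusive
}

/// Returns the name of the innermost symbol whose line range fully
/// contains the chunk's line range, or `None` if no symbol contains it.
fn parent_symbol(symbols: &[Symbol], chunk_start: usize, chunk_end: usize) -> Option<&str> {
    symbols
        .iter()
        // Containment test: the symbol's range must cover the whole chunk.
        .filter(|s| s.start_line <= chunk_start && chunk_end <= s.end_line)
        // Assumption: the tightest enclosing range is the most useful parent.
        .min_by_key(|s| s.end_line - s.start_line)
        .map(|s| s.name.as_str())
}
```

With a `Parser` symbol spanning lines 1 to 50 and a nested `parse` symbol spanning lines 10 to 20, a chunk on lines 12 to 15 resolves to `parse`, while a chunk on lines 30 to 40 resolves to `Parser`.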

Structs

ChunkConfig
Configuration for the chunker.
CodeChunk
A code chunk produced by the CST-aware chunker.

Functions

chunk_file
Chunk a file using its CST.