Module chunk


Turning a repo’s git-tracked files into indexable chunks.

Chunking is deliberately language-agnostic: a file is split into bounded windows of lines with a small overlap, and each window carries a contextual prefix (repo › relpath › Lstart-end) so that both the lexical index and the embedding model see where a chunk came from. Symbol-aware chunking with Tree-sitter is deferred to a later change; plain line windows keep v1 shippable for every language.
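
As an illustration of the contextual prefix, here is a minimal sketch. The helper names `context_prefix` and `with_prefix` are hypothetical, and placing the prefix on its own line above the chunk body is an assumption; only the `repo › relpath › Lstart-end` shape comes from the description above.

```rust
/// Hypothetical helper: build the contextual prefix for one chunk.
/// `start` and `end` are assumed to be 1-based, inclusive line numbers.
fn context_prefix(repo: &str, relpath: &str, start: usize, end: usize) -> String {
    format!("{repo} › {relpath} › L{start}-{end}")
}

/// Hypothetical helper: prepend the prefix to the chunk body, so the text
/// handed to the lexical index and the embedding model names its origin.
/// Separating prefix and body with a newline is an assumption.
fn with_prefix(repo: &str, relpath: &str, start: usize, end: usize, body: &str) -> String {
    format!("{}\n{}", context_prefix(repo, relpath, start, end), body)
}

fn main() {
    let body = "fn a() {}\nfn b() {}";
    println!("{}", with_prefix("myrepo", "src/lib.rs", 1, 2, body));
}
```

Keeping the prefix in the indexed text rather than only in metadata means a query that mentions a file path can match the chunk lexically as well.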

Structs§

Chunk
One indexable unit: a window of lines from a single file plus the metadata the store and renderer need.
TrackedFile
A git-tracked file resolved to its current working-tree bytes, ready to chunk. content_hash is the git blob SHA of text, used to detect changes for incremental reindexing.
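
For reference, a git blob ID is the hash of the header `blob <len>\0` followed by the file bytes. The sketch below assumes the repository uses git's default SHA-1 object format and inlines a small SHA-1 so that it needs no external crates; `git_blob_sha` and `sha1` are hypothetical names, and the module itself may compute `content_hash` differently (for example by asking git).

```rust
/// Minimal SHA-1, inlined only to keep this sketch dependency-free.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        // Expand the 16 input words of this block into 80 schedule words.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &word) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let t = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(word);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (state, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *state = state.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Hypothetical helper: hex blob ID of `content`, assuming SHA-1 objects.
fn git_blob_sha(content: &[u8]) -> String {
    let mut object = format!("blob {}\0", content.len()).into_bytes();
    object.extend_from_slice(content);
    sha1(&object).iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    // Should agree with `git hash-object` on the same bytes.
    println!("{}", git_blob_sha(b"hello\n"));
}
```

Because the hash depends only on the bytes, an unchanged file keeps its `content_hash` across runs, which is what lets an incremental reindex skip it.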

Constants§

CHUNK_LINES
Number of lines per chunk window.
CHUNK_OVERLAP
Lines of overlap between consecutive chunks, so a construct that straddles a window boundary still appears whole in at least one chunk.
MAX_FILE_BYTES
Maximum file size we index, in bytes.
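
To make the interaction between the window size and the overlap concrete, the following sketch computes the line ranges for a file. The constant values (40 and 5) are assumptions chosen for illustration, not the module's real values, and `windows` is a hypothetical helper.

```rust
// Assumed values for illustration only; the real constants may differ.
const CHUNK_LINES: usize = 40;
const CHUNK_OVERLAP: usize = 5;

/// Hypothetical helper: 1-based, inclusive (start, end) line ranges that
/// cover `total_lines`, each at most `chunk_lines` long and sharing
/// `overlap` lines with the previous range.
fn windows(total_lines: usize, chunk_lines: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(chunk_lines > overlap, "window must be larger than the overlap");
    let step = chunk_lines - overlap;
    let mut out = Vec::new();
    let mut start = 0; // 0-based index of the first line in the window
    while start < total_lines {
        let end = (start + chunk_lines).min(total_lines); // exclusive, 0-based
        out.push((start + 1, end));
        if end == total_lines {
            break; // the last window already reaches the end of the file
        }
        start += step;
    }
    out
}

fn main() {
    for (start, end) in windows(100, CHUNK_LINES, CHUNK_OVERLAP) {
        println!("L{start}-{end}");
    }
}
```

With these assumed values a 100-line file yields the ranges 1-40, 36-75, and 71-100, so a construct spanning lines 38-42 appears whole in the second window.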

Functions§

chunk_file
Split a file’s text into overlapping line-window Chunks. An empty or whitespace-only file yields no chunks.
tracked_files
Enumerate the git-tracked files of repo_path eligible for indexing.
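
One plausible way to enumerate tracked files is to shell out to `git ls-files -z` and filter by size. This is a sketch under stated assumptions, not the module's implementation: `parse_ls_files` and `tracked_paths` are hypothetical names, the size limit is an assumed value, and the real eligibility rules may include further checks (for example, skipping binary files).

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

// Assumed limit for illustration; the real MAX_FILE_BYTES may differ.
const MAX_FILE_BYTES: u64 = 1_000_000;

/// Parse the NUL-separated output of `git ls-files -z` into relative paths.
/// Non-UTF-8 bytes are replaced lossily in this sketch.
fn parse_ls_files(stdout: &[u8]) -> Vec<PathBuf> {
    stdout
        .split(|&b| b == 0)
        .filter(|entry| !entry.is_empty())
        .map(|entry| PathBuf::from(String::from_utf8_lossy(entry).into_owned()))
        .collect()
}

/// Hypothetical enumeration: tracked paths that exist in the working tree
/// as regular files no larger than the assumed size limit.
fn tracked_paths(repo_path: &Path) -> io::Result<Vec<PathBuf>> {
    let output = Command::new("git")
        .arg("-C")
        .arg(repo_path)
        .arg("ls-files")
        .arg("-z")
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git ls-files failed"));
    }
    let mut eligible = Vec::new();
    for rel in parse_ls_files(&output.stdout) {
        match std::fs::metadata(repo_path.join(&rel)) {
            Ok(meta) if meta.is_file() && meta.len() <= MAX_FILE_BYTES => eligible.push(rel),
            _ => {} // missing from the working tree, not a regular file, or too large
        }
    }
    Ok(eligible)
}

fn main() -> io::Result<()> {
    for path in tracked_paths(Path::new("."))? {
        println!("{}", path.display());
    }
    Ok(())
}
```

The `-z` flag makes git terminate each path with a NUL byte and disables path quoting, which keeps parsing unambiguous for file names that contain spaces or newlines.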