Turning a repo’s git-tracked files into indexable chunks.
Chunking is deliberately language-agnostic: a file is split into bounded
windows of lines with a small overlap, each carrying a contextual prefix
(repo › relpath › Lstart-end) so both the lexical index and the embedding
model see where a chunk came from. Symbol-aware chunking with Tree-sitter is
deferred to a later change; the line-window approach keeps v1 shippable across
every language.
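A minimal sketch of the contextual prefix described above. It assumes the separators are the literal `›` characters shown, that line numbers are 1-based and inclusive, and that the prefix sits on its own line ahead of the window body. `context_prefix` and `indexed_text` are hypothetical names, not this crate's API.

```rust
/// Hypothetical helper: the contextual prefix `repo › relpath › Lstart-end`.
/// Line numbers are assumed to be 1-based and inclusive.
fn context_prefix(repo: &str, relpath: &str, start: usize, end: usize) -> String {
    format!("{repo} › {relpath} › L{start}-{end}")
}

/// Hypothetical helper: the text handed to both the lexical index and the
/// embedding model, assumed to be the prefix line followed by the window body.
fn indexed_text(repo: &str, relpath: &str, start: usize, end: usize, body: &str) -> String {
    format!("{}\n{}", context_prefix(repo, relpath, start, end), body)
}

fn main() {
    let body = "fn main() {\n    run();\n}";
    // Prints the prefix line, then the three body lines.
    println!("{}", indexed_text("acme", "src/main.rs", 10, 12, body));
}
```

Putting the location into the indexed text itself means a query that names a file or directory can match lexically, and the embedding carries the same hint, without either index needing a separate metadata field.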
Structs
- Chunk - One indexable unit: a window of lines from a single file plus the metadata the store and renderer need.
- TrackedFile - A git-tracked file resolved to its current working-tree bytes, ready to chunk. `content_hash` is the git blob SHA of `text`, used to detect changes for incremental reindexing.
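The blob SHA stored as `content_hash` can be reproduced from the bytes alone: git names a blob by hashing the header `blob <byte length>` plus a NUL byte, followed by the content. The sketch below assumes a SHA-1 repository (not the newer SHA-256 object format) and that `text` is hashed exactly as read, with no clean filters or line-ending conversion; the crate itself may well ask git for the hash instead. SHA-1 is written out by hand only to keep the example dependency-free, and `git_blob_sha` is a hypothetical name.

```rust
/// Minimal SHA-1 (FIPS 180-4), written out only so this sketch needs no crates.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            // Round function and constant for each group of 20 rounds.
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (hi, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hi = hi.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (i, v) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&v.to_be_bytes());
    }
    out
}

/// Hypothetical helper: git's blob ID for `text`, as lowercase hex.
/// Git hashes the header "blob <byte length>\0" followed by the raw bytes.
fn git_blob_sha(text: &[u8]) -> String {
    let mut data = format!("blob {}\0", text.len()).into_bytes();
    data.extend_from_slice(text);
    sha1(&data).iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    // The empty blob has a well-known object ID.
    println!("{}", git_blob_sha(b""));
}
```

Because the hash depends only on the bytes, comparing a stored `content_hash` with a freshly computed one is enough to tell whether a file's chunks are stale.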
Constants
- CHUNK_LINES - Number of lines per chunk window.
- CHUNK_OVERLAP - Lines of overlap between consecutive chunks, so a construct of at most CHUNK_OVERLAP + 1 lines that straddles a window boundary still appears whole in at least one chunk.
- MAX_FILE_BYTES - Maximum file size we index, in bytes.
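How CHUNK_LINES and CHUNK_OVERLAP interact can be sketched as a window-range calculation: windows advance by a stride of CHUNK_LINES - CHUNK_OVERLAP lines, and the last window is clamped to the end of the file. The constant values here (40 and 8) are placeholders, not the crate's, and `window_ranges` is a hypothetical helper.

```rust
// Placeholder values for illustration; the crate's actual constants may differ.
const CHUNK_LINES: usize = 40;
const CHUNK_OVERLAP: usize = 8;

/// Hypothetical helper: 1-based inclusive (start, end) ranges of the windows
/// covering an `n_lines`-line file. Each window holds at most `size` lines and
/// shares `overlap` lines with the next one.
fn window_ranges(n_lines: usize, size: usize, overlap: usize) -> Vec<(usize, usize)> {
    assert!(size > overlap, "a window must be longer than its overlap");
    let stride = size - overlap;
    let mut ranges = Vec::new();
    let mut start = 1;
    while start <= n_lines {
        // Clamp the last window to the end of the file.
        let end = (start + size - 1).min(n_lines);
        ranges.push((start, end));
        if end == n_lines {
            break; // the file is fully covered; stop before a redundant tail window
        }
        start += stride;
    }
    ranges
}

fn main() {
    // With the placeholder values: [(1, 40), (33, 72), (65, 100)]
    println!("{:?}", window_ranges(100, CHUNK_LINES, CHUNK_OVERLAP));
}
```

With these placeholder values, any run of up to nine lines (CHUNK_OVERLAP + 1) falls wholly inside at least one window; a longer construct can still be split across two chunks.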
Functions
- chunk_file - Split a file's `text` into overlapping line-window `Chunk`s. An empty or whitespace-only file yields no chunks.
- tracked_files - Enumerate the git-tracked files of `repo_path` eligible for indexing.
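A minimal sketch of `chunk_file` under stated assumptions: the signature, the `Chunk` fields, the placeholder constant values and the prefix layout (prefix line, then the window body) are illustrative guesses rather than the crate's definitions. It applies the empty/whitespace rule from the description, then slides a window of CHUNK_LINES lines forward by CHUNK_LINES - CHUNK_OVERLAP until the file's last line is covered.

```rust
// Placeholder values; the crate's real constants may differ.
const CHUNK_LINES: usize = 40;
const CHUNK_OVERLAP: usize = 8;

/// Hypothetical field layout for a chunk.
#[derive(Debug)]
struct Chunk {
    relpath: String,
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
    text: String,      // contextual prefix line + window body
}

/// Sketch of `chunk_file` with an assumed signature.
fn chunk_file(repo: &str, relpath: &str, text: &str) -> Vec<Chunk> {
    // An empty or whitespace-only file yields no chunks.
    if text.trim().is_empty() {
        return Vec::new();
    }
    let lines: Vec<&str> = text.lines().collect();
    let stride = CHUNK_LINES - CHUNK_OVERLAP;
    let mut chunks = Vec::new();
    let mut start = 0; // 0-based index of the window's first line
    loop {
        let end = (start + CHUNK_LINES).min(lines.len()); // exclusive index
        let first = start + 1; // 1-based first line of this window
        let body = lines[start..end].join("\n");
        chunks.push(Chunk {
            relpath: relpath.to_string(),
            start_line: first,
            end_line: end,
            text: format!("{repo} › {relpath} › L{first}-{end}\n{body}"),
        });
        if end == lines.len() {
            break; // last line covered
        }
        start += stride;
    }
    chunks
}

fn main() {
    let text: String = (1..=100).map(|i| format!("line {i}\n")).collect();
    for c in chunk_file("acme", "notes.txt", &text) {
        println!("{} L{}-{}: {} bytes", c.relpath, c.start_line, c.end_line, c.text.len());
    }
}
```

Only the whole-file check filters whitespace; a whitespace-only window inside a non-empty file is still emitted here, which an implementation might choose to skip.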
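`tracked_files` might be built on `git ls-files -z`, which prints tracked paths separated by NUL bytes. The eligibility rule here (no larger than MAX_FILE_BYTES, no NUL bytes, valid UTF-8) is an assumption, as are the 1 MiB placeholder limit, the helper names and the trimmed-down `TrackedFile` without `content_hash`; only the git invocation and the standard-library calls are real.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

// Placeholder limit (1 MiB); the crate's real MAX_FILE_BYTES is not stated here.
const MAX_FILE_BYTES: usize = 1 << 20;

/// Hypothetical, trimmed-down file record (the real one also has `content_hash`).
#[derive(Debug)]
struct TrackedFile {
    relpath: String,
    text: String,
}

/// Split the NUL-separated output of `git ls-files -z` into relative paths.
fn parse_ls_files(stdout: &[u8]) -> Vec<String> {
    stdout
        .split(|&b| b == 0)
        .filter(|p| !p.is_empty())
        .map(|p| String::from_utf8_lossy(p).into_owned())
        .collect()
}

/// Assumed eligibility rule: within the size limit, no NUL bytes (a common
/// binary-file heuristic) and valid UTF-8.
fn eligible_text(bytes: Vec<u8>) -> Option<String> {
    if bytes.len() > MAX_FILE_BYTES || bytes.contains(&0) {
        return None;
    }
    String::from_utf8(bytes).ok()
}

/// Sketch of `tracked_files`: list tracked paths, then keep the eligible ones.
fn tracked_files(repo_path: &Path) -> io::Result<Vec<TrackedFile>> {
    let out = Command::new("git")
        .arg("-C")
        .arg(repo_path)
        .args(["ls-files", "-z"])
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git ls-files failed"));
    }
    let mut files = Vec::new();
    for relpath in parse_ls_files(&out.stdout) {
        let path = repo_path.join(&relpath);
        // Skip paths that are missing from the working tree, are not regular
        // files (e.g. submodules), or exceed the limit, before reading them.
        match std::fs::metadata(&path) {
            Ok(m) if m.is_file() && m.len() <= MAX_FILE_BYTES as u64 => {}
            _ => continue,
        }
        let Ok(bytes) = std::fs::read(&path) else { continue };
        if let Some(text) = eligible_text(bytes) {
            files.push(TrackedFile { relpath, text });
        }
    }
    Ok(files)
}

fn main() {
    match tracked_files(Path::new(".")) {
        Ok(files) => {
            for f in &files {
                println!("{} ({} bytes)", f.relpath, f.text.len());
            }
        }
        Err(e) => eprintln!("could not list tracked files: {e}"),
    }
}
```

Checking metadata before reading keeps oversized files from being loaded at all, and reading from the working tree (rather than the index) matches the description of `TrackedFile` as the file's current working-tree bytes.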