Module chunker

Content chunking: 100-line chunks with 10-line overlap. Ports logic from src/gobby/code_index/chunker.py.

This module remains gcode-owned because BM25 content indexing stores line-based ContentChunk records with project, path, line range, language, and timestamp fields. The generic gobby_core::indexing::Chunk and ChunkIdentity primitives model byte ranges with opaque metadata, so building on them here would only hide a domain-specific projection behind a generic type; it would not remove any shared foundation logic. gcode also derives its incremental state from PostgreSQL indexed_files.content_hash rows rather than from core IndexEvent snapshots.

Functions

chunk_file_content
Split file content into overlapping chunks for FTS indexing.
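
The 100-line window with a 10-line overlap described above can be illustrated with a minimal sketch. It is not the actual chunk_file_content implementation: the LineChunk struct, the chunk_lines function, and the CHUNK_SIZE and OVERLAP constants are hypothetical names, and the real ContentChunk record carries more fields (project, path, language, timestamp) than are shown here. The sketch also assumes 1-based, inclusive line ranges, which the source does not confirm.

```rust
/// Hypothetical, simplified stand-in for a line-based chunk record.
/// The real `ContentChunk` also stores project, path, language, and timestamp.
#[derive(Debug, Clone, PartialEq)]
struct LineChunk {
    /// First line of the chunk (assumed 1-based, inclusive).
    start_line: usize,
    /// Last line of the chunk (assumed 1-based, inclusive).
    end_line: usize,
    /// The chunk's lines joined with '\n'.
    content: String,
}

/// Window and overlap sizes taken from the module description.
const CHUNK_SIZE: usize = 100;
const OVERLAP: usize = 10;

/// Split `source` into windows of `chunk_size` lines, where each window
/// starts `chunk_size - overlap` lines after the previous one.
fn chunk_lines(source: &str, chunk_size: usize, overlap: usize) -> Vec<LineChunk> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk size");

    let lines: Vec<&str> = source.lines().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < lines.len() {
        // Clamp the final window to the end of the file.
        let end = (start + chunk_size).min(lines.len());
        chunks.push(LineChunk {
            start_line: start + 1,
            end_line: end,
            content: lines[start..end].join("\n"),
        });
        // Stop once a window reaches the last line, so no trailing
        // chunk is emitted that contains only overlap lines.
        if end == lines.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, calling chunk_lines(&source, CHUNK_SIZE, OVERLAP) on a 250-line file yields three chunks covering lines 1-100, 91-190, and 181-250, and an empty file yields no chunks. Keeping line ranges on each record, rather than byte ranges, is what lets a search hit map directly back to a location in the file.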