Parallel batch embedding pipeline with streaming backpressure.
Two pipeline modes:
- Batch mode (< `STREAMING_THRESHOLD` files): walk, chunk all, tokenize all, sort by length, embed. Simple and optimal for small corpora.
- Streaming mode (>= `STREAMING_THRESHOLD` files): three-stage pipeline with bounded channels. Chunks flow through: rayon chunk workers -> tokenize+batch collector -> GPU embed consumer. The GPU starts after the first `batch_size` encodings are ready (~50ms), not after all chunks are done. Backpressure prevents unbounded memory growth.
§Batch inference
Instead of one forward pass per chunk, chunks are grouped into batches
of configurable size (default 32). Each batch is tokenized, padded to
the longest sequence, and run as a single forward pass with shape
[batch_size, max_seq_len]. This amortizes per-call overhead and enables
SIMD across the batch dimension.
§Parallelism
On CPU, each rayon thread gets its own backend clone (cheap, since most
backends use Arc’d weights internally). On GPU, batches are issued sequentially
from Rust while the device parallelizes internally.
Structs§
- SearchConfig - Runtime configuration for the search pipeline.
- SearchResult - A search result pairing a code chunk with its similarity score.
Constants§
- DEFAULT_BATCH_SIZE - Default batch size for embedding inference.
Functions§
- apply_structural_boost - Normalize similarity scores to `[0, 1]` and apply a `PageRank` structural boost.
- embed_all - Walk, chunk, and embed all files in a directory.
- search - Search a directory for code chunks semantically similar to a query.