
Module embed


Parallel batch embedding pipeline with streaming backpressure.

Two pipeline modes:

  • Batch mode (< STREAMING_THRESHOLD files): walk, chunk all, tokenize all, sort by token length (so each batch pads to sequences of similar length), embed. Simple and optimal for small corpora.

  • Streaming mode (>= STREAMING_THRESHOLD files): three-stage pipeline with bounded channels. Chunks flow through: rayon chunk workers -> tokenize+batch collector -> GPU embed consumer. The GPU starts after the first batch_size encodings are ready (~50ms), not after all chunks are done. Backpressure prevents unbounded memory growth.
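The streaming mode above can be sketched with std's bounded channels. This is a minimal illustration, not the crate's implementation: it uses `std::sync::mpsc::sync_channel` and plain threads in place of the real channels and rayon workers, and `embed_batch` is a hypothetical stand-in for the model forward pass.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Stand-in "forward pass": one embedding per input chunk.
fn embed_batch(batch: &[String]) -> Vec<Vec<f32>> {
    batch.iter().map(|c| vec![c.len() as f32]).collect()
}

/// Three-stage pipeline: producer -> batch collector -> embed consumer.
/// Returns the number of chunks embedded.
fn run_pipeline(n_chunks: usize, batch_size: usize) -> usize {
    // Bounded channels: senders block when the buffer fills, which is
    // exactly the backpressure that caps memory growth.
    let (chunk_tx, chunk_rx) = sync_channel::<String>(2 * batch_size);
    let (batch_tx, batch_rx) = sync_channel::<Vec<String>>(2);

    // Stage 1: chunk producer (rayon chunk workers in the real pipeline).
    let producer = thread::spawn(move || {
        for i in 0..n_chunks {
            chunk_tx.send(format!("chunk-{i}")).unwrap();
        }
        // chunk_tx is dropped here, closing the channel.
    });

    // Stage 2: collector groups incoming chunks into batches.
    let collector = thread::spawn(move || {
        let mut batch = Vec::with_capacity(batch_size);
        for chunk in chunk_rx {
            batch.push(chunk);
            if batch.len() == batch_size {
                batch_tx.send(std::mem::take(&mut batch)).unwrap();
            }
        }
        if !batch.is_empty() {
            batch_tx.send(batch).unwrap(); // final partial batch
        }
    });

    // Stage 3: the embed consumer starts on the first full batch,
    // long before the producer has finished walking the corpus.
    let mut embedded = 0;
    for batch in batch_rx {
        embedded += embed_batch(&batch).len();
    }

    producer.join().unwrap();
    collector.join().unwrap();
    embedded
}

fn main() {
    assert_eq!(run_pipeline(10, 4), 10); // batches of 4, 4, then 2
}
```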

§Batch inference

Instead of one forward pass per chunk, chunks are grouped into batches of configurable size (default 32). Each batch is tokenized, padded to the longest sequence, and run as a single forward pass with shape [batch_size, max_seq_len]. This amortizes per-call overhead and enables SIMD across the batch dimension.
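The padding step can be sketched as follows. The types are assumptions (token ids as `u32`, caller-supplied pad id), not the crate's actual tokenizer API; the point is the shape of the `[batch_size, max_seq_len]` input and its attention mask.

```rust
/// Pad a batch of token-id sequences to the longest sequence in the
/// batch, returning a dense [batch_size, max_seq_len] id matrix plus
/// the matching attention mask (1 = real token, 0 = padding).
fn pad_batch(batch: &[Vec<u32>], pad_id: u32) -> (Vec<Vec<u32>>, Vec<Vec<u8>>) {
    let max_len = batch.iter().map(|s| s.len()).max().unwrap_or(0);
    let ids = batch
        .iter()
        .map(|s| {
            let mut row = s.clone();
            row.resize(max_len, pad_id); // pad to the longest sequence
            row
        })
        .collect();
    let mask = batch
        .iter()
        .map(|s| {
            let mut row = vec![1u8; s.len()]; // real tokens
            row.resize(max_len, 0); // padding positions
            row
        })
        .collect();
    (ids, mask)
}

fn main() {
    let (ids, mask) = pad_batch(&[vec![7, 8, 9], vec![1]], 0);
    assert_eq!(ids, vec![vec![7, 8, 9], vec![1, 0, 0]]);
    assert_eq!(mask, vec![vec![1, 1, 1], vec![1, 0, 0]]);
}
```

Sorting by length before batching (as batch mode does) groups sequences of similar length, so each batch wastes little work on padding positions.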

§Parallelism

On CPU, each rayon thread gets its own clone of the backend (cheap: most backends share their weights via an internal Arc). On GPU, batches are submitted sequentially from Rust while the device parallelizes within each batch.
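Why per-thread clones are cheap can be shown with a toy backend. `Backend` and its `embed` method are hypothetical stand-ins; the point is that `clone()` only bumps an `Arc` refcount rather than copying the weights.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical backend whose weights sit behind an Arc, so `clone()`
/// bumps a refcount instead of copying the model.
#[derive(Clone)]
struct Backend {
    weights: Arc<Vec<f32>>,
}

impl Backend {
    /// Stand-in for a real forward pass.
    fn embed(&self, chunk: &str) -> f32 {
        chunk.len() as f32 * self.weights[0]
    }
}

/// One worker thread per chunk, each with its own cheap backend clone,
/// mirroring the per-rayon-thread clone in the CPU path.
fn parallel_embed(backend: &Backend, chunks: Vec<String>) -> Vec<f32> {
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|c| {
            let b = backend.clone(); // Arc refcount bump, not a weight copy
            thread::spawn(move || b.embed(&c))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let backend = Backend { weights: Arc::new(vec![2.0; 4]) };
    let chunks = vec!["ab".to_string(), "abcd".to_string()];
    assert_eq!(parallel_embed(&backend, chunks), vec![4.0, 8.0]);
}
```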

Structs§

SearchConfig
Runtime configuration for the search pipeline.
SearchResult
A search result pairing a code chunk with its similarity score.

Constants§

DEFAULT_BATCH_SIZE
Default batch size for embedding inference.

Functions§

apply_structural_boost
Normalize similarity scores to [0,1] and apply a PageRank structural boost.
embed_all
Walk, chunk, and embed all files in a directory.
search
Search a directory for code chunks semantically similar to a query.