Parallel batch embedding pipeline with streaming backpressure.
Two pipeline modes:
- Batch mode (< `STREAMING_THRESHOLD` files): walk, chunk all, tokenize all, sort by length, embed. Simple and optimal for small corpora.
- Streaming mode (>= `STREAMING_THRESHOLD` files): three-stage pipeline with bounded channels. Chunks flow through: rayon chunk workers -> tokenize+batch collector -> GPU embed consumer. The GPU starts after the first `batch_size` encodings are ready (~50ms), not after all chunks are done. Backpressure prevents unbounded memory growth.
§Batch inference
Instead of one forward pass per chunk, chunks are grouped into batches
of configurable size (default 32). Each batch is tokenized, padded to
the longest sequence, and run as a single forward pass with shape
[batch_size, max_seq_len]. This amortizes per-call overhead and enables
SIMD across the batch dimension.
§Parallelism
On CPU, each rayon thread gets its own backend clone (cheap, since most
backends use Arc’d weights internally). On GPU, batches are issued sequentially
from Rust while the device parallelizes internally.
Structs§
- SearchConfig - Runtime configuration for the search pipeline.
- SearchResult - A search result pairing a code chunk with its similarity score.
Constants§
- DEFAULT_BATCH_SIZE - Default batch size for embedding inference.
Functions§
- apply_structural_boost - Normalize similarity scores to `[0, 1]` and apply a `PageRank` structural boost.
- embed_all - Walk, chunk, and embed all files in a directory.
- search - Search a directory for code chunks semantically similar to a query.