Function embed_all 

pub fn embed_all(
    root: &Path,
    backends: &[&dyn EmbedBackend],
    tokenizer: &Tokenizer,
    cfg: &SearchConfig,
    profiler: &Profiler,
) -> Result<(Vec<CodeChunk>, Vec<Vec<f32>>)>

Walk, chunk, and embed all files in a directory.

Returns the chunks and their corresponding embedding vectors. This is the building block for both one-shot search and interactive mode. The caller handles query embedding and ranking.
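
Because ranking is left to the caller, a minimal sketch of that step may help. It assumes cosine similarity over the returned `Vec<Vec<f32>>` and a query vector already embedded by the caller; `cosine` and `rank_top_k` are hypothetical helper names, not part of this crate.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical caller-side ranking: indices of the `k` embeddings most
/// similar to `query`, best match first. The indices line up with the
/// chunk vector returned alongside the embeddings.
fn rank_top_k(query: &[f32], embeddings: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = embeddings
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine(query, e)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Since the two returned vectors are parallel, an index from `rank_top_k` can be used directly to look up the matching `CodeChunk`.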

Accepts multiple backends for hybrid scheduling: chunks are distributed across all backends via work-stealing (see [embed_distributed]).
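
The idea of spreading chunks across several backends can be illustrated with a simplified sketch. It is not the crate's `embed_distributed`: it approximates work-stealing with a single shared counter from which each backend thread claims the next batch when it becomes free, so faster backends naturally take more work. The `Backend` type alias and `embed_shared_queue` are hypothetical stand-ins for `EmbedBackend` and the real scheduler.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for an embedding backend: maps a batch of chunk
/// texts to one vector per chunk.
type Backend = dyn Fn(&[String]) -> Vec<Vec<f32>> + Sync;

/// Embed `chunks` using every backend concurrently. Each backend thread
/// repeatedly claims the next `batch`-sized range from a shared counter.
fn embed_shared_queue(
    chunks: &[String],
    backends: &[&Backend],
    batch: usize,
) -> Vec<Vec<f32>> {
    let batch = batch.max(1);
    let next = AtomicUsize::new(0);
    let out: Mutex<Vec<Option<Vec<f32>>>> = Mutex::new(vec![None; chunks.len()]);
    let (next_ref, out_ref) = (&next, &out);

    thread::scope(|s| {
        for backend in backends {
            s.spawn(move || loop {
                // Claim the next unprocessed range; stop when none is left.
                let start = next_ref.fetch_add(batch, Ordering::Relaxed);
                if start >= chunks.len() {
                    break;
                }
                let end = (start + batch).min(chunks.len());
                let vectors = backend(&chunks[start..end]);
                // Write results back at their original positions.
                let mut guard = out_ref.lock().unwrap();
                for (i, v) in vectors.into_iter().enumerate() {
                    guard[start + i] = Some(v);
                }
            });
        }
    });

    out.into_inner()
        .unwrap()
        .into_iter()
        .map(|v| v.expect("every chunk is claimed by exactly one backend"))
        .collect()
}
```

Output order matches input order regardless of which backend handled a batch, which keeps the embeddings aligned with their chunks.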

Automatically selects between two pipeline modes:

  • Batch (< STREAMING_THRESHOLD files): chunk all, tokenize all, sort by length, embed. Optimal for small corpora.
  • Streaming (>= STREAMING_THRESHOLD files): three-stage pipeline with bounded channels. The GPU starts embedding as soon as the first batch is ready rather than after all chunks are done, which avoids GPU idle time during chunking and tokenization.
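
The mode selection and the streaming pipeline described above can be sketched with the standard library's bounded channels. Everything here is illustrative: the threshold value is a placeholder (the real `STREAMING_THRESHOLD` is not stated on this page), the `Mode` enum and both functions are hypothetical, and the "tokenizer" and "embedder" are toy stand-ins. The point is only that the third stage can begin consuming while the first two are still producing.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Placeholder value; the crate's actual threshold is not documented here.
const STREAMING_THRESHOLD: usize = 4;

#[derive(Debug, PartialEq)]
enum Mode {
    Batch,
    Streaming,
}

/// Pick a pipeline mode from the number of files, as the list above describes.
fn select_mode(file_count: usize) -> Mode {
    if file_count < STREAMING_THRESHOLD {
        Mode::Batch
    } else {
        Mode::Streaming
    }
}

/// Toy three-stage pipeline (chunk, tokenize, embed) joined by bounded
/// channels. `capacity` bounds how far a stage can run ahead of the next.
fn stream_embed(files: Vec<String>, capacity: usize) -> Vec<Vec<f32>> {
    let (chunk_tx, chunk_rx) = sync_channel::<String>(capacity);
    let (tok_tx, tok_rx) = sync_channel::<Vec<u32>>(capacity);

    // Stage 1: chunking. Here, one chunk per line of each file.
    let chunker = thread::spawn(move || {
        for file in files {
            for line in file.lines() {
                if chunk_tx.send(line.to_string()).is_err() {
                    return; // downstream hung up
                }
            }
        }
    });

    // Stage 2: tokenization. Word lengths serve as fake token ids.
    let tokenizer = thread::spawn(move || {
        for chunk in chunk_rx {
            let ids: Vec<u32> = chunk.split_whitespace().map(|w| w.len() as u32).collect();
            if tok_tx.send(ids).is_err() {
                return;
            }
        }
    });

    // Stage 3: embedding, on the calling thread. It starts as soon as the
    // first token batch arrives rather than after all chunks are tokenized.
    let embeddings: Vec<Vec<f32>> = tok_rx
        .iter()
        .map(|ids| vec![ids.len() as f32, ids.iter().sum::<u32>() as f32])
        .collect();

    chunker.join().unwrap();
    tokenizer.join().unwrap();
    embeddings
}
```

A small `capacity` keeps memory bounded: when the embedding stage falls behind, `send` blocks and the upstream stages pause instead of buffering the whole corpus.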

§Errors

Returns an error if file walking, chunking, or embedding fails.