pub fn embed_all(
    root: &Path,
    backends: &[&dyn EmbedBackend],
    tokenizer: &Tokenizer,
    cfg: &SearchConfig,
    profiler: &Profiler,
) -> Result<(Vec<CodeChunk>, Vec<Vec<f32>>)>
Walk, chunk, and embed all files in a directory.
Returns the chunks and their corresponding embedding vectors. This is the building block for both one-shot search and interactive mode. The caller handles query embedding and ranking.
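Since ranking is left to the caller, the following is a minimal sketch of what that step could look like once a query vector is available. Cosine similarity is an assumption here (the source does not say which metric is used), and `cosine` and `rank_by_cosine` are hypothetical helpers, not part of this crate.

```rust
/// Hypothetical helper: cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: score every embedding against the query and return
/// the `top_k` best as (index, score) pairs, highest score first.
/// The index refers to the matching position in the returned chunk vector.
fn rank_by_cosine(query: &[f32], embeddings: &[Vec<f32>], top_k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = embeddings
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine(query, e)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

Because chunks and embeddings are returned in corresponding order, each index from `rank_by_cosine` can be used directly to look up the matching chunk.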
Accepts multiple backends for hybrid scheduling: chunks are distributed across all backends via work-stealing (see embed_distributed).
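The internals of embed_distributed are not shown here. As a rough illustration of the load-balancing idea, the sketch below uses a simpler stand-in: one thread per backend, each claiming the next unprocessed chunk from a shared atomic cursor, so a faster backend naturally ends up doing more of the work. This is dynamic scheduling over a shared queue, not a true work-stealing deque, and the `Backend` trait, `LenBackend`, and `embed_shared_queue` are all hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for `EmbedBackend`; the real trait is not shown here.
trait Backend: Sync {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy backend for illustration: "embeds" a chunk as its byte length.
struct LenBackend;

impl Backend for LenBackend {
    fn embed(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}

/// One worker thread per backend; each claims the next chunk index from a
/// shared cursor until none remain. Output order matches input order.
fn embed_shared_queue(backends: &[&dyn Backend], chunks: &[String]) -> Vec<Vec<f32>> {
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<Vec<f32>>>> = Mutex::new(vec![None; chunks.len()]);

    thread::scope(|s| {
        for backend in backends {
            let (next, results) = (&next, &results);
            s.spawn(move || loop {
                // fetch_add hands out each index exactly once across all workers.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= chunks.len() {
                    break;
                }
                let vector = backend.embed(&chunks[i]);
                results.lock().unwrap()[i] = Some(vector);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|v| v.expect("every index is claimed exactly once"))
        .collect()
}
```

A real scheduler would likely hand out batches rather than single chunks to amortize per-call overhead; single chunks keep the sketch short.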
Automatically selects between two pipeline modes:
- Batch (< STREAMING_THRESHOLD files): chunk all, tokenize all, sort by length, embed. Optimal for small corpora.
- Streaming (>= STREAMING_THRESHOLD files): three-stage pipeline with bounded channels. GPU starts after the first batch is ready, not after all chunks are done. Eliminates GPU idle time during chunking/tokenization.
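The streaming mode can be pictured with a toy three-stage pipeline built on `std::sync::mpsc::sync_channel`. This is only a sketch of the general pattern, not the crate's implementation: `stream_pipeline`, the line-based chunking, the word-length "tokenizer", and the two-element "embedding" are all made up for illustration. The point is that bounded channels apply backpressure and let the final stage start consuming as soon as the first item arrives.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Toy three-stage pipeline: chunk -> tokenize -> embed.
/// `capacity` bounds each channel, so a fast producer blocks instead of
/// buffering the whole corpus in memory.
fn stream_pipeline(files: Vec<String>, capacity: usize) -> Vec<Vec<f32>> {
    let (chunk_tx, chunk_rx) = sync_channel::<String>(capacity);
    let (tok_tx, tok_rx) = sync_channel::<Vec<u32>>(capacity);

    // Stage 1 (illustrative chunker): every line of a file is one chunk.
    let chunker = thread::spawn(move || {
        for file in files {
            for line in file.lines() {
                if chunk_tx.send(line.to_string()).is_err() {
                    return; // downstream hung up
                }
            }
        }
        // chunk_tx is dropped here, which closes the channel for stage 2.
    });

    // Stage 2 (illustrative tokenizer): one id per word, equal to its byte length.
    let tokenizer = thread::spawn(move || {
        for chunk in chunk_rx {
            let ids: Vec<u32> = chunk.split_whitespace().map(|w| w.len() as u32).collect();
            if tok_tx.send(ids).is_err() {
                return;
            }
        }
    });

    // Stage 3 (illustrative embedder) runs on the calling thread and starts
    // as soon as the first tokenized chunk is available.
    let embeddings: Vec<Vec<f32>> = tok_rx
        .iter()
        .map(|ids| vec![ids.len() as f32, ids.iter().sum::<u32>() as f32])
        .collect();

    chunker.join().unwrap();
    tokenizer.join().unwrap();
    embeddings
}
```

With a single producer per stage, output order matches input order. A smaller `capacity` lowers peak memory at the cost of more blocking between stages.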
§Errors
Returns an error if file walking, chunking, or embedding fails.