Embedding engine for semantic code search.
Provides dense vector embeddings for code chunks using a local ONNX model
(all-MiniLM-L6-v2). Gated behind the embeddings feature. When the feature is
disabled or the model is unavailable, search falls back gracefully to BM25-only.
Architecture:
WordPieceTokenizer → ONNX Model (rten) → Mean Pooling → L2 Normalize → Vec<f32>
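The last two stages of the pipeline can be sketched as follows. This is a minimal illustration, not the crate's code: the function names, the row-major `[seq_len x dim]` layout, and the use of an attention mask to skip padding tokens are all assumptions.

```rust
/// Mean-pool token embeddings, counting only rows whose mask entry is non-zero.
/// `hidden` is assumed row-major with `dim` values per token; `dim` must be > 0.
fn mean_pool(hidden: &[f32], mask: &[u8], dim: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (row, &m) in hidden.chunks_exact(dim).zip(mask) {
        if m != 0 {
            for (s, &x) in sum.iter_mut().zip(row) {
                *s += x;
            }
            count += 1.0;
        }
    }
    // Avoid dividing by zero when every token is masked out.
    if count > 0.0 {
        for s in &mut sum {
            *s /= count;
        }
    }
    sum
}

/// Scale a vector to unit L2 norm in place; a zero vector is left unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

Normalizing once at embedding time is what allows a later similarity check to be a plain dot product.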
Modules§
- download
- Automatic model download from HuggingFace Hub.
- pooling
- Pooling strategies for transformer hidden states.
- tokenizer
- Minimal WordPiece tokenizer for BERT-style embedding models.
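For readers unfamiliar with WordPiece, the core step is a greedy longest-match-first split of each word against a vocabulary. The sketch below shows that step only. The function name, the `HashSet` vocabulary, and the whole-word `[UNK]` fallback follow the usual BERT convention and are assumptions, not the tokenizer module's actual API.

```rust
use std::collections::HashSet;

/// Split one word into WordPiece tokens by greedy longest-match-first.
/// Continuation pieces carry the "##" prefix; a word that cannot be fully
/// covered by vocabulary entries becomes a single "[UNK]".
fn wordpiece(word: &str, vocab: &HashSet<&str>) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Try the longest remaining substring first, shrinking from the right.
        while end > start {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            None => return vec!["[UNK]".to_string()],
        }
    }
    pieces
}
```

With a vocabulary containing `token` and `##izer`, the word `tokenizer` splits into those two pieces. A full tokenizer would also lowercase, split on whitespace and punctuation, and add the `[CLS]` and `[SEP]` markers.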
Structs§
Functions§
- cosine_similarity
- Compute cosine similarity between two L2-normalized vectors. Both vectors must be pre-normalized for correct results.
- cosine_similarity_raw
- Compute cosine similarity without requiring pre-normalization.
- shared_engine
- Global singleton embedding engine. Loaded once, shared across all consumers. Returns None if the embeddings feature is disabled or the model fails to load. NOTE: This function BLOCKS on first call while loading the ONNX model (~25MB). For non-blocking access, use try_shared_engine() instead.
- try_shared_engine
- Non-blocking variant: returns the engine ONLY if already loaded. Never triggers model loading or download. Safe to call on hot paths.
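The contracts described above can be illustrated with a small self-contained sketch. The signatures, the `Engine` stand-in, the `load_engine` helper, and the zero-vector convention are all hypothetical. Only the behaviour (dot product for normalized inputs, blocking versus non-blocking access to a lazily initialized singleton) is taken from the descriptions. It uses `std::sync::OnceLock`, which may differ from the mechanism the crate uses.

```rust
use std::sync::OnceLock;

/// For unit-length inputs, cosine similarity reduces to a dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// General form: divide the dot product by both norms.
/// Returns 0.0 if either vector has zero norm (an assumed convention).
fn cosine_similarity_raw(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical stand-in for the real engine type.
struct Engine {
    dim: usize,
}

static ENGINE: OnceLock<Option<Engine>> = OnceLock::new();

/// Hypothetical loader; a real one would read the ONNX model and may fail.
fn load_engine() -> Option<Engine> {
    Some(Engine { dim: 384 })
}

/// Blocks on the first call while the loader runs, then returns the cached result.
fn shared_engine() -> Option<&'static Engine> {
    ENGINE.get_or_init(load_engine).as_ref()
}

/// Never runs the loader: yields the engine only if it is already initialized.
fn try_shared_engine() -> Option<&'static Engine> {
    ENGINE.get().and_then(|e| e.as_ref())
}
```

On a hot path, a caller would check `try_shared_engine()` and fall back to BM25-only ranking when it returns `None`, leaving the blocking `shared_engine()` call to startup or a background task.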