lunaris-embed — real Embedder impls for Phase 2 hot path (INGEST-02).
- Default backend (feature = "candle"): CandleEmbeddingGemma loads the EmbeddingGemma 300M tokenizer and token-embedding matrix from a local cache, mean-pools the embedded token vectors per input, and L2-normalises the result to a 768-d unit vector. Fails with an actionable LunarisError::Storage error when the weights cache is missing.
- Alt backend (feature = "ollama"): OllamaEmbedder POSTs each batch to <endpoint>/api/embed (Ollama’s embed HTTP API), validates the response shape against Embedder::dim, and returns 768-d rows. 10s HTTP timeout (CLAUDE.md: “design for failure — timeouts”).
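The pooling step described for the default backend can be sketched with the standard library alone. This is an illustrative sketch, not the crate's implementation: the function name mean_pool_l2 and its Option-based error handling are assumptions, and a real backend would operate on 768-d rows rather than the short vectors used to exercise it.

```rust
/// Hypothetical helper (not part of lunaris-embed): mean-pools per-token
/// vectors into a single vector, then scales it to unit L2 norm.
/// Returns None for empty input, ragged rows, or a zero-norm mean.
pub fn mean_pool_l2(token_vectors: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = token_vectors.first()?.len();
    if dim == 0 || token_vectors.iter().any(|v| v.len() != dim) {
        return None;
    }

    // Sum each dimension across all token vectors.
    let mut pooled = vec![0.0f32; dim];
    for v in token_vectors {
        for (acc, x) in pooled.iter_mut().zip(v) {
            *acc += *x;
        }
    }

    // Divide by the token count to get the mean vector.
    let n = token_vectors.len() as f32;
    for acc in pooled.iter_mut() {
        *acc /= n;
    }

    // Scale to unit length; a zero vector cannot be normalised.
    let norm = pooled.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 || !norm.is_finite() {
        return None;
    }
    for acc in pooled.iter_mut() {
        *acc /= norm;
    }
    Some(pooled)
}
```

Rejecting a zero-norm mean explicitly avoids returning a vector of NaNs, which would otherwise propagate silently into any downstream similarity computation.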
Phase 1’s lunaris_core::StubEmbedder remains the deterministic test impl —
ingest tests inject it via the Lunaris::with_embedder escape hatch so they
don’t pay model-load latency. Production callers get CandleEmbeddingGemma
by default through Lunaris::open(url) (Plan 02-01 Task 3).
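The injection pattern described above can be sketched as follows. Every name here is hypothetical: SketchEmbedder stands in for the crate's Embedder trait (whose real signature may differ), ToyStub for StubEmbedder, and ToyEngine for Lunaris. The point is only that a deterministic embedder behind Arc<dyn ...> lets tests skip model loading.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's Embedder trait; the real signature may differ.
pub trait SketchEmbedder: Send + Sync {
    fn dim(&self) -> usize;
    fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>>;
}

/// Deterministic test embedder: derives each component from the input bytes,
/// so no model is loaded and repeated calls give identical vectors.
pub struct ToyStub {
    pub dim: usize,
}

impl SketchEmbedder for ToyStub {
    fn dim(&self) -> usize {
        self.dim
    }

    fn embed(&self, inputs: &[&str]) -> Vec<Vec<f32>> {
        inputs
            .iter()
            .map(|text| {
                // FNV-1a hash of the input seeds a reproducible vector.
                let mut h: u64 = 0xcbf29ce484222325;
                for b in text.bytes() {
                    h ^= b as u64;
                    h = h.wrapping_mul(0x100000001b3);
                }
                (0..self.dim)
                    .map(|i| {
                        let mixed = h.wrapping_add(i as u64).wrapping_mul(0x9E3779B97F4A7C15);
                        // Map the top 24 bits to a value in [0, 1).
                        (mixed >> 40) as f32 / (1u64 << 24) as f32
                    })
                    .collect()
            })
            .collect()
    }
}

/// Hypothetical owner of an embedder, mirroring the with_embedder injection pattern.
pub struct ToyEngine {
    embedder: Arc<dyn SketchEmbedder>,
}

impl ToyEngine {
    pub fn with_embedder(embedder: Arc<dyn SketchEmbedder>) -> Self {
        Self { embedder }
    }

    /// Embeds one input; returns an empty vector if the backend yields no rows.
    pub fn embed_one(&self, text: &str) -> Vec<f32> {
        self.embedder
            .embed(&[text])
            .into_iter()
            .next()
            .unwrap_or_default()
    }
}
```

A test would construct the engine with ToyEngine::with_embedder(Arc::new(ToyStub { dim: 8 })) and assert on the resulting vectors without touching any weights cache.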
§Latency budget swap escape hatch
Per 02-01-PLAN.md critical constraints: if candle local inference busts
the per-batch budget on the dev box (8ms p50 / 20ms p99 per blueprint §4.1),
callers swap to OllamaEmbedder via Lunaris::with_embedder(Arc::new(...)).
The trait shape does NOT change either way — that’s the whole point of the
Phase 1 Embedder interface lock.
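One way to decide whether the budget has been exceeded is to compute nearest-rank percentiles over measured per-batch latencies. The sketch below assumes the caller has already collected the samples; the names percentile, busts_budget, P50_BUDGET, and P99_BUDGET are hypothetical, and only the 8 ms and 20 ms figures come from the text above.

```rust
use std::time::Duration;

/// Budget figures from the text above: 8 ms p50 and 20 ms p99 per batch.
pub const P50_BUDGET: Duration = Duration::from_millis(8);
pub const P99_BUDGET: Duration = Duration::from_millis(20);

/// Nearest-rank percentile over an ascending-sorted slice; `pct` is in 1..=100.
pub fn percentile(sorted: &[Duration], pct: usize) -> Option<Duration> {
    if sorted.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    // Nearest rank is ceil(pct / 100 * n); subtract one for a zero-based index.
    let rank = (pct * sorted.len() + 99) / 100;
    sorted.get(rank - 1).copied()
}

/// Returns Some(true) when either percentile exceeds its budget,
/// Some(false) when both are within budget, and None when there are no samples.
pub fn busts_budget(samples: &[Duration]) -> Option<bool> {
    let mut sorted = samples.to_vec();
    sorted.sort();
    let p50 = percentile(&sorted, 50)?;
    let p99 = percentile(&sorted, 99)?;
    Some(p50 > P50_BUDGET || p99 > P99_BUDGET)
}
```

A caller seeing Some(true) on the dev box could then construct the alternative backend and pass it through Lunaris::with_embedder, leaving all code that depends on the trait untouched.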
Re-exports§
pub use fastembed::FASTEMBED_EXECUTION_ENV;
pub use fastembed::FastembedEmbedder;
pub use fastembed::FastembedEmbedderOpts;
pub use fastembed::FastembedUserDefinedOpts;
pub use fastembed::PoolingMode;
pub use fastembed_exec::ExecutionPreference;
pub use fastembed_exec::execution_from_env;
pub use fastembed_exec::parse_execution;
Modules§
- fallback: RFC 0007 §3, FallbackEmbedder<P, F> static-dispatch combinator.
- fastembed: FastembedEmbedder, ONNX-backed EmbeddingGemma 300M via fastembed-rs.
- fastembed_exec: ORT execution-provider plumbing for the fastembed backends (Phase 20 Plan 20-01).
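A static-dispatch fallback combinator of the kind named in the fallback module might look roughly like the sketch below. RFC 0007 is not reproduced here, so the semantics are assumptions: the primary is tried first, the fallback runs only if the primary returns an error, and both backends must report the same dimension. FallibleEmbedder, SketchFallback, AlwaysFails, and Constant are hypothetical names, not the crate's types.

```rust
/// Hypothetical stand-in for the crate's Embedder trait; the real signature may differ.
pub trait FallibleEmbedder {
    fn dim(&self) -> usize;
    fn embed(&self, inputs: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Tries `primary` first and uses `fallback` only if the primary returns an error.
/// P and F are concrete types, so calls are monomorphised rather than going through `dyn`.
pub struct SketchFallback<P, F> {
    primary: P,
    fallback: F,
}

impl<P: FallibleEmbedder, F: FallibleEmbedder> SketchFallback<P, F> {
    /// Assumption: both backends must agree on the output dimension.
    pub fn new(primary: P, fallback: F) -> Result<Self, String> {
        if primary.dim() != fallback.dim() {
            return Err(format!(
                "dim mismatch: {} vs {}",
                primary.dim(),
                fallback.dim()
            ));
        }
        Ok(Self { primary, fallback })
    }
}

impl<P: FallibleEmbedder, F: FallibleEmbedder> FallibleEmbedder for SketchFallback<P, F> {
    fn dim(&self) -> usize {
        self.primary.dim()
    }

    fn embed(&self, inputs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        // The primary's error is discarded here; a real combinator might log it.
        self.primary
            .embed(inputs)
            .or_else(|_| self.fallback.embed(inputs))
    }
}

/// Toy backends used only to exercise the combinator.
pub struct AlwaysFails(pub usize);
pub struct Constant(pub usize, pub f32);

impl FallibleEmbedder for AlwaysFails {
    fn dim(&self) -> usize {
        self.0
    }
    fn embed(&self, _inputs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        Err("primary unavailable".to_string())
    }
}

impl FallibleEmbedder for Constant {
    fn dim(&self) -> usize {
        self.0
    }
    fn embed(&self, inputs: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        Ok(inputs.iter().map(|_| vec![self.1; self.0]).collect())
    }
}
```

Because the combinator itself implements the trait, it can be nested or passed anywhere a single embedder is expected, and the compiler resolves every call at compile time.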