Crate lunaris_embed

lunaris-embed — real Embedder impls for Phase 2 hot path (INGEST-02).

Default backend (feature = "candle"): CandleEmbeddingGemma. Loads the EmbeddingGemma 300M tokenizer and token-embedding matrix from a local cache, mean-pools the embedded token vectors per input, and L2-normalises the result to a 768-d unit vector. Fails with an actionable LunarisError::Storage error when the weights cache is missing.
Alt backend (feature = "ollama"): OllamaEmbedder. POSTs each batch to <endpoint>/api/embed (Ollama’s embed HTTP API), validates the response shape against Embedder::dim, and returns 768-d rows. Requests use a 10s HTTP timeout (CLAUDE.md: “design for failure — timeouts”).
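
The pooling step described for the default backend can be illustrated with a minimal, std-only sketch. This is not the crate's code: the function name mean_pool_l2 and the Vec<Vec<f32>> input layout are assumptions made for illustration, and the real backend works on tensors through candle.

```rust
/// Hypothetical helper (not part of lunaris_embed): mean-pool per-token
/// vectors into one vector, then scale the result to unit L2 norm.
/// Returns None when the input is empty, zero-width, or ragged.
fn mean_pool_l2(token_vectors: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = token_vectors.first()?.len();
    if dim == 0 || token_vectors.iter().any(|v| v.len() != dim) {
        return None;
    }

    // Sum every token vector component-wise.
    let mut pooled = vec![0.0f32; dim];
    for v in token_vectors {
        for (acc, x) in pooled.iter_mut().zip(v) {
            *acc += *x;
        }
    }

    // Divide by the token count to get the mean.
    let n = token_vectors.len() as f32;
    for acc in pooled.iter_mut() {
        *acc /= n;
    }

    // Scale to unit length; an all-zero mean is left as zeros.
    let norm = pooled.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for acc in pooled.iter_mut() {
            *acc /= norm;
        }
    }
    Some(pooled)
}
```

With two 2-d token vectors [3, 0] and [0, 4], the mean is [1.5, 2.0] and the normalised output is [0.6, 0.8]. In the real backend the same steps would run over 768-d rows.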

Phase 1’s lunaris_core::StubEmbedder remains the deterministic test impl — ingest tests inject it via the Lunaris::with_embedder escape hatch so they don’t pay model-load latency. Production callers get CandleEmbeddingGemma by default through Lunaris::open(url) (Plan 02-01 Task 3).

Latency budget swap escape hatch

Per the critical constraints in 02-01-PLAN.md: if candle local inference exceeds the per-batch budget on the dev box (8 ms p50 / 20 ms p99 per blueprint §4.1), callers swap to OllamaEmbedder via Lunaris::with_embedder(Arc::new(...)). The trait shape does NOT change either way; that is the whole point of the Phase 1 Embedder interface lock.
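
The swap can be sketched with stand-in types. Everything below is hypothetical: the trait is a simplified stand-in (the real Embedder method set is not shown on this page), LocalSketch and RemoteSketch only play the roles of the in-process and HTTP backends, and pick_embedder is an illustrative helper, not a crate API. Only the 8 ms / 20 ms budget figures and the 768-d width come from the text above.

```rust
use std::sync::Arc;
use std::time::Duration;

/// Simplified stand-in for the crate's Embedder trait (assumed shape).
trait Embedder: Send + Sync {
    fn name(&self) -> &'static str;
    fn dim(&self) -> usize;
    fn embed(&self, batch: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Plays the role of the local (in-process) backend.
struct LocalSketch;
/// Plays the role of the HTTP backend.
struct RemoteSketch;

impl Embedder for LocalSketch {
    fn name(&self) -> &'static str { "local" }
    fn dim(&self) -> usize { 768 }
    fn embed(&self, batch: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        // Placeholder rows; a real backend would run the model here.
        Ok(batch.iter().map(|_| vec![0.0; self.dim()]).collect())
    }
}

impl Embedder for RemoteSketch {
    fn name(&self) -> &'static str { "remote" }
    fn dim(&self) -> usize { 768 }
    fn embed(&self, batch: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        // Placeholder rows; a real backend would issue an HTTP request here.
        Ok(batch.iter().map(|_| vec![0.0; self.dim()]).collect())
    }
}

// Per-batch budget quoted in the text: 8 ms p50 / 20 ms p99.
const P50_BUDGET: Duration = Duration::from_millis(8);
const P99_BUDGET: Duration = Duration::from_millis(20);

/// True when both measured percentiles fit inside the budget.
fn within_budget(p50: Duration, p99: Duration) -> bool {
    p50 <= P50_BUDGET && p99 <= P99_BUDGET
}

/// Hypothetical selection helper: keep the local backend while it fits
/// the budget, otherwise hand back the remote one behind the same trait.
fn pick_embedder(p50: Duration, p99: Duration) -> Arc<dyn Embedder> {
    if within_budget(p50, p99) {
        let local: Arc<dyn Embedder> = Arc::new(LocalSketch);
        local
    } else {
        let remote: Arc<dyn Embedder> = Arc::new(RemoteSketch);
        remote
    }
}
```

Because both stand-ins sit behind one Arc<dyn Embedder>, the caller's code is identical whichever backend is chosen, which mirrors the interface-lock argument made above.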

Re-exports

pub use fastembed::FASTEMBED_EXECUTION_ENV;
pub use fastembed::FastembedEmbedder;
pub use fastembed::FastembedEmbedderOpts;
pub use fastembed::FastembedUserDefinedOpts;
pub use fastembed::PoolingMode;
pub use fastembed_exec::ExecutionPreference;
pub use fastembed_exec::execution_from_env;
pub use fastembed_exec::parse_execution;

Modules

fallback: RFC 0007 §3 — FallbackEmbedder<P, F> static-dispatch combinator.
fastembed: FastembedEmbedder — ONNX-backed EmbeddingGemma 300M via fastembed-rs.
fastembed_exec: ORT execution-provider plumbing for the fastembed backends (Phase 20 Plan 20-01).

Traits

Embedder