Module embedder

Local embedding generation for the GraphRAG memory (LLM-only, one-shot per invocation).

v1.0.76: the default build is LLM-only — the binary does NOT bundle fastembed / ort / ndarray / tokenizers. All embeddings are produced by a headless invocation of claude code or codex (OAuth, no MCP, no hooks) and stored as a BLOB in memory_embeddings(memory_id, embedding, source). Vector similarity is computed in pure Rust at query time.
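The storage and query path described above can be illustrated with a minimal sketch. It is not the crate's implementation: the function names ending in `_sketch`, the little-endian byte order of the BLOB, and the choice of cosine similarity as the metric are all assumptions made for illustration.

```rust
/// Encodes a vector as a byte blob.
/// Assumption: 4 bytes per value, little-endian.
fn encode_blob_sketch(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decodes a blob back into f32 values.
/// Returns None when the length is not a multiple of 4.
fn decode_blob_sketch(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity between two vectors (assumed metric).
/// Returns None for empty, mismatched, or zero-norm inputs.
fn cosine_similarity_sketch(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

At query time, a caller following this sketch would decode each stored blob, compare it against the query vector, and rank rows by the resulting score.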

§Workload classification (G42/S3, BLOCK 1 — MANDATORY)

LLM embedding is I/O-bound + subprocess-bound: each call waits 5-60s on a network round-trip through a headless claude -p / codex exec subprocess while the local CPU stays idle. Concurrency therefore uses tokio (async I/O concurrency) and NEVER rayon (reserved for CPU-bound work).

§Permit formula (G42/S3, BLOCK 2)

permits = clamp(--llm-parallelism, 1, 32)
          .min(available_parallelism())
          .min(available_ram_mb * 0.5 / LLM_WORKER_RSS_MB)

LLM_WORKER_RSS_MB = 350 (crate::constants): claude -p and codex exec are Node.js processes with a typical maximum RSS of 200-400 MB (measured via /usr/bin/time -l on macOS and /usr/bin/time -v on Linux), so the RAM bound matters in practice and is not merely a formality.
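The permit formula can be written out as a small pure function. This is a sketch under stated assumptions, not the body of effective_permits: the name `permits_sketch`, the integer rounding, and the final floor of 1 (so that a very small RAM reading never yields zero permits) are illustrative choices that the formula above does not specify.

```rust
/// Per-worker RSS budget in MB, mirroring the value quoted above.
const LLM_WORKER_RSS_MB_SKETCH: u64 = 350;

/// Computes the permit count from the requested parallelism, the CPU count,
/// and the available RAM in MB.
fn permits_sketch(requested: usize, cpus: usize, available_ram_mb: u64) -> usize {
    // Half of the available RAM divided by the per-worker budget.
    // Assumption: integer division, rounding down.
    let ram_bound = (available_ram_mb / 2 / LLM_WORKER_RSS_MB_SKETCH) as usize;
    requested
        .clamp(1, 32) // clamp(--llm-parallelism, 1, 32)
        .min(cpus) // .min(available_parallelism())
        .min(ram_bound) // .min(available_ram_mb * 0.5 / LLM_WORKER_RSS_MB)
        .max(1) // assumption: always allow at least one worker
}
```

For example, a request of 8 on a 16-core host with 2100 MB of available RAM gives a RAM bound of 2100 / 2 / 350 = 3, so the RAM term decides the result. In real code the CPU count could come from std::thread::available_parallelism(), which returns an io::Result<NonZeroUsize>.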

§Locking contract (G42/A3 fix)

The process-wide Mutex<LlmEmbedding> protects ONLY the cheap clone of the client configuration (flavour + binary path + model + shared schema tempfiles). It is NEVER held across network I/O — the v1.0.76-v1.0.78 flush_group held it for the whole sequential embedding loop, which is why --llm-parallelism 8 measured an effective parallelism of 1.
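The locking contract amounts to "clone under the lock, release, then do the slow work". The sketch below shows that pattern with std::sync::Mutex only. `ClientConfigSketch` and `with_config_snapshot` are hypothetical names standing in for LlmEmbedding and its call sites, and the closure stands in for the subprocess round-trip.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the client configuration guarded by the mutex.
#[derive(Clone, Debug, PartialEq)]
struct ClientConfigSketch {
    flavour: String,
    binary_path: String,
    model: String,
}

/// Clones the configuration while holding the lock, releases the lock,
/// then runs the slow call against the private snapshot.
fn with_config_snapshot<T>(
    shared: &Mutex<ClientConfigSketch>,
    slow_call: impl FnOnce(&ClientConfigSketch) -> T,
) -> T {
    let snapshot = {
        let guard = shared.lock().expect("config mutex poisoned");
        guard.clone()
        // The guard is dropped at the end of this block, before any I/O.
    };
    // The mutex is free here, so other workers can take their own snapshot
    // while this call waits on the network.
    slow_call(&snapshot)
}
```

Holding the guard across `slow_call` instead would serialise every worker behind one lock, which matches the effective parallelism of 1 described above.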

Structs§

EmbedCacheStats: G56: stats snapshot returned by embed_entity_texts_cached.

Enums§

EmbeddingErrorKind: GAP-004 (v1.0.88): typed classifier for embedding error messages.
FallbackReason: G58/S1: reason an embedding call could not be completed and the caller must fall back to a non-vector retrieval path (FTS5 prefix + LIKE).
LlmBackendKind: LLM backend kind for the fallback chain. Mirrors the CLI --llm-backend enum so users can pass the same value to --llm-fallback without translation.

Constants§

CHUNK_EMBED_BATCH_SIZE: Calibration base: chunk (long-text) batch size per LLM call at the calibration dimensionality (G42/S2). Use chunk_embed_batch_size for the dim-adaptive value (G44).
EMBED_BATCH_CALIBRATION_DIM: Dimensionality the batch bases above were calibrated against (G44).
ENTITY_EMBED_BATCH_SIZE: Calibration base: entity-name (short-text) batch size per LLM call at the calibration dimensionality (G42/S2). Use entity_embed_batch_size for the dim-adaptive value (G44).
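The relationship between a calibration base and its dim-adaptive value can be illustrated as follows. The scaling rule used by chunk_embed_batch_size and entity_embed_batch_size is not stated on this page, so the inverse-proportional rule below is purely an assumption, as are the function name and the floor of 1.

```rust
/// Scales a calibrated batch size to the active dimensionality.
/// Assumption: the batch shrinks in inverse proportion to the dimensionality,
/// so the total number of floats per call stays roughly constant.
fn dim_adaptive_batch_sketch(base: usize, calibration_dim: usize, active_dim: usize) -> usize {
    // Guard against a zero dimensionality to avoid dividing by zero.
    let active = active_dim.max(1);
    // Never return an empty batch.
    (base * calibration_dim / active).max(1)
}
```

Under this assumed rule, a base of 8 calibrated at 384 dimensions would drop to 4 at 768 dimensions and stay at 8 when the active dimensionality equals the calibration one.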

Functions§

bytes_to_f32
chunk_embed_batch_size: Dim-adaptive batch size for chunk (long-text) embedding calls (G44).
classify_embedding_error: Classify an embedding AppError into a typed FallbackReason.
effective_permits: G42/S3 BLOCK 2: effective permit count.
embed_entity_texts_cached: G56: embeds entity-name texts through a process-wide cache.
embed_passage: Embeds a single passage for storage. Delegates to the configured LLM headless (claude code / codex). Returns a vector of the active dimensionality.
embed_passage_local
embed_passage_local_resolved: BUG-003 / v1.0.85: split of embed_passage_local that reports the resolved LlmBackendKind based on the ACTUAL LlmEmbedding::flavour of the embedder constructed. When LlmEmbedding::detect_available substitutes claude for a missing codex, the operator sees the truth in envelope.backend_invoked.
embed_passage_or_skip: v1.0.89 (BUG-SKIP-EMBED + GAP-EMBED-PROPAGATION): embed a passage honouring both --llm-backend and --skip-embedding-on-failure.
embed_passage_with_choice: Embed a single passage using the LLM backend selected by the user via --llm-backend. Routes to embed_with_fallback so failures fall through to the next backend in the chain before giving up.
embed_passage_with_embedding_choice: v1.0.93: embedding with EmbeddingBackendChoice awareness. When the embedding backend is Openrouter or Auto with a live client, the chain prepends OpenRouter before the LLM subprocess backends.
embed_passages_controlled: Embeds a batch of passages with token-count-aware batching.
embed_passages_controlled_local
embed_passages_parallel_local: G42/S3: embeds texts through the bounded parallel fan-out and returns vectors in input order.
embed_passages_parallel_with_embedding_choice: v1.0.93 (GAP-OR-INGEST): embeds multiple passages with EmbeddingBackendChoice awareness. When the resolved chain starts with OpenRouter and the client is initialised, uses the HTTP batch API (embed_batch) instead of subprocess fan-out — no LLM slot consumed, ~200ms per batch vs ~15s per subprocess cold-start. Falls back to embed_passages_parallel_local for LLM backends.
embed_query: Embeds a single query for similarity search. Same model and dim as embed_passage; the only difference is the LLM-side prompt prefix that the headless invocation uses to disambiguate.
embed_query_local
embed_texts_parallel: G42/S3 core: bounded parallel batch embedding.
embed_texts_parallel_with: Like embed_texts_parallel but invokes on_result as soon as each embedding arrives (BLOCK 5: incremental persistence; a kill loses at most the in-flight batches, never the already-delivered items).
embed_via_backend: Embeds a single text via the given backend. Used by embed_with_fallback and exposed to allow direct one-shot selection without a chain.
embed_via_backend_legacy: Legacy one-shot wrapper around embed_via_backend that discards the resolved backend. Kept for call sites that only care about the vector and ignore the executed-backend signal. New code should prefer embed_via_backend directly.
embed_via_backend_strict
embed_via_claude_local: ADR-0042 / GAP-002: route a single passage through the Claude embedder. Used by the Claude arm of embed_via_backend so the fallback chain stops treating Claude as a synonym for codex.
embed_via_claude_local_resolved: BUG-003 / v1.0.85: split of embed_via_claude_local that also reports the resolved LlmBackendKind. Always the Claude backend, because this path constructs a Claude-flavoured embedder directly (no PATH probe, no silent substitution).
embed_via_opencode_local_resolved: GAP-OPENCODE-001 / v1.0.90: route a single passage through the OpenCode embedder, reporting the resolved LlmBackendKind::Opencode. Constructs an OpenCode-flavoured embedder via with_opencode_builder (no PATH probe, no silent substitution).
embed_with_fallback: Tries each LLM backend in chain in order, returning the first successful embedding. On failure, the diagnostic tail of the last error is preserved in the returned AppError::Embedding so the operator can see WHY every backend failed.
embedding_dim: Returns the dimensionality of the embedding space. Used to validate LLM responses and to size the in-memory cache.
entity_embed_batch_size: Dim-adaptive batch size for entity-name (short-text) embedding calls (G44).
f32_to_bytes
get_claude_embedder: ADR-0042 / GAP-002: returns the process-wide Claude embedder, lazily initialising it on first use. Binary and model overrides come from the explicit arguments; None falls back to PATH/env defaults via the builder.
get_embedder: Initialises the LLM-embedding client on first use and returns it.
get_opencode_embedder: GAP-OPENCODE-001 / v1.0.90: returns the process-wide OpenCode embedder, lazily initialising it on first use. Binary and model overrides come from the explicit arguments; None falls back to PATH/env defaults via the builder.
get_openrouter_embedder
is_openrouter_initialized: v1.0.93: check whether the OpenRouter client has been initialised.
should_skip_embedding_on_failure: v1.0.89 (BUG-SKIP-EMBED): reads SQLITE_GRAPHRAG_SKIP_EMBEDDING_ON_FAILURE env var (set by --skip-embedding-on-failure via main.rs propagation). Returns true when the user opted to persist with NULL embedding on failure.
try_embed_query_with_choice: On failure, returns a structured FallbackReason so the caller can surface vec_degraded instead of a hard exit 11.
try_embed_query_with_deterministic_fallback: G58 / ADR-0043 (v1.0.85): deterministic fallback for recall and hybrid-search.
try_embed_query_with_embedding_choice: v1.0.93 (GAP-OR-INGEST): query embedding with EmbeddingBackendChoice awareness. Mirrors try_embed_query_with_choice but routes through embed_passage_with_embedding_choice so OpenRouter API is used when configured.
try_embed_query_with_fallback: G58/S1: try to embed a query, mapping any failure to a structured FallbackReason so callers can route to FTS5 + LIKE fallback instead of returning exit 11 to the user.
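The chain behaviour described for embed_with_fallback (try each backend in order, return the first success, keep the last error for diagnostics) can be sketched with closures and plain strings. `BackendSketch` and `embed_with_fallback_sketch` are hypothetical names, and String stands in for AppError::Embedding; the real signatures are not shown on this page.

```rust
/// Hypothetical stand-in for LlmBackendKind with two illustrative variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BackendSketch {
    Codex,
    Claude,
}

/// Tries each backend in order and returns the first successful vector,
/// together with the backend that actually produced it.
/// On total failure, returns the last error tagged with its backend.
fn embed_with_fallback_sketch<F>(
    chain: &[BackendSketch],
    mut embed: F,
) -> Result<(BackendSketch, Vec<f32>), String>
where
    F: FnMut(BackendSketch) -> Result<Vec<f32>, String>,
{
    let mut last_error = String::from("empty backend chain");
    for &backend in chain {
        match embed(backend) {
            Ok(vector) => return Ok((backend, vector)),
            // Remember why this backend failed, then fall through to the next.
            Err(e) => last_error = format!("{:?}: {}", backend, e),
        }
    }
    Err(last_error)
}
```

Returning the backend alongside the vector mirrors the idea behind the *_resolved functions above: the caller learns which backend actually ran, not just which one was requested.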
