Local embedding generation (LLM-only, one-shot per invocation). Embedding generation for the GraphRAG memory.
v1.0.76: the default build is LLM-only — the binary does NOT bundle
fastembed / ort / ndarray / tokenizers. All embeddings are produced
by a headless invocation of claude code or codex (OAuth, no MCP,
no hooks) and stored as a BLOB in memory_embeddings(memory_id, embedding, source). Vector similarity is computed in pure Rust at query time.
§Workload classification (G42/S3, BLOCK 1 — MANDATORY)
LLM embedding is I/O-bound + subprocess-bound: each call waits
5-60s on a network round-trip through a headless claude -p /
codex exec subprocess while the local CPU stays idle. Concurrency
therefore uses tokio (async I/O concurrency) and NEVER rayon
(reserved for CPU-bound work).
§Permit formula (G42/S3, BLOCK 2)
    permits = clamp(--llm-parallelism, 1, 32)
        .min(available_parallelism())
        .min(available_ram_mb * 0.5 / LLM_WORKER_RSS_MB)
LLM_WORKER_RSS_MB = 350 (crate::constants): claude -p and
codex exec are node processes with a typical Maximum RSS of
200-400 MB (measured via /usr/bin/time -l on macOS /
/usr/bin/time -v on Linux), so the RAM bound is pertinent.
§Locking contract (G42/A3 fix)
The process-wide Mutex<LlmEmbedding> protects ONLY the cheap clone
of the client configuration (flavour + binary path + model + shared
schema tempfiles). It is NEVER held across network I/O — the
v1.0.76-v1.0.78 flush_group held it for the whole sequential
embedding loop, which is why --llm-parallelism 8 measured an
effective parallelism of 1.
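The locking contract can be illustrated with a small sketch. It is not the crate's code: ClientConfig, slow_embed and embed_all are hypothetical names, the network round-trip is simulated with a sleep, and std threads are used only to keep the example free of external crates (the module itself is described as using tokio). What matters is that the mutex guard is dropped before the slow call begins.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the cheap-to-clone client configuration.
#[derive(Clone)]
struct ClientConfig {
    binary: String,
    model: String,
}

/// Simulated slow embedding call; it takes the config by reference, never the lock.
fn slow_embed(cfg: &ClientConfig, text: &str) -> Vec<f32> {
    thread::sleep(Duration::from_millis(50));
    vec![(cfg.binary.len() + cfg.model.len() + text.len()) as f32]
}

/// Embeds every text on its own thread and returns the results in input order.
fn embed_all(shared: Arc<Mutex<ClientConfig>>, texts: Vec<String>) -> Vec<Vec<f32>> {
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // The guard is a temporary: it is dropped at the end of this
                // statement, so the lock is held only for the clone.
                let cfg = shared.lock().unwrap().clone();
                // The slow call runs with the mutex already released.
                slow_embed(&cfg, &text)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Had the guard been bound to a named variable that lives across slow_embed, every worker would queue on the mutex and the calls would run one at a time, which is the behaviour the text attributes to the earlier flush_group.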
Structs§
- EmbedCacheStats - G56: stats snapshot returned by embed_entity_texts_cached.
Enums§
- EmbeddingErrorKind - GAP-004 (v1.0.88): typed classifier for embedding error messages.
- FallbackReason - G58/S1: reason an embedding call could not be completed and the caller must fall back to a non-vector retrieval path (FTS5 prefix + LIKE).
- LlmBackendKind - LLM backend kind for the fallback chain. Mirrors the CLI --llm-backend enum so users can pass the same value to --llm-fallback without translation.
Constants§
- CHUNK_EMBED_BATCH_SIZE - Calibration base: chunk (long-text) batch size per LLM call at the calibration dimensionality (G42/S2). Use chunk_embed_batch_size for the dim-adaptive value (G44).
- EMBED_BATCH_CALIBRATION_DIM - Dimensionality the batch bases above were calibrated against (G44).
- ENTITY_EMBED_BATCH_SIZE - Calibration base: entity-name (short-text) batch size per LLM call at the calibration dimensionality (G42/S2). Use entity_embed_batch_size for the dim-adaptive value (G44).
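This page does not give the rule that turns a calibration base into a dim-adaptive batch size. One plausible scheme, shown purely as an assumption, scales the base inversely with the active dimensionality so that the number of floats requested per LLM call stays roughly constant. The function name adaptive_batch_size is hypothetical, and the values in the usage note are made up for illustration.

```rust
/// ASSUMED scaling rule (not taken from the crate): keep `batch * dim` near
/// `base * calibration_dim`, and never go below one item per call.
fn adaptive_batch_size(base: usize, calibration_dim: usize, dim: usize) -> usize {
    if dim == 0 {
        // Degenerate input: fall back to the calibration base.
        return base.max(1);
    }
    (base * calibration_dim / dim).max(1)
}
```

Under this rule, a base of 8 calibrated at 64 dimensions would become 4 at 128 dimensions and 16 at 32 dimensions.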
Functions§
- bytes_to_f32
- chunk_embed_batch_size - Dim-adaptive batch size for chunk (long-text) embedding calls (G44).
- classify_embedding_error - Classify an embedding AppError into a typed FallbackReason.
- effective_permits - G42/S3 BLOCK 2: effective permit count.
- embed_entity_texts_cached - G56: embeds entity-name texts through a process-wide cache.
- embed_passage - Embeds a single passage for storage. Delegates to the configured headless LLM (claude code / codex). Returns a vector of the active dimensionality.
- embed_passage_local
- embed_passage_local_resolved - BUG-003 / v1.0.85: split of embed_passage_local that reports the resolved LlmBackendKind based on the ACTUAL LlmEmbedding::flavour of the embedder constructed. When LlmEmbedding::detect_available substitutes claude for a missing codex, the operator sees the truth in envelope.backend_invoked.
- embed_passage_or_skip - v1.0.89 (BUG-SKIP-EMBED + GAP-EMBED-PROPAGATION): embed a passage honouring both --llm-backend and --skip-embedding-on-failure.
- embed_passage_with_choice - Embed a single passage using the LLM backend selected by the user via --llm-backend. Routes to embed_with_fallback so failures fall through to the next backend in the chain before giving up.
- embed_passage_with_embedding_choice - v1.0.93: embedding with EmbeddingBackendChoice awareness. When the embedding backend is Openrouter or Auto with a live client, the chain prepends OpenRouter before the LLM subprocess backends.
- embed_passages_controlled - Embeds a batch of passages with token-count-aware batching.
- embed_passages_controlled_local
- embed_passages_parallel_local - G42/S3: embeds texts through the bounded parallel fan-out and returns vectors in input order.
- embed_passages_parallel_with_embedding_choice - v1.0.93 (GAP-OR-INGEST): embeds multiple passages with EmbeddingBackendChoice awareness. When the resolved chain starts with OpenRouter and the client is initialised, uses the HTTP batch API (embed_batch) instead of subprocess fan-out: no LLM slot consumed, ~200ms per batch vs ~15s per subprocess cold-start. Falls back to embed_passages_parallel_local for LLM backends.
- embed_query - Embeds a single query for similarity search. Same model and dim as embed_passage; the only difference is the LLM-side prompt prefix that the headless invocation uses to disambiguate.
query_ local - embed_
texts_ parallel - G42/S3 core: bounded parallel batch embedding.
- embed_
texts_ parallel_ with - Like
embed_texts_parallelbut invokeson_resultas soon as each embedding arrives (BLOCO 5: incremental persistence — a kill loses at most the in-flight batches, never the already-delivered items). - embed_
via_ backend - Embeds a single text via the given backend. Used by
embed_with_fallbackand exposed to allow direct one-shot selection without a chain. Embeds a single text via the given backend. Used byembed_with_fallbackand exposed to allow direct one-shot selection without a chain. - embed_
via_ backend_ legacy - Legacy one-shot wrapper around
embed_via_backendthat discards the resolved backend. Kept for call sites that only care about the vector and ignore the executed-backend signal. New code should preferembed_via_backenddirectly. - embed_
via_ backend_ strict - embed_
via_ claude_ local - ADR-0042 / GAP-002: route a single passage through the Claude
embedder. Used by the Claude arm of
embed_via_backendso the fallback chain stops treating Claude as a synonym for codex. - embed_
via_ claude_ local_ resolved - BUG-003 / v1.0.85: split of that also reports the resolved []. Always because this path constructs a Claude-flavoured embedder via (no PATH probe, no silent substitution).
- embed_via_opencode_local_resolved - GAP-OPENCODE-001 / v1.0.90: route a single passage through the OpenCode embedder, reporting the resolved LlmBackendKind::Opencode. Constructs an OpenCode-flavoured embedder via with_opencode_builder (no PATH probe, no silent substitution).
- embed_with_fallback - Tries each LLM backend in chain in order, returning the first successful embedding. On failure, the diagnostic tail of the last error is preserved in the returned AppError::Embedding so the operator can see WHY every backend failed.
- embedding_dim - Returns the dimensionality of the embedding space. Used to validate LLM responses and to size the in-memory cache.
- entity_embed_batch_size - Dim-adaptive batch size for entity-name (short-text) embedding calls (G44).
- f32_to_bytes
- get_claude_embedder - ADR-0042 / GAP-002: returns the process-wide Claude embedder, lazily initialising it on first use. Binary and model overrides come from the explicit arguments; None falls back to PATH/env defaults via the builder.
- get_embedder - Initialises the LLM-embedding client on first use and returns it.
- get_opencode_embedder - GAP-OPENCODE-001 / v1.0.90: returns the process-wide OpenCode embedder, lazily initialising it on first use. Binary and model overrides come from the explicit arguments; None falls back to PATH/env defaults via the builder.
- get_openrouter_embedder
- is_openrouter_initialized - v1.0.93: check whether the OpenRouter client has been initialised.
- should_skip_embedding_on_failure - v1.0.89 (BUG-SKIP-EMBED): reads the SQLITE_GRAPHRAG_SKIP_EMBEDDING_ON_FAILURE env var (set by --skip-embedding-on-failure via main.rs propagation). Returns true when the user opted to persist with NULL embedding on failure.
- try_embed_query_with_choice - On failure, returns a structured FallbackReason so the caller can surface vec_degraded instead of a hard exit 11.
- try_embed_query_with_deterministic_fallback - G58 / ADR-0043 (v1.0.85): deterministic fallback for recall and hybrid-search.
- try_embed_query_with_embedding_choice - v1.0.93 (GAP-OR-INGEST): query embedding with EmbeddingBackendChoice awareness. Mirrors try_embed_query_with_choice but routes through embed_passage_with_embedding_choice so the OpenRouter API is used when configured.
- try_embed_query_with_fallback - G58/S1: try to embed a query, mapping any failure to a structured FallbackReason so callers can route to FTS5 + LIKE fallback instead of returning exit 11 to the user.
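The fallback behaviour described for embed_with_fallback (try each backend in order, return the first success, keep the last error for diagnostics) can be sketched as below. This is a simplified illustration, not the crate's signature: the Backend enum, the closure parameter and the String error type are all hypothetical stand-ins for LlmBackendKind, the real subprocess call and AppError::Embedding.

```rust
/// Hypothetical stand-in for the crate's backend enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Codex,
    Claude,
}

/// Tries each backend in `chain` in order and returns the first success together
/// with the backend that produced it. If every backend fails, the error carries
/// the message of the LAST failure so the operator can see why the chain ended.
fn embed_with_chain<F>(
    chain: &[Backend],
    text: &str,
    mut embed_one: F,
) -> Result<(Backend, Vec<f32>), String>
where
    F: FnMut(Backend, &str) -> Result<Vec<f32>, String>,
{
    let mut last_err = String::from("empty backend chain");
    for &backend in chain {
        match embed_one(backend, text) {
            Ok(vector) => return Ok((backend, vector)),
            // Remember the failure and fall through to the next backend.
            Err(e) => last_err = format!("{:?}: {}", backend, e),
        }
    }
    Err(last_err)
}
```

Returning the backend alongside the vector mirrors the idea behind the _resolved variants listed above: the caller learns which backend actually ran, not merely which one was requested.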