Local embedding generation (LLM-only, one-shot per invocation). Embedding generation for the GraphRAG memory.
v1.0.76: the default build is LLM-only — the binary does NOT bundle
fastembed / ort / ndarray / tokenizers. All embeddings are produced
by a headless invocation of claude code or codex (OAuth, no MCP,
no hooks) and stored as a BLOB in memory_embeddings(memory_id, embedding, source). Vector similarity is computed in pure Rust at query time.
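The storage and query steps above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the helper names `encode_f32_le`, `decode_f32_le`, and `cosine_similarity` are hypothetical, and both the little-endian byte layout and the choice of cosine as the similarity metric are assumptions that the text above does not confirm.

```rust
/// Hypothetical helper: serialise a vector as little-endian bytes
/// suitable for a BLOB column (byte order is an assumption).
fn encode_f32_le(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Hypothetical helper: decode a BLOB back into f32 values.
/// Returns None if the byte length is not a multiple of 4.
fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity in plain Rust (assumed metric).
/// Returns None on length mismatch, empty input, or a zero-norm vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `Option` rather than panicking keeps a corrupt BLOB or a vector of the wrong length from aborting a query; the caller can skip that row instead.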
The legacy fastembed pipeline is still available behind the opt-in
embedding-legacy feature for the transition window. It will be removed
in v1.1.0. New code MUST use the LLM path (embed_passage /
embed_query here, which always call the LLM).
Functions
- bytes_to_f32
- embed_passage - Embeds a single passage for storage. Delegates to the configured LLM headless (claude code / codex). Returns a 384-dim f32 vector.
- embed_passage_local
- embed_passages_controlled - Embeds a batch of passages with token-count-aware batching. The token_counts are still used to keep the LLM invocation under the per-call context budget, but the count is now an approximation (whitespace-split words) since the tokenizers crate was removed.
- embed_passages_controlled_local
- embed_query - Embeds a single query for similarity search. Same model and dim as embed_passage; the only difference is the LLM-side prompt prefix that the headless invocation uses to disambiguate.
- embed_query_local
- embedding_dim - Returns the dimensionality of the embedding space. Used to validate LLM responses and to size the in-memory cache.
- f32_to_bytes
- get_embedder - Initialises the LLM-embedding client on first use and returns it.
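
The token-count-aware batching described for embed_passages_controlled can be sketched as below. This is an illustration under stated assumptions, not the crate's code: `approx_token_count` and `batch_by_budget` are hypothetical names, the greedy grouping strategy is assumed, and the `budget` parameter stands in for the per-call context budget, whose actual value is not given here.

```rust
/// Approximate token count as the number of whitespace-separated words,
/// mirroring the approximation described above.
fn approx_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical greedy batching: group passages in order so that each
/// batch stays within `budget` approximate tokens. A single passage that
/// exceeds the budget is placed in a batch of its own.
fn batch_by_budget<'a>(passages: &[&'a str], budget: usize) -> Vec<Vec<&'a str>> {
    let mut batches = Vec::new();
    let mut current: Vec<&'a str> = Vec::new();
    let mut used = 0usize;

    for &p in passages {
        let n = approx_token_count(p);
        // Close the current batch if adding this passage would overflow it.
        if !current.is_empty() && used + n > budget {
            batches.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push(p);
        used += n;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Because word counts can undercount real tokens, a caller would probably want to pass a budget somewhat below the model's true limit.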