Module embed

Text-similarity / embedding host capability.

A cross-platform, offline, DRY core for cosine/semantic similarity. It is the single source of truth shared by two consumers:

  1. Push-context Tier-2 (Burin pipelines): auto-injecting skills/canon/memory/few-shot above a similarity threshold.
  2. SymbolRelevance (Burin Swift): symbol ranking, which today is split between a macOS-only NLEmbedding path and a Jaccard fallback on Linux. Both platforms can now route through these builtins, giving one cross-platform path.

§Surface

Builtin                     What it does
hostlib_embed_similarity    Cosine similarity of two strings via the active backend.
hostlib_embed_top_k         Rank a corpus of strings against a query, return top k.
hostlib_embed_vector        Embed one string to its raw f32 vector.
hostlib_embed_info          Active backend name + dimensionality.

§Backend selection

The capability owns one backend::Embedder behind an Arc, shared across every VM/thread (mirroring code_index). The default is the always-available backend::LexicalEmbedder, which needs no asset, runs at microsecond scale, and is deterministic across operating systems. When a Model2Vec/“potion”-style static asset is resolvable (settings/sandbox-aware, no network), the capability upgrades to backend::StaticEmbedder; if the asset is missing or malformed, it degrades cleanly back to lexical. A future candle/ONNX transformer tier can slot in behind a Cargo feature without changing this surface or either consumer.
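
The selection rule above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the trait methods name and dim, the Lexical and Static stand-in types, the load_static helper, and the dim.txt asset layout are all hypothetical, chosen just to show the "upgrade if the asset loads, otherwise fall back to lexical" shape behind a shared Arc.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for the module's `Embedder` trait.
trait Embedder: Send + Sync {
    fn name(&self) -> &'static str;
    fn dim(&self) -> usize;
}

/// Stand-in for the always-available lexical floor (dimension is assumed).
struct Lexical;
impl Embedder for Lexical {
    fn name(&self) -> &'static str { "lexical" }
    fn dim(&self) -> usize { 256 }
}

/// Stand-in for an asset-backed static embedder.
struct Static { dim: usize }
impl Embedder for Static {
    fn name(&self) -> &'static str { "static" }
    fn dim(&self) -> usize { self.dim }
}

/// Hypothetical loader. The asset layout (a `dim.txt` file holding the
/// dimensionality) is invented for this sketch. Any read or parse failure
/// yields `None`, which the caller treats as "asset missing or malformed".
fn load_static(dir: &Path) -> Option<Static> {
    let text = std::fs::read_to_string(dir.join("dim.txt")).ok()?;
    let dim = text.trim().parse::<usize>().ok()?;
    if dim == 0 {
        return None;
    }
    Some(Static { dim })
}

/// Pick the backend once; callers clone the `Arc` to share it across threads.
fn select_backend(asset_dir: Option<PathBuf>) -> Arc<dyn Embedder> {
    if let Some(backend) = asset_dir.as_deref().and_then(|dir| load_static(dir)) {
        return Arc::new(backend);
    }
    // No resolvable asset, or the asset failed to load: use the lexical floor.
    Arc::new(Lexical)
}
```

Because the fallback happens inside select_backend, a consumer in this sketch never has to handle a missing-model error; it only ever sees a working backend.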

Structs§

EmbedCapability
Embedding capability handle. Cloning shares the active backend.
LexicalEmbedder
Hashing-trick lexical embedder. Always available, no asset.
Scored
One scored corpus entry, returned by top_k.
StaticEmbedder
Model2Vec / potion-style static token-pooled embedder.
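
To make "hashing-trick lexical embedder" concrete, here is a minimal sketch of the general technique. It is not LexicalEmbedder's actual implementation: the tokenization, the FNV-1a hash, the signed buckets, and the function names fnv1a and lexical_embed are assumptions made for illustration. A hand-written hash is used because the standard library's default hasher does not promise stable output across versions, which would undermine determinism.

```rust
/// 64-bit FNV-1a, written out by hand so the result is identical on every platform.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Hypothetical hashing-trick embedder: each lowercased token is hashed into
/// one of `dim` buckets with a +1/-1 sign, then the vector is L2-normalized.
fn lexical_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dim];
    if dim == 0 {
        return v;
    }
    let tokens = text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty());
    for token in tokens {
        let h = fnv1a(token.to_lowercase().as_bytes());
        let idx = (h % dim as u64) as usize;
        // Use the top bit as a sign so bucket collisions tend to cancel out.
        let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

In this sketch the embedding is a bag of words: token order, case, and punctuation do not affect the result, and no model file is read at any point.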

Constants§

BUILTIN_INFO
Builtin name for reporting the active backend.
BUILTIN_SIMILARITY
Builtin name for cosine similarity of two strings.
BUILTIN_TOP_K
Builtin name for top-k corpus ranking against a query.
BUILTIN_VECTOR
Builtin name for embedding one string to its raw vector.

Traits§

Embedder
A backend that maps text to a fixed-dimension embedding vector.

Functions§

cosine
Cosine similarity between two equal-length vectors.
resolve_asset_dir
Resolve the asset directory for a named embedding model, honoring an explicit override before falling back to a conventional location under the data dir. Returns None when nothing resolvable exists, which the caller treats as “use the lexical floor”.
top_k
Rank corpus vectors against query and return the top k by cosine similarity, highest first.
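
As a rough illustration of what cosine and top_k compute, the sketch below ranks a small corpus against a query. The signatures, the fields of Scored, the tie-breaking by index, and the choice to return 0.0 for zero-norm or mismatched-length vectors are assumptions for this example, not the module's documented behavior.

```rust
/// Hypothetical shape of a scored corpus entry (field names are assumed).
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    index: usize,
    score: f32,
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// This sketch returns 0.0 for mismatched lengths or zero-norm inputs.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na.sqrt() * nb.sqrt())
    }
}

/// Score every corpus vector against the query, sort highest first
/// (ties broken by original index), and keep at most `k` entries.
fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<Scored> {
    let mut scored: Vec<Scored> = corpus
        .iter()
        .enumerate()
        .map(|(index, v)| Scored { index, score: cosine(query, v) })
        .collect();
    scored.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.index.cmp(&b.index)));
    scored.truncate(k);
    scored
}
```

With query [1.0, 0.0] and corpus [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], asking for k = 2 yields index 1 first (identical direction) followed by index 2 (45 degrees away); the orthogonal entry at index 0 is dropped.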