Module embed

Expand description

Text-similarity / embedding host capability.

A cross-platform, offline, DRY core for cosine/semantic similarity. It is the single source of truth two consumers share:

Push-context Tier-2 (Burin pipelines): auto-injecting skills/canon/memory/few-shot above a similarity threshold.
SymbolRelevance (Burin Swift): symbol ranking, today split between macOS-only NLEmbedding and a Linux Jaccard fallback. Both can now route through these builtins for one cross-platform path.

§Surface

Builtin	What it does
`hostlib_embed_similarity`	Cosine similarity of two strings via the active backend.
`hostlib_embed_top_k`	Rank a corpus of strings against a query, return top `k`.
`hostlib_embed_vector`	Embed one string to its raw `f32` vector.
`hostlib_embed_info`	Active backend name + dimensionality.

§Backend selection

The capability owns one backend::Embedder behind an Arc, shared across every VM/thread (mirroring code_index). Default is the always-available backend::LexicalEmbedder (zero asset, microsecond, deterministic across OSes). When a Model2Vec/“potion”-style static asset is resolvable (settings/sandbox-aware, no network), the capability upgrades to backend::StaticEmbedder; if the asset is missing or malformed it degrades cleanly back to lexical. A future candle/ONNX transformer tier slots in behind a Cargo feature without changing this surface or either consumer.

Structs§

EmbedCapability: Embedding capability handle. Cloning shares the active backend.
LexicalEmbedder: Hashing-trick lexical embedder. Always available, no asset.
Scored: One scored corpus entry, returned by top_k.
StaticEmbedder: Model2Vec / potion-style static token-pooled embedder.

Constants§

BUILTIN_INFO: Builtin name for reporting the active backend.
BUILTIN_SIMILARITY: Builtin name for cosine similarity of two strings.
BUILTIN_TOP_K: Builtin name for top-k corpus ranking against a query.
BUILTIN_VECTOR: Builtin name for embedding one string to its raw vector.

Traits§

Embedder: A backend that maps text to a fixed-dimension embedding vector.

Functions§

cosine: Cosine similarity between two equal-length vectors.
resolve_asset_dir: Resolve the asset directory for a named embedding model, honoring an explicit override before falling back to a conventional location under the data dir. Returns None when nothing resolvable exists, which the caller treats as “use the lexical floor”.
top_k: Rank corpus vectors against query and return the top k by cosine similarity, highest first.

Module embed

Module embed Copy item path

§Surface

§Backend selection

Structs§

Constants§

Traits§

Functions§

Module embed