Text-similarity / embedding host capability.
A cross-platform, offline, DRY core for cosine/semantic similarity. It is the single source of truth shared by two consumers:
- Push-context Tier-2 (Burin pipelines): auto-injecting skills/canon/memory/few-shot above a similarity threshold.
- SymbolRelevance (Burin Swift): symbol ranking, today split between macOS-only NLEmbedding and a Linux Jaccard fallback. Both can now route through these builtins for one cross-platform path.
§Surface
| Builtin | What it does |
|---|---|
| hostlib_embed_similarity | Cosine similarity of two strings via the active backend. |
| hostlib_embed_top_k | Rank a corpus of strings against a query, return top k. |
| hostlib_embed_vector | Embed one string to its raw f32 vector. |
| hostlib_embed_info | Active backend name + dimensionality. |
§Backend selection
The capability owns one backend::Embedder behind an Arc, shared
across every VM/thread (mirroring code_index). Default is the
always-available backend::LexicalEmbedder (zero asset, microsecond,
deterministic across OSes). When a Model2Vec/“potion”-style static asset
is resolvable (settings/sandbox-aware, no network), the capability
upgrades to backend::StaticEmbedder; if the asset is missing or
malformed it degrades cleanly back to lexical. A future candle/ONNX
transformer tier slots in behind a Cargo feature without changing this
surface or either consumer.
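
The upgrade-then-degrade behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the trait shape, the `Lexical` and `Static` stand-ins, the `select_backend` function, and the `dim.txt` asset layout are all assumptions made for the example.

```rust
use std::path::Path;
use std::sync::Arc;

/// Hypothetical stand-in for the backend trait; the real signature may differ.
pub trait Embedder: Send + Sync {
    fn name(&self) -> &'static str;
    fn dim(&self) -> usize;
}

/// Zero-asset floor: always constructible, so selection can never fail.
pub struct Lexical;

impl Embedder for Lexical {
    fn name(&self) -> &'static str { "lexical" }
    fn dim(&self) -> usize { 256 }
}

/// Asset-backed tier. Loading fails when the asset is missing or malformed.
pub struct Static { dim: usize }

impl Static {
    /// Assumed asset layout (illustrative only): a `dim.txt` file holding
    /// the embedding dimensionality as a positive integer.
    pub fn load(dir: &Path) -> Result<Self, String> {
        let raw = std::fs::read_to_string(dir.join("dim.txt"))
            .map_err(|e| e.to_string())?;
        let dim: usize = raw
            .trim()
            .parse()
            .map_err(|_| "malformed dimensionality".to_string())?;
        if dim == 0 {
            return Err("dimensionality must be positive".to_string());
        }
        Ok(Static { dim })
    }
}

impl Embedder for Static {
    fn name(&self) -> &'static str { "static" }
    fn dim(&self) -> usize { self.dim }
}

/// Prefer the static backend when an asset directory resolves and loads;
/// otherwise fall back to the lexical floor. No network access is involved.
pub fn select_backend(asset_dir: Option<&Path>) -> Arc<dyn Embedder> {
    if let Some(dir) = asset_dir {
        if let Ok(backend) = Static::load(dir) {
            return Arc::new(backend);
        }
    }
    Arc::new(Lexical)
}
```

Because the result is an `Arc<dyn Embedder>`, cloning the handle shares one backend across threads, and callers never need to know which tier is active.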
Structs§
- EmbedCapability: Embedding capability handle. Cloning shares the active backend.
- LexicalEmbedder: Hashing-trick lexical embedder. Always available, no asset.
- Scored: One scored corpus entry, returned by top_k.
- StaticEmbedder: Model2Vec / potion-style static token-pooled embedder.
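
To make the "hashing-trick" idea behind the lexical embedder concrete, here is a small sketch. It assumes a particular tokenizer (split on non-alphanumeric characters, lowercased), FNV-1a as the hash, and a signed bucket update; none of these choices are confirmed by the source, and `fnv1a` and `hash_embed` are hypothetical names.

```rust
/// FNV-1a 64-bit hash. A fixed, hand-rolled hash keeps the output identical
/// across operating systems and Rust releases (an assumption of this sketch,
/// not a statement about what the real embedder uses).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Embed `text` into `dim` buckets: each token is hashed to a bucket index
/// and a sign, then the vector is L2-normalized. Requires no asset.
pub fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dim must be positive");
    let mut v = vec![0.0f32; dim];
    let tokens = text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty());
    for tok in tokens {
        let h = fnv1a(tok.to_lowercase().as_bytes());
        let idx = (h % dim as u64) as usize;
        // The top hash bit picks the sign, which reduces collision bias.
        let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Text with no tokens maps to the all-zero vector, so a caller comparing vectors should decide how to score that case.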
Constants§
- BUILTIN_INFO: Builtin name for reporting the active backend.
- BUILTIN_SIMILARITY: Builtin name for cosine similarity of two strings.
- BUILTIN_TOP_K: Builtin name for top-k corpus ranking against a query.
- BUILTIN_VECTOR: Builtin name for embedding one string to its raw vector.
Traits§
- Embedder: A backend that maps text to a fixed-dimension embedding vector.
Functions§
- cosine: Cosine similarity between two equal-length vectors.
- resolve_asset_dir: Resolve the asset directory for a named embedding model, honoring an explicit override before falling back to a conventional location under the data dir. Returns None when nothing resolvable exists, which the caller treats as “use the lexical floor”.
- top_k: Rank corpus vectors against query and return the top k by cosine similarity, highest first.
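
As an illustration of what cosine and top_k compute, the following sketch ranks corpus vectors against a query. The signatures, the fields of `Scored`, and the handling of zero vectors (scored as 0.0) are assumptions for the example and may not match the real items.

```rust
/// Cosine similarity of two equal-length vectors.
/// Assumption: a zero vector scores 0.0 rather than producing NaN.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// One scored corpus entry. The field names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub index: usize,
    pub score: f32,
}

/// Score every corpus vector against `query`, sort highest first, and keep
/// at most `k` entries. The sort is stable, so ties keep corpus order.
pub fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<Scored> {
    let mut scored: Vec<Scored> = corpus
        .iter()
        .enumerate()
        .map(|(index, v)| Scored { index, score: cosine(query, v) })
        .collect();
    scored.sort_by(|a, b| b.score.total_cmp(&a.score));
    scored.truncate(k);
    scored
}
```

Sorting the full list is O(n log n); for a large corpus and small k, a bounded heap would avoid sorting entries that are discarded anyway.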