embedd: embedding interfaces + reusable backends (multi-modality substrate).
This crate is the “shared embedding substrate”: consumers should depend on embedd
(traits + basic types), and enable backend features (candle-hf, openai, tei, etc.)
as needed.
Scope: any modality (text, images, audio, …) as long as the interface remains small and testable. Concretely, we provide:
- TextEmbedder (text -> vectors)
- ImageEmbedder (bytes -> vectors)
- AudioEmbedder (bytes -> vectors; placeholder contract)
- extension traits for token-level / sparse embeddings.
§Environment variables
Primary (EMBEDD_*):
- EMBEDD_MODEL / EMBEDD_MODEL_DIR – model source
- EMBEDD_MAX_LEN – max token length
- EMBEDD_QUERY_PREFIX / EMBEDD_DOC_PREFIX – prompt prefixes
Legacy IKSH_* equivalents are still supported as a fallback via from_env_any(), but they are deprecated.
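The primary-then-legacy lookup described above can be sketched as follows. This is not the crate's `from_env_any()` implementation: the function names `env_with_legacy_fallback` and `env_from_process` are hypothetical, and the sketch assumes each legacy variable uses the same suffix as its `EMBEDD_*` counterpart (for example `IKSH_MODEL` for `EMBEDD_MODEL`).

```rust
/// Look up `EMBEDD_<suffix>` first, then fall back to the legacy `IKSH_<suffix>`.
///
/// Hypothetical helper, not part of the crate. `lookup` abstracts the
/// environment so the precedence logic can be tested without mutating
/// process-wide state.
pub fn env_with_legacy_fallback<F>(suffix: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let primary = format!("EMBEDD_{suffix}");
    let legacy = format!("IKSH_{suffix}");
    // The primary name wins; the legacy name is consulted only if it is unset.
    lookup(&primary).or_else(|| lookup(&legacy))
}

/// Same lookup against the real process environment.
pub fn env_from_process(suffix: &str) -> Option<String> {
    env_with_legacy_fallback(suffix, |key| std::env::var(key).ok())
}
```

Calling `env_from_process("MAX_LEN")` would then return the value of `EMBEDD_MAX_LEN` if set, otherwise `IKSH_MAX_LEN`, otherwise `None`.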
Modules§
- safetensors
- Safetensors validation utilities.
- vector
- Vector post-processing helpers (L0-backed).
Structs§
- BatchingTextEmbedder
- Wrapper that splits large batches into fixed-size chunks before delegating.
- L2NormalizedTextEmbedder
- Wrapper that enforces L2-normalized outputs.
- PromptTemplate
- Prompt template applied before tokenization, used for instruction-tuned / prompt-tuned embedders.
- PromptedTextEmbedder
- Wrapper that applies a PromptTemplate before calling an inner TextEmbedder.
- TextEmbedderCapabilities
- TruncateDimTextEmbedder
- Wrapper that truncates output vectors to the first dim dimensions.
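To make the batching and truncation wrappers concrete, here is a minimal sketch of the two operations as free functions. It does not use the crate's types: `embed_in_batches` and `truncate_dim` are hypothetical names, and the embedder is modeled as a plain closure from a slice of texts to one vector per text.

```rust
/// Embed `texts` in chunks of at most `batch_size`, preserving input order.
///
/// Hypothetical helper; `embed` stands in for an inner embedder and is
/// assumed to return exactly one vector per input text.
pub fn embed_in_batches<F>(texts: &[String], batch_size: usize, mut embed: F) -> Vec<Vec<f32>>
where
    F: FnMut(&[String]) -> Vec<Vec<f32>>,
{
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut out = Vec::with_capacity(texts.len());
    // `chunks` yields full batches followed by one shorter final batch, if any.
    for chunk in texts.chunks(batch_size) {
        out.extend(embed(chunk));
    }
    out
}

/// Keep only the first `dim` components; shorter vectors are returned unchanged.
pub fn truncate_dim(v: &[f32], dim: usize) -> Vec<f32> {
    v[..dim.min(v.len())].to_vec()
}
```

A truncated vector generally no longer has unit L2 norm, so a caller that needs normalized outputs would normalize after truncating, not before.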
Enums§
- EmbedMode
- Whether an embedding is for a query or a document/passage.
- ModelSource
- Configuration of where embedding model artifacts come from.
- Normalization
- NormalizationPolicy
- PromptApplication
- Where prompt/scoping is applied.
- ScopingPolicy
- Declarative scoping/prompt policy for a text embedder call site.
- TruncationDirection
- TruncationPolicy
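The query/document distinction matters mostly because each mode can carry its own prompt prefix (compare EMBEDD_QUERY_PREFIX and EMBEDD_DOC_PREFIX above). The sketch below shows one way that selection could look. `Mode` and `Prefixes` are hypothetical stand-ins defined here, not the crate's `EmbedMode` or `PromptTemplate`, whose actual variants and fields are not shown on this page.

```rust
/// Hypothetical stand-in for a query-vs-document mode switch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Query,
    Document,
}

/// Hypothetical pair of prefixes, one per mode.
pub struct Prefixes {
    pub query: String,
    pub doc: String,
}

impl Prefixes {
    /// Prepend the prefix matching `mode` to `text`, before tokenization.
    pub fn apply(&self, mode: Mode, text: &str) -> String {
        let prefix = match mode {
            Mode::Query => &self.query,
            Mode::Document => &self.doc,
        };
        format!("{prefix}{text}")
    }
}
```

With illustrative prefixes such as `"query: "` and `"passage: "`, `apply(Mode::Query, "what is rust?")` yields `"query: what is rust?"`.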
Traits§
- AudioEmbedder
- Minimal audio embedder interface (bytes -> vectors).
- ImageEmbedder
- Minimal image embedder interface (bytes -> vectors).
- SparseEmbedder
- Optional extension trait for sparse lexical embeddings.
- TextEmbedder
- Minimal interface for “text → dense vector” encoders (bi-encoder style).
- TokenEmbedder
- Optional extension trait for “multi-vector” (late-interaction) embeddings.
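A small trait plus wrapper types that implement the same trait is a common way to keep such an interface composable. The sketch below illustrates that pattern under stated assumptions: the trait `Embed`, its method signature, `ToyEmbedder`, and `UnitNorm` are all hypothetical and defined here for illustration. The real `TextEmbedder` signature is not shown on this page and may differ (for example in error handling).

```rust
/// Hypothetical minimal "text -> dense vector" interface.
pub trait Embed {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Toy embedder with two features: byte length and vowel count.
pub struct ToyEmbedder;

impl Embed for ToyEmbedder {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts
            .iter()
            .map(|t| {
                let vowels = t.chars().filter(|c| "aeiouAEIOU".contains(*c)).count();
                vec![t.len() as f32, vowels as f32]
            })
            .collect()
    }
}

/// Wrapper that rescales every output of the inner embedder to unit L2 norm.
pub struct UnitNorm<E>(pub E);

impl<E: Embed> Embed for UnitNorm<E> {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        let mut out = self.0.embed(texts);
        for v in &mut out {
            let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            // Zero vectors are left unchanged to avoid dividing by zero.
            if norm > 0.0 {
                v.iter_mut().for_each(|x| *x /= norm);
            }
        }
        out
    }
}
```

Because `UnitNorm<E>` itself implements `Embed`, wrappers of this kind can be stacked without the caller needing to know which layers are present.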
Functions§
- apply_normalization_policy
- apply_output_dim
- apply_scoping_policy
- Apply a ScopingPolicy to an embedder, returning a boxed embedder.