Crate embedd


embedd: embedding interfaces + reusable backends (multi-modality substrate).

This crate is the “shared embedding substrate”: consumers should depend on embedd for the traits and basic types, and enable backend features (candle-hf, openai, tei, etc.) as needed.

Scope: any modality (text, images, audio, …) as long as the interface remains small and testable. Concretely, we provide:

  • TextEmbedder (text -> vectors)
  • ImageEmbedder (bytes -> vectors)
  • AudioEmbedder (bytes -> vectors; placeholder contract)
  • extension traits for token-level / sparse embeddings.

§Environment variables

Primary (EMBEDD_*):

  • EMBEDD_MODEL / EMBEDD_MODEL_DIR – model source
  • EMBEDD_MAX_LEN – max token length
  • EMBEDD_QUERY_PREFIX / EMBEDD_DOC_PREFIX – prompt prefixes

Legacy IKSH_* equivalents are still read as a fallback via from_env_any(), but they are deprecated; prefer the EMBEDD_* names.
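The primary-then-legacy lookup order described above can be sketched as follows. This is a minimal illustration, not the body of from_env_any(): the helper names `resolve_with_fallback` and `from_process_env` are hypothetical, and the sketch assumes that each legacy variable uses the same suffix as its EMBEDD_* counterpart and that an empty value counts as unset.

```rust
use std::env;

/// Hypothetical helper (not part of embedd): look up `EMBEDD_<suffix>` first,
/// then the legacy `IKSH_<suffix>`. The lookup is injected so the precedence
/// rule can be exercised without touching the real process environment.
fn resolve_with_fallback<F>(suffix: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption: legacy names share the suffix of the primary names.
    ["EMBEDD_", "IKSH_"].iter().find_map(|prefix| {
        // Assumption: an empty value is treated the same as an unset one.
        lookup(&format!("{prefix}{suffix}")).filter(|value| !value.is_empty())
    })
}

/// The same rule applied to the real process environment.
fn from_process_env(suffix: &str) -> Option<String> {
    resolve_with_fallback(suffix, |name| env::var(name).ok())
}
```

Under these assumptions, `from_process_env("MAX_LEN")` returns the value of EMBEDD_MAX_LEN when it is set and non-empty, and otherwise falls back to IKSH_MAX_LEN.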

Modules§

safetensors
Safetensors validation utilities.
vector
Vector post-processing helpers (L0-backed).

Structs§

BatchingTextEmbedder
Wrapper that splits large batches into fixed-size chunks before delegating.
L2NormalizedTextEmbedder
Wrapper that enforces L2-normalized outputs.
PromptTemplate
Prompt template applied before tokenization, used for instruction-tuned / prompt-tuned embedders.
PromptedTextEmbedder
Wrapper that applies a PromptTemplate before calling an inner TextEmbedder.
TextEmbedderCapabilities
TruncateDimTextEmbedder
Wrapper that truncates output vectors to the first dim dimensions.
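The three post-processing wrappers listed above (batching, L2 normalization, dimension truncation) each correspond to a small vector operation. The free functions below are a sketch of those operations only; `chunked`, `truncate_dim`, and `l2_normalize` are hypothetical names, and the crate's wrappers may handle edge cases such as zero vectors or a zero chunk size differently.

```rust
/// Split a batch into chunks of at most `chunk_size` items.
/// Assumption: a chunk size of 0 is clamped to 1 rather than treated as an error.
fn chunked<T>(items: &[T], chunk_size: usize) -> std::slice::Chunks<'_, T> {
    items.chunks(chunk_size.max(1))
}

/// Keep only the first `dim` components; shorter vectors are left unchanged.
fn truncate_dim(v: &mut Vec<f32>, dim: usize) {
    v.truncate(dim);
}

/// Scale `v` in place to unit Euclidean length.
/// Assumption: an all-zero vector is left as-is to avoid dividing by zero.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}
```

Order matters when combining these: a truncated vector is generally no longer unit length, so truncating first and normalizing afterwards keeps the final output normalized.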

Enums§

EmbedMode
Whether an embedding is for a query or a document/passage.
ModelSource
Configuration of where embedding model artifacts come from.
Normalization
NormalizationPolicy
PromptApplication
Where prompt/scoping is applied.
ScopingPolicy
Declarative scoping/prompt policy for a text embedder call site.
TruncationDirection
TruncationPolicy

Traits§

AudioEmbedder
Minimal audio embedder interface (bytes -> vectors).
ImageEmbedder
Minimal image embedder interface (bytes -> vectors).
SparseEmbedder
Optional extension trait for sparse lexical embeddings.
TextEmbedder
Minimal interface for “text → dense vector” encoders (bi-encoder style).
TokenEmbedder
Optional extension trait for “multi-vector” (late-interaction) embeddings.

Functions§

apply_normalization_policy
apply_output_dim
apply_scoping_policy
Apply a ScopingPolicy to an embedder, returning a boxed embedder.
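The "wrap an embedder and return a boxed embedder" pattern that apply_scoping_policy describes can be illustrated with a self-contained sketch. Everything here is a local stand-in rather than the crate's API: `ToyTextEmbedder`, `ToyMode`, `LengthEmbedder`, `Prefixed`, and `with_mode_prefix` are hypothetical, and the "query: " / "passage: " prefixes are illustrative values of the kind EMBEDD_QUERY_PREFIX and EMBEDD_DOC_PREFIX might carry.

```rust
/// Local stand-in trait for illustration; NOT embedd's `TextEmbedder`.
trait ToyTextEmbedder {
    fn embed(&self, texts: &[String]) -> Vec<Vec<f32>>;
}

/// Local stand-in for a query/document distinction; NOT embedd's `EmbedMode`.
#[derive(Clone, Copy)]
enum ToyMode {
    Query,
    Document,
}

/// Toy backend: embeds each text as a 1-d vector holding its byte length.
struct LengthEmbedder;

impl ToyTextEmbedder for LengthEmbedder {
    fn embed(&self, texts: &[String]) -> Vec<Vec<f32>> {
        texts.iter().map(|t| vec![t.len() as f32]).collect()
    }
}

/// Decorator that prepends a fixed prefix before delegating to `inner`.
struct Prefixed {
    inner: Box<dyn ToyTextEmbedder>,
    prefix: String,
}

impl ToyTextEmbedder for Prefixed {
    fn embed(&self, texts: &[String]) -> Vec<Vec<f32>> {
        let prefixed: Vec<String> = texts
            .iter()
            .map(|t| format!("{}{}", self.prefix, t))
            .collect();
        self.inner.embed(&prefixed)
    }
}

/// Wrap `inner` so every input gets a mode-specific prefix (illustrative values).
fn with_mode_prefix(inner: Box<dyn ToyTextEmbedder>, mode: ToyMode) -> Box<dyn ToyTextEmbedder> {
    let prefix = match mode {
        ToyMode::Query => "query: ",
        ToyMode::Document => "passage: ",
    };
    Box::new(Prefixed { inner, prefix: prefix.to_string() })
}
```

Returning a boxed trait object lets call sites stack several such wrappers without naming the concrete nested type, at the cost of dynamic dispatch on each call.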