Module embeddings


Universal embedding client with config-driven provider cascade.

Supports any OpenAI-compatible embedding API (Ollama, vLLM, TEI, etc.). Providers are tried in priority order until one responds.
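A minimal sketch of the cascade idea follows. It assumes providers are tried in ascending `priority` value (so `priority = 1` goes first, as in the example config) and that any error moves on to the next provider. `Provider`, `cascade`, and the `call` closure (a stand-in for the real HTTP request) are hypothetical names, not this module's API.

```rust
// Hypothetical sketch: these names are illustrative, not the module's API.
#[derive(Debug)]
struct Provider {
    name: String,
    priority: u32,
}

/// Try providers in ascending `priority` order (assumed) and return the
/// first success as (provider name, embedding). `call` stands in for the
/// request to an OpenAI-compatible embedding endpoint.
fn cascade<F>(
    providers: &[Provider],
    text: &str,
    mut call: F,
) -> Result<(String, Vec<f32>), String>
where
    F: FnMut(&Provider, &str) -> Result<Vec<f32>, String>,
{
    let mut ordered: Vec<&Provider> = providers.iter().collect();
    ordered.sort_by_key(|p| p.priority);
    let mut errors = Vec::new();
    for p in ordered {
        match call(p, text) {
            Ok(v) => return Ok((p.name.clone(), v)),
            // Record the failure and fall through to the next provider.
            Err(e) => errors.push(format!("{}: {}", p.name, e)),
        }
    }
    Err(format!("all providers failed: {}", errors.join("; ")))
}

fn main() {
    let providers = vec![
        Provider { name: "dragon".into(), priority: 2 },
        Provider { name: "ollama-local".into(), priority: 1 },
    ];
    // Simulate the local provider being unreachable.
    let result = cascade(&providers, "hello", |p, _text| {
        if p.name == "ollama-local" {
            Err("connection refused".to_string())
        } else {
            Ok(vec![0.1, 0.2, 0.3])
        }
    });
    println!("{:?}", result);
}
```

Collecting per-provider errors means a total failure reports why every provider was skipped, not just the last one.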

Example config.toml

```toml
[embeddings]
required_dimension = 2560
max_batch_chars = 32000
max_batch_items = 16

[[embeddings.providers]]
name = "ollama-local"
base_url = "http://localhost:11434"
model = "qwen3-embedding:4b"
priority = 1

[[embeddings.providers]]
name = "dragon"
base_url = "http://dragon:12345"
model = "Qwen/Qwen3-Embedding-4B"
priority = 2
```
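The `max_batch_chars` and `max_batch_items` settings suggest that inputs are split into batches before being sent to a provider. The sketch below shows greedy packing under both limits; `make_batches` is a hypothetical helper, and counting Unicode scalar values (rather than bytes) as "chars" is an assumption.

```rust
/// Hypothetical sketch: greedily pack texts into batches that respect both
/// `max_batch_chars` and `max_batch_items`. Each batch is a list of indices
/// into `texts`.
fn make_batches(texts: &[&str], max_batch_chars: usize, max_batch_items: usize) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_chars = 0usize;
    for (i, t) in texts.iter().enumerate() {
        let n = t.chars().count();
        // Close the current batch if adding this text would break either limit.
        let would_overflow = !current.is_empty()
            && (current.len() >= max_batch_items || current_chars + n > max_batch_chars);
        if would_overflow {
            batches.push(std::mem::take(&mut current));
            current_chars = 0;
        }
        current.push(i);
        current_chars += n;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    let texts = ["aaaa", "bbbb", "cc", "dddddd"];
    // Tiny limits for illustration; the example config uses 32000 chars / 16 items.
    let batches = make_batches(&texts, 8, 2);
    println!("{:?}", batches); // [[0, 1], [2, 3]]
}
```

A single text longer than `max_batch_chars` still gets a batch of its own here; shortening it is left to the token-limit helpers.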

Structs

DimensionAdapter
Adapter for cross-dimension embedding compatibility.
EmbeddingClient
Universal embedding client with provider cascade
EmbeddingConfig
Complete embedding configuration
MlxConfig
Legacy MLX configuration (deprecated; use EmbeddingConfig instead)
MlxMergeOptions
Options for merging file config into MlxConfig
ProviderConfig
Single embedding provider configuration
RerankerConfig
Reranker configuration (optional, separate from embedders)
TokenConfig
Token estimation configuration
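This page does not say how `DimensionAdapter` bridges dimensions. One common approach, sketched here purely as an assumption, is Matryoshka-style truncation (or zero-padding when the source vector is shorter) followed by L2 renormalization so that cosine similarity stays meaningful. Truncation only gives sensible results for models trained so that vector prefixes remain useful. `adapt_dimension` and `cosine` are hypothetical helpers, not methods of the struct.

```rust
/// Hypothetical sketch: truncate or zero-pad `v` to `target` dimensions,
/// then L2-normalize the result (all-zero vectors are returned as-is).
fn adapt_dimension(v: &[f32], target: usize) -> Vec<f32> {
    let mut out: Vec<f32> = v.iter().copied().take(target).collect();
    out.resize(target, 0.0); // zero-pad if the source is shorter
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in out.iter_mut() {
            *x /= norm;
        }
    }
    out
}

/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // A 4-dim query compared against a 2-dim stored vector.
    let query = [3.0_f32, 4.0, 1.0, 1.0];
    let stored = [0.6_f32, 0.8];
    let q = adapt_dimension(&query, stored.len());
    println!("adapted = {:?}, cosine = {:.3}", q, cosine(&q, &stored));
}
```

Adapting the query to the stored index's dimension, rather than re-embedding the whole index, is presumably the idea behind `cross_dimension_search_adapt`, though its actual strategy is not documented here.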

Constants

DEFAULT_MAX_TOKENS
Default safe ceiling for the qwen3-embedding context window. The real model limit is 40 960 tokens; we leave a margin of roughly 6k tokens for safety.
DEFAULT_OLLAMA_EMBEDDING_MODEL
DEFAULT_REQUIRED_DIMENSION

Functions

cross_dimension_search_adapt
Perform cross-dimension search by adapting query embedding
estimate_tokens
Estimate token count for text
safe_chunk_size
Calculate safe chunk size in characters for given token limit
truncate_to_token_limit
Truncate text to fit within token limit
validate_batch_tokens
Validate a batch of texts and return which ones exceed limits
validate_chunk_tokens
Validate that a chunk fits within token limits
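The token helpers above presumably rely on a character-based heuristic configured through `TokenConfig`. The sketch below assumes a fixed ratio of 4 characters per token (an illustrative value, not the module's) and uses hypothetical names (`approx_tokens`, `chunk_chars_for`, `truncate_approx`, `over_limit_indices`) that mirror the roles of `estimate_tokens`, `safe_chunk_size`, `truncate_to_token_limit`, and `validate_batch_tokens` without claiming their real signatures.

```rust
/// Assumed heuristic ratio; the real value presumably lives in `TokenConfig`.
const CHARS_PER_TOKEN: f64 = 4.0;

/// Rough token estimate from the character count, rounded up.
fn approx_tokens(text: &str) -> usize {
    (text.chars().count() as f64 / CHARS_PER_TOKEN).ceil() as usize
}

/// Largest character count whose estimate stays within `max_tokens`.
fn chunk_chars_for(max_tokens: usize) -> usize {
    (max_tokens as f64 * CHARS_PER_TOKEN).floor() as usize
}

/// Truncate on a char boundary so the estimate fits within `max_tokens`.
fn truncate_approx(text: &str, max_tokens: usize) -> &str {
    let limit = chunk_chars_for(max_tokens);
    match text.char_indices().nth(limit) {
        Some((byte_idx, _)) => &text[..byte_idx], // keep exactly `limit` chars
        None => text,                              // already short enough
    }
}

/// Indices of texts whose estimate exceeds `max_tokens`.
fn over_limit_indices(texts: &[&str], max_tokens: usize) -> Vec<usize> {
    texts
        .iter()
        .enumerate()
        .filter(|(_, t)| approx_tokens(t) > max_tokens)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let texts = ["short", "a much longer passage that blows past the tiny limit"];
    let max_tokens = 4; // tiny limit for illustration
    let estimates: Vec<usize> = texts.iter().map(|t| approx_tokens(t)).collect();
    println!("estimates: {:?}", estimates);
    println!("over limit: {:?}", over_limit_indices(&texts, max_tokens));
    println!("truncated: {:?}", truncate_approx(texts[1], max_tokens));
}
```

In the real module the limit would presumably default to `DEFAULT_MAX_TOKENS` rather than the tiny value used here. Slicing via `char_indices` keeps truncation on a UTF-8 boundary, so multi-byte text never causes a slicing panic.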

Type Aliases

MLXBridge