Universal embedding client with config-driven provider cascade.
Supports any OpenAI-compatible embedding API (Ollama, vLLM, TEI, etc.). Providers are tried in priority order until one responds.
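The cascade described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: `Provider`, `AllProvidersFailed`, and `embed_with_cascade` are hypothetical names, and the `call` closure stands in for whatever HTTP request the real client makes. It assumes a lower `priority` value means "try first", which matches the example config but is not confirmed by the source.

```rust
/// Hypothetical stand-in for one `[[embeddings.providers]]` entry.
#[derive(Debug, Clone)]
struct Provider {
    name: String,
    base_url: String,
    priority: u32,
}

/// Returned when every provider was tried and none succeeded.
#[derive(Debug)]
struct AllProvidersFailed {
    /// (provider name, error message) for each attempt, in the order tried.
    attempts: Vec<(String, String)>,
}

/// Try providers in ascending `priority` order and return the first success
/// together with the name of the provider that served it.
///
/// `call` is an assumed placeholder for the real network request.
fn embed_with_cascade<F>(
    providers: &[Provider],
    text: &str,
    mut call: F,
) -> Result<(String, Vec<f32>), AllProvidersFailed>
where
    F: FnMut(&Provider, &str) -> Result<Vec<f32>, String>,
{
    // Sort references so the caller's slice is left untouched.
    let mut ordered: Vec<&Provider> = providers.iter().collect();
    ordered.sort_by_key(|p| p.priority);

    let mut attempts = Vec::new();
    for provider in ordered {
        match call(provider, text) {
            Ok(vector) => return Ok((provider.name.clone(), vector)),
            // Record the failure and fall through to the next provider.
            Err(err) => attempts.push((provider.name.clone(), err)),
        }
    }
    Err(AllProvidersFailed { attempts })
}
```

A caller would pass a closure that posts `text` to the provider's `base_url` and parses the returned vector. Collecting the per-provider errors makes it easier to see why the whole cascade failed.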
§Example config.toml
[embeddings]
required_dimension = 2560
max_batch_chars = 32000
max_batch_items = 16
[[embeddings.providers]]
name = "ollama-local"
base_url = "http://localhost:11434"
model = "qwen3-embedding:4b"
priority = 1
[[embeddings.providers]]
name = "dragon"
base_url = "http://dragon:12345"
model = "Qwen/Qwen3-Embedding-4B"
priority = 2
Structs§
- DimensionAdapter - Adapter for cross-dimension embedding compatibility
- EmbeddingClient - Universal embedding client with provider cascade
- EmbeddingConfig - Complete embedding configuration
- MlxConfig - Legacy MLX configuration; deprecated, use EmbeddingConfig instead
- MlxMergeOptions - Options for merging file config into MlxConfig
- ProviderConfig - Single embedding provider configuration
- RerankerConfig - Reranker configuration (optional, separate from embedders)
- TokenConfig - Token estimation configuration
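The source does not say how cross-dimension compatibility is achieved. As an illustration only, one common approach is to truncate longer vectors (then re-normalize) and zero-pad shorter ones so that every vector matches the required dimension. The helper below, `adapt_dimension`, is a hypothetical name and an assumed strategy, not the `DimensionAdapter` API.

```rust
/// Hypothetical helper: resize `embedding` to exactly `target_dim` entries.
///
/// Assumed strategy (not taken from the crate):
/// - longer vectors are truncated and re-scaled to unit L2 norm;
/// - shorter vectors are zero-padded, which leaves their norm unchanged.
fn adapt_dimension(embedding: &[f32], target_dim: usize) -> Vec<f32> {
    let mut out: Vec<f32> = embedding.iter().copied().take(target_dim).collect();

    if embedding.len() > target_dim {
        // Truncation shortens the vector, so restore unit length.
        let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut out {
                *x /= norm;
            }
        }
    } else {
        // Pad with zeros up to the target dimension (no-op if already equal).
        out.resize(target_dim, 0.0);
    }
    out
}
```

Truncation is generally only meaningful for models trained to tolerate it (Matryoshka-style embeddings), and similarity scores between vectors from different models are not comparable whatever their dimension.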
Constants§
- DEFAULT_MAX_TOKENS - Default safe ceiling for qwen3-embedding context window. Real model limit is 40,960 tokens; we leave ~6k margin for safety.
- DEFAULT_OLLAMA_EMBEDDING_MODEL
- DEFAULT_REQUIRED_DIMENSION
Functions§
- cross_dimension_search_adapt - Perform cross-dimension search by adapting query embedding
- estimate_tokens - Estimate token count for text
- safe_chunk_size - Calculate safe chunk size in characters for given token limit
- truncate_to_token_limit - Truncate text to fit within token limit
- validate_batch_tokens - Validate a batch of texts and return which ones exceed limits
- validate_chunk_tokens - Validate that a chunk fits within token limits
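The token helpers listed above could plausibly be built on a characters-per-token heuristic. The sketch below shows that idea under stated assumptions: the function names are deliberately different from the crate's, the signatures are guesses, and the ratio of 4 characters per token is an illustrative value, not one taken from the source.

```rust
/// Assumed average number of characters per token (illustrative only).
const CHARS_PER_TOKEN: usize = 4;

/// Rough token estimate: character count divided by the ratio, rounded up.
fn estimate_tokens_approx(text: &str) -> usize {
    let chars = text.chars().count();
    (chars + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
}

/// Largest character count whose estimate stays within `max_tokens`.
fn safe_chunk_chars(max_tokens: usize) -> usize {
    max_tokens * CHARS_PER_TOKEN
}

/// Cut `text` so its estimate fits `max_tokens`, always on a char boundary.
fn truncate_to_limit(text: &str, max_tokens: usize) -> &str {
    let max_chars = safe_chunk_chars(max_tokens);
    match text.char_indices().nth(max_chars) {
        // `byte_idx` is where the first character past the limit starts.
        Some((byte_idx, _)) => &text[..byte_idx],
        // The text is already short enough.
        None => text,
    }
}

/// Indices of the texts in a batch whose estimate exceeds `max_tokens`.
fn oversized_indices(texts: &[&str], max_tokens: usize) -> Vec<usize> {
    texts
        .iter()
        .enumerate()
        .filter(|(_, t)| estimate_tokens_approx(t) > max_tokens)
        .map(|(i, _)| i)
        .collect()
}
```

Counting `char`s rather than bytes keeps the estimate stable for non-ASCII text, and slicing at a `char_indices` offset avoids panics on multi-byte characters. A real tokenizer can diverge considerably from any fixed ratio, which is one reason to keep a margin below the model's hard limit.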