# Uni-Xervo

Unified Rust runtime for embedding, reranking, and generation across local and remote model providers.

uni-xervo gives you one runtime and one API surface for mixed model stacks, so application code stays stable while you swap providers, models, and execution modes.
## Overview

Uni-Xervo is built around three core ideas:

- Model aliases: your app requests models by stable names like `embed/default` or `generate/llm`.
- Provider abstraction: local and remote providers implement the same task traits.
- Runtime deduplication: equivalent model specs share one loaded instance.

Core tasks:

- `embed` for vector embeddings
- `rerank` for relevance scoring
- `generate` for LLM text generation
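The provider-abstraction idea can be sketched in plain Rust. The trait and type names below are illustrative only, not uni-xervo's actual definitions (its real task traits are async and richer):

```rust
// Illustrative sketch: one task trait that any provider can implement.
trait EmbedTask {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

// A mock "provider" implementing the task trait.
struct MockProvider;

impl EmbedTask for MockProvider {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        // One dummy vector per input text.
        texts.iter().map(|t| vec![t.len() as f32]).collect()
    }
}

fn main() {
    // Application code depends only on the trait, not the concrete provider,
    // so local and remote backends are interchangeable behind it.
    let provider: &dyn EmbedTask = &MockProvider;
    let vectors = provider.embed(&["hello", "world!"]);
    println!("{:?}", vectors); // [[5.0], [6.0]]
}
```

Because callers hold a trait object (or generic bound) rather than a concrete type, swapping `local/candle` for `remote/openai` does not touch application code.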
## Why Uni-Xervo?

- Keep product code provider-agnostic.
- Mix local and remote models in one runtime.
- Enforce config correctness with schema-backed option validation.
- Control startup behavior with lazy, eager, or background warmup.
- Add retries/timeouts per model alias instead of hard-coding behavior.
## Provider Support

| Provider ID | Tasks | Cargo Feature |
|---|---|---|
| `local/candle` | embed | `provider-candle` |
| `local/fastembed` | embed | `provider-fastembed` |
| `local/mistralrs` | embed, generate | `provider-mistralrs` |
| `remote/openai` | embed, generate | `provider-openai` |
| `remote/gemini` | embed, generate | `provider-gemini` |
| `remote/vertexai` | embed, generate | `provider-vertexai` |
| `remote/mistral` | embed, generate | `provider-mistral` |
| `remote/anthropic` | generate | `provider-anthropic` |
| `remote/voyageai` | embed, rerank | `provider-voyageai` |
| `remote/cohere` | embed, rerank, generate | `provider-cohere` |
| `remote/azure-openai` | embed, generate | `provider-azure-openai` |
## Installation

Use only the features you need.

```toml
[dependencies]
uni-xervo = { version = "0.1.0", default-features = false, features = ["provider-candle"] }
tokio = { version = "1", features = ["full"] }
```

Default feature set: `provider-candle`.

If you want local embeddings + OpenAI generation:

```toml
[dependencies]
uni-xervo = { version = "0.1.0", default-features = false, features = ["provider-candle", "provider-openai"] }
tokio = { version = "1", features = ["full"] }
```

GPU acceleration flag:

- `gpu-cuda` for CUDA-enabled builds.
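Assuming the same dependency line as the installation examples above, a CUDA-enabled build would add the flag to the feature list:

```toml
[dependencies]
uni-xervo = { version = "0.1.0", default-features = false, features = ["provider-candle", "gpu-cuda"] }
```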
## Quick Start (Rust)

```rust
use uni_xervo::{LocalCandleProvider, ModelRuntime};

// Builder and method names here are illustrative; see docs/USER_GUIDE.md for the exact API.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let runtime = ModelRuntime::builder()
        .with_provider(LocalCandleProvider::default())
        .build()?;
    let vectors = runtime.embed("embed/default", &["hello world"]).await?;
    println!("{} embedding(s) returned", vectors.len());
    Ok(())
}
```
## JSON Config Example (generate/llm)

Model catalogs are JSON arrays of `ModelAliasSpec`.

`model-catalog.json`:
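A minimal illustrative catalog entry is sketched below. The field names are assumptions for illustration; the authoritative shape is defined by `schemas/model-catalog.schema.json`:

```json
[
  {
    "alias": "generate/llm",
    "provider": "remote/openai",
    "model": "gpt-4o-mini",
    "options": { "temperature": 0.2 },
    "retries": 2,
    "timeout_ms": 30000
  }
]
```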
## Load JSON Config and Run Generation

```rust
use uni_xervo::{GenerationOptions, ModelRuntime};

// Loader and method names here are illustrative; see docs/USER_GUIDE.md for the exact API.
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let runtime = ModelRuntime::from_config_path("model-catalog.json").await?;
    let reply = runtime
        .generate("generate/llm", "Summarize Rust's ownership model.", GenerationOptions::default())
        .await?;
    println!("{reply}");
    Ok(())
}
```
## Configuration and Validation

- Catalog schema: `schemas/model-catalog.schema.json`
- Provider option schemas: `schemas/provider-options/*.schema.json`
- Unknown keys or wrong value types fail fast during runtime build/register.
Default remote credential env vars:

| Provider ID | Default credential env var | Extra required options |
|---|---|---|
| `remote/openai` | `OPENAI_API_KEY` | None |
| `remote/gemini` | `GEMINI_API_KEY` | None |
| `remote/vertexai` | `VERTEX_AI_TOKEN` | `project_id` option or `VERTEX_AI_PROJECT` |
| `remote/mistral` | `MISTRAL_API_KEY` | None |
| `remote/anthropic` | `ANTHROPIC_API_KEY` | None |
| `remote/voyageai` | `VOYAGE_API_KEY` | None |
| `remote/cohere` | `CO_API_KEY` | None |
| `remote/azure-openai` | `AZURE_OPENAI_API_KEY` | `resource_name` option |
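For example, a stack mixing OpenAI generation with Voyage reranking needs both keys exported before the runtime is built (the values below are placeholders):

```shell
# Placeholder values; substitute real keys before running your app.
export OPENAI_API_KEY="sk-placeholder"
export VOYAGE_API_KEY="pa-placeholder"
```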
## CLI Prefetch Utility

The repository includes a prefetch CLI target (`src/bin/prefetch.rs`) to pre-download local model artifacts; it can be invoked with `cargo run --bin prefetch`.

Remote providers are skipped by design because they do not cache local weights.
Development
# Build
# Format + check + test
# Ignored integration tests (real providers)
Integration tests for real providers are gated by EXPENSIVE_TESTS=1 and relevant API credentials.
## Docs

- Contributing guide: `CONTRIBUTING.md`
- Development guide: `DEVELOPMENT.md`
- Community guidelines: `COMMUNITY.md`
- Code of conduct: `CODE_OF_CONDUCT.md`
- Support guide: `SUPPORT.md`
- Security policy: `SECURITY.md`
- User guide: `docs/USER_GUIDE.md`
- Testing guide: `TESTING.md`
- Website docs: `website/`
## License

Apache-2.0 (`LICENSE`).