# Uni-Xervo
Unified Rust runtime for embedding, reranking, and generation across local and remote model providers.
`uni-xervo` gives you one runtime and one API surface for mixed model stacks, so application code stays stable while you swap providers, models, and execution modes.
## Overview
Uni-Xervo is built around three core ideas:
- Model aliases: your app requests models by stable names like `embed/default` or `generate/llm`.
- Provider abstraction: local and remote providers implement the same task traits.
- Runtime deduplication: equivalent model specs share one loaded instance.
Core tasks:
- `embed` for vector embeddings
- `rerank` for relevance scoring
- `generate` for text generation, vision, image generation, and speech synthesis
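
The alias and deduplication ideas above can be pictured with a small, standard-library-only sketch. `Registry` and `LoadedModel` are hypothetical names invented for illustration, not uni-xervo types, and the sketch assumes two specs are "equivalent" when provider and model ID match. The real runtime may also compare revision and options.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a loaded model; not a uni-xervo type.
#[derive(Debug)]
struct LoadedModel {
    provider_id: String,
    model_id: String,
}

/// Hypothetical registry: aliases point at specs, and equal specs share one instance.
#[derive(Default)]
struct Registry {
    /// alias -> (provider_id, model_id)
    aliases: HashMap<String, (String, String)>,
    /// (provider_id, model_id) -> shared instance
    loaded: HashMap<(String, String), Arc<LoadedModel>>,
}

impl Registry {
    /// Bind a stable alias to a provider/model pair.
    fn register(&mut self, alias: &str, provider_id: &str, model_id: &str) {
        self.aliases
            .insert(alias.to_string(), (provider_id.to_string(), model_id.to_string()));
    }

    /// Resolve an alias, creating the instance only if no equivalent spec is loaded yet.
    fn resolve(&mut self, alias: &str) -> Option<Arc<LoadedModel>> {
        let key = self.aliases.get(alias)?.clone();
        let model = self.loaded.entry(key.clone()).or_insert_with(|| {
            Arc::new(LoadedModel {
                provider_id: key.0,
                model_id: key.1,
            })
        });
        Some(Arc::clone(model))
    }
}
```

In this sketch, registering `embed/default` and `embed/search` against the same provider and model and resolving both yields two `Arc` handles to one instance, which `Arc::ptr_eq` confirms.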
## Why Uni-Xervo?
- Keep product code provider-agnostic.
- Mix local and remote models in one runtime.
- Run multimodal generation: text, vision, diffusion (image generation), and speech pipelines.
- Enforce config correctness with schema-backed option validation.
- Control startup behavior with lazy, eager, or background warmup.
- Add retries/timeouts per model alias instead of hard-coding behavior.
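
The three warmup modes can be modelled as a small enum. This is a sketch, not the crate's own type: the `Warmup` name and the `blocks_startup` helper are hypothetical, and the doc comments describe the usual meaning of lazy, eager, and background loading rather than behaviour confirmed by this README.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the three warmup modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Warmup {
    /// Assumed: load on first request for the alias.
    Lazy,
    /// Assumed: load while the runtime is being built.
    Eager,
    /// Assumed: start loading after build without blocking it.
    Background,
}

impl FromStr for Warmup {
    type Err = String;

    /// Parse the lowercase strings used in the catalog, rejecting anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "lazy" => Ok(Warmup::Lazy),
            "eager" => Ok(Warmup::Eager),
            "background" => Ok(Warmup::Background),
            other => Err(format!("unknown warmup mode: {other}")),
        }
    }
}

impl Warmup {
    /// True when building the runtime would wait for this model to load
    /// (under the assumptions stated on the variants).
    fn blocks_startup(self) -> bool {
        matches!(self, Warmup::Eager)
    }
}
```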
## Provider Support
| Provider ID | Tasks | Cargo feature |
| --- | --- | --- |
| `local/candle` | `embed` | `provider-candle` |
| `local/fastembed` | `embed` | `provider-fastembed` |
| `local/mistralrs` | `embed`, `generate` (text, vision, diffusion, speech) | `provider-mistralrs` |
| `remote/openai` | `embed`, `generate` | `provider-openai` |
| `remote/gemini` | `embed`, `generate` | `provider-gemini` |
| `remote/vertexai` | `embed`, `generate` | `provider-vertexai` |
| `remote/mistral` | `embed`, `generate` | `provider-mistral` |
| `remote/anthropic` | `generate` | `provider-anthropic` |
| `remote/voyageai` | `embed`, `rerank` | `provider-voyageai` |
| `remote/cohere` | `embed`, `rerank`, `generate` | `provider-cohere` |
| `remote/azure-openai` | `embed`, `generate` | `provider-azure-openai` |
## Installation
Use only the features you need.
```toml
[dependencies]
uni-xervo = { version = "0.4.0", default-features = false, features = ["provider-candle"] }
tokio = { version = "1", features = ["full"] }
```
Default feature set:
- `provider-candle`
To combine local embeddings with OpenAI generation:
```toml
[dependencies]
uni-xervo = { version = "0.4.0", default-features = false, features = ["provider-candle", "provider-openai"] }
tokio = { version = "1", features = ["full"] }
```
GPU acceleration flag:
- `gpu-cuda` for CUDA-enabled builds.
## Quick Start (Rust)
```rust
use uni_xervo::api::{ModelAliasSpec, ModelTask};
use uni_xervo::provider::candle::LocalCandleProvider;
use uni_xervo::runtime::ModelRuntime;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Describe one embedding model behind a stable alias.
    let spec = ModelAliasSpec {
        alias: "embed/local".to_string(),
        task: ModelTask::Embed,
        provider_id: "local/candle".to_string(),
        model_id: "sentence-transformers/all-MiniLM-L6-v2".to_string(),
        revision: None,
        warmup: Default::default(),
        required: true,
        timeout: None,
        load_timeout: None,
        retry: None,
        // No provider options (this path needs `serde_json` in Cargo.toml).
        options: serde_json::Value::Null,
    };

    // Register the provider, supply the catalog, and build the runtime.
    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .catalog(vec![spec])
        .build()
        .await?;

    // Resolve the alias and embed a single input.
    let embedder = runtime.embedding("embed/local").await?;
    let vectors = embedder.embed(vec!["hello world"]).await?;
    println!("vector dims = {}", vectors[0].len());

    Ok(())
}
```
## JSON Config Example (`generate/llm`)
Model catalogs are JSON arrays of `ModelAliasSpec`.
`model-catalog.json`:
```json
[
  {
    "alias": "embed/default",
    "task": "embed",
    "provider_id": "local/candle",
    "model_id": "sentence-transformers/all-MiniLM-L6-v2",
    "warmup": "lazy",
    "required": true,
    "options": null
  },
  {
    "alias": "generate/llm",
    "task": "generate",
    "provider_id": "remote/openai",
    "model_id": "gpt-4o-mini",
    "warmup": "lazy",
    "timeout": 30,
    "retry": {
      "max_attempts": 3,
      "initial_backoff_ms": 200
    },
    "options": {
      "api_key_env": "OPENAI_API_KEY"
    }
  }
]
```
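
To make the `retry` object concrete, here is a minimal sketch of how `max_attempts` and `initial_backoff_ms` could drive a retry loop. It assumes the delay doubles after every failed attempt, which this README does not state, and it is blocking for simplicity whereas the runtime itself is async. `RetryConfig`, `backoff_schedule`, and `with_retry` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `retry` object in the catalog above.
struct RetryConfig {
    max_attempts: u32,
    initial_backoff_ms: u64,
}

/// Delays between attempts, ASSUMING the backoff doubles each time.
/// N attempts need at most N - 1 waits.
fn backoff_schedule(cfg: &RetryConfig) -> Vec<Duration> {
    (0..cfg.max_attempts.saturating_sub(1))
        .map(|i| {
            // Cap the shift so the multiplier itself cannot overflow.
            let factor = 1u64 << i.min(32);
            Duration::from_millis(cfg.initial_backoff_ms.saturating_mul(factor))
        })
        .collect()
}

/// Run `op` until it succeeds or the schedule is exhausted.
/// The operation always runs at least once, even if `max_attempts` is 0.
fn with_retry<T, E>(cfg: &RetryConfig, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut delays = backoff_schedule(cfg).into_iter();
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => match delays.next() {
                Some(delay) => std::thread::sleep(delay),
                None => return Err(err),
            },
        }
    }
}
```

Under the doubling assumption, the `generate/llm` entry above (3 attempts, 200 ms initial backoff) would wait 200 ms and then 400 ms before giving up.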
## Load JSON Config and Run Generation
```rust
use uni_xervo::provider::{LocalCandleProvider, RemoteOpenAIProvider};
use uni_xervo::runtime::ModelRuntime;
use uni_xervo::traits::{GenerationOptions, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Register every provider the catalog refers to, then load the catalog.
    let runtime = ModelRuntime::builder()
        .register_provider(LocalCandleProvider::new())
        .register_provider(RemoteOpenAIProvider::new())
        .catalog_from_file("model-catalog.json")?
        .build()
        .await?;

    // Resolve the generation alias defined in model-catalog.json.
    let llm = runtime.generator("generate/llm").await?;

    // Send a short conversation with explicit sampling options.
    let result = llm
        .generate(
            &[
                Message::user("You are a concise assistant."),
                Message::assistant("Understood."),
                Message::user("Explain what embeddings are in one paragraph."),
            ],
            GenerationOptions {
                max_tokens: Some(200),
                temperature: Some(0.3),
                top_p: Some(0.9),
                ..Default::default()
            },
        )
        .await?;

    println!("{}", result.text);

    Ok(())
}
```
## Configuration and Validation
- Catalog schema: `schemas/model-catalog.schema.json`
- Provider option schemas: `schemas/provider-options/*.schema.json`
- Unknown keys or wrong value types fail fast during runtime build/register.
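
The fail-fast rule for unknown keys and wrong value types can be sketched without any JSON library. Everything here is hypothetical (`OptValue`, `OptKind`, `validate_options`); the real validation is schema-backed and works on JSON values, so treat this only as an illustration of the rule, using the `api_key_env` option from the catalog example.

```rust
use std::collections::BTreeMap;

/// Hypothetical option value; the real catalog uses JSON values.
#[derive(Debug, Clone, PartialEq)]
enum OptValue {
    Str(String),
    Int(i64),
    Bool(bool),
}

/// The type a schema entry expects for a key.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OptKind {
    Str,
    Int,
    Bool,
}

/// Map a value to the kind it carries.
fn kind_of(value: &OptValue) -> OptKind {
    match value {
        OptValue::Str(_) => OptKind::Str,
        OptValue::Int(_) => OptKind::Int,
        OptValue::Bool(_) => OptKind::Bool,
    }
}

/// Reject the first unknown key or mistyped value instead of ignoring it.
fn validate_options(
    schema: &[(&str, OptKind)],
    options: &BTreeMap<String, OptValue>,
) -> Result<(), String> {
    for (key, value) in options {
        let expected = schema
            .iter()
            .find(|(name, _)| *name == key.as_str())
            .map(|(_, kind)| *kind)
            .ok_or_else(|| format!("unknown option key: {key}"))?;
        let actual = kind_of(value);
        if actual != expected {
            return Err(format!("option {key}: expected {expected:?}, got {actual:?}"));
        }
    }
    Ok(())
}
```

With a schema of `[("api_key_env", OptKind::Str)]`, a misspelled key such as `api_key` or an integer value for `api_key_env` returns an error rather than being silently dropped.
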
Default remote credential env vars:

| Provider | Credential env var | Additional required config |
| --- | --- | --- |
| `remote/openai` | `OPENAI_API_KEY` | None |
| `remote/gemini` | `GEMINI_API_KEY` | None |
| `remote/vertexai` | `VERTEX_AI_TOKEN` | `project_id` option or `VERTEX_AI_PROJECT` |
| `remote/mistral` | `MISTRAL_API_KEY` | None |
| `remote/anthropic` | `ANTHROPIC_API_KEY` | None |
| `remote/voyageai` | `VOYAGE_API_KEY` | None |
| `remote/cohere` | `CO_API_KEY` | None |
| `remote/azure-openai` | `AZURE_OPENAI_API_KEY` | `resource_name` option |
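
The table above can be read as a lookup from provider ID to a default variable name. The sketch below encodes it and assumes that an explicit `api_key_env` option, as used in the catalog example, takes precedence over the default. The function names are hypothetical, and the environment lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
/// Default credential variable per remote provider, copied from the table above.
fn default_credential_env(provider_id: &str) -> Option<&'static str> {
    match provider_id {
        "remote/openai" => Some("OPENAI_API_KEY"),
        "remote/gemini" => Some("GEMINI_API_KEY"),
        "remote/vertexai" => Some("VERTEX_AI_TOKEN"),
        "remote/mistral" => Some("MISTRAL_API_KEY"),
        "remote/anthropic" => Some("ANTHROPIC_API_KEY"),
        "remote/voyageai" => Some("VOYAGE_API_KEY"),
        "remote/cohere" => Some("CO_API_KEY"),
        "remote/azure-openai" => Some("AZURE_OPENAI_API_KEY"),
        _ => None,
    }
}

/// Resolve a credential value.
/// ASSUMPTION: an explicit `api_key_env` option overrides the provider default.
fn resolve_credential(
    provider_id: &str,
    api_key_env: Option<&str>,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    let var = api_key_env
        .or_else(|| default_credential_env(provider_id))
        .ok_or_else(|| format!("no credential variable known for {provider_id}"))?;
    // Treat an empty value the same as a missing one.
    lookup(var)
        .filter(|value| !value.is_empty())
        .ok_or_else(|| format!("{var} is not set"))
}
```

In an application, the lookup would typically be `|name| std::env::var(name).ok()`.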
## CLI Prefetch Utility
The repository includes a prefetch CLI target (`src/bin/prefetch.rs`) to pre-download local model artifacts:
```bash
cargo run --bin prefetch -- model-catalog.json --dry-run
cargo run --bin prefetch -- model-catalog.json
```
Remote providers are skipped by design because they do not cache local weights.
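
The skip rule can be sketched as a filter over catalog entries. This is not the code in `src/bin/prefetch.rs`: `Entry`, `is_local`, and `prefetch` are hypothetical, the sketch assumes provider IDs keep the `local/` and `remote/` prefixes used throughout this README, and it assumes `--dry-run` lists what would be fetched without downloading anything.

```rust
/// Hypothetical slice of a catalog entry; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    alias: String,
    provider_id: String,
}

/// ASSUMPTION: local providers are exactly those whose ID starts with `local/`.
fn is_local(provider_id: &str) -> bool {
    provider_id.starts_with("local/")
}

/// Return the aliases selected for prefetching.
/// In dry-run mode the plan is still returned, but `download` is never called.
fn prefetch<F: FnMut(&Entry)>(catalog: &[Entry], dry_run: bool, mut download: F) -> Vec<String> {
    let mut planned = Vec::new();
    for entry in catalog.iter().filter(|e| is_local(&e.provider_id)) {
        if !dry_run {
            download(entry);
        }
        planned.push(entry.alias.clone());
    }
    planned
}
```

Applied to the two-entry catalog shown earlier, only `embed/default` would be selected, since `generate/llm` points at `remote/openai`.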
## Development
```bash
# Build
./scripts/build.sh
# Format + check + test
./scripts/test.sh
# Ignored integration tests (real providers)
./scripts/test-integration.sh
```
Integration tests for real providers are gated by `EXPENSIVE_TESTS=1` and relevant API credentials.
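
A gate of this kind usually reduces to a small predicate. The sketch below is illustrative only: `expensive_tests_enabled` and `should_run` are hypothetical helpers, and it assumes that only the exact value `1` enables the tests and that an empty credential counts as missing.

```rust
/// True only when the gate value is exactly "1", matching `EXPENSIVE_TESTS=1`.
/// ASSUMPTION: other truthy-looking values such as "true" do not enable the tests.
fn expensive_tests_enabled(gate: Option<&str>) -> bool {
    gate == Some("1")
}

/// Decide whether a real-provider test should run, given the gate value and
/// the credential that provider needs.
fn should_run(gate: Option<&str>, credential: Option<&str>) -> bool {
    expensive_tests_enabled(gate) && credential.map_or(false, |c| !c.is_empty())
}
```

A test body could call `should_run(std::env::var("EXPENSIVE_TESTS").ok().as_deref(), std::env::var("OPENAI_API_KEY").ok().as_deref())` and return early when it is false.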
## Docs
- Contributing guide: `CONTRIBUTING.md`
- Development guide: `DEVELOPMENT.md`
- Community guidelines: `COMMUNITY.md`
- Code of conduct: `CODE_OF_CONDUCT.md`
- Support guide: `SUPPORT.md`
- Security policy: `SECURITY.md`
- User guide: `docs/USER_GUIDE.md`
- Testing guide: `TESTING.md`
- Website docs: `website/`
## License
Apache-2.0 (`LICENSE`).