# ModelRelay Rust SDK

Async and blocking clients for the ModelRelay API with optional SSE streaming. Default
features enable the async `reqwest` client and streaming; use the `blocking` feature when
you want to avoid Tokio in CLI/desktop tools.
## Installation

```toml
[dependencies]
modelrelay = "0.1.0"
```
Features:

- `client` (default): async HTTP client built on `reqwest` + `tokio`.
- `blocking`: blocking HTTP client with `reqwest::blocking` (no Tokio runtime required).
- `streaming` (default): SSE streaming support for `/llm/proxy` (adds `reqwest/stream` + `futures`).
- `tracing`: optional spans/events around HTTP requests and streaming (off by default to avoid extra deps).
- `mock`: in-memory mock clients + fixtures for offline tests (non-streaming + streaming).
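For a CLI or desktop tool that only needs the blocking client, you can opt out of the default async stack. A minimal sketch, assuming the `blocking` feature pulls in everything it needs on its own; confirm the exact feature combination against the crate's `Cargo.toml` before relying on it:

```toml
[dependencies]
modelrelay = { version = "0.1.0", default-features = false, features = ["blocking"] }
```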
## Blocking LLM proxy (no Tokio)

```rust
use modelrelay::{
BlockingClient, BlockingConfig, Model, ProxyMessage, ProxyOptions, ProxyRequest,
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = BlockingClient::new(BlockingConfig {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
..Default::default()
})?;
let mut request = ProxyRequest::new(
Model::OpenAIGpt4oMini,
vec![ProxyMessage {
role: "user".into(),
content: "Write a short greeting without code fences.".into(),
}],
)?;
request.max_tokens = Some(128);
request.stop_sequences = Some(vec!["```".into(), "</code>".into()]);
let completion = client
.llm()
.proxy(request, ProxyOptions::default().with_request_id("chat-123"))?;
println!(
"response {}: {} (stop={:?}, total_tokens={})",
completion.id,
completion.content.join(""),
completion.stop_reason,
completion.usage.total_tokens
);
Ok(())
}
```
`ProxyOptions` lets you set request IDs, extra headers, metadata, and per-call timeouts:

```rust
let opts = ProxyOptions::default()
.with_request_id("chat-123")
.with_header("X-Debug", "true")
.with_metadata("team", "rust-sdk")
.with_timeout(std::time::Duration::from_secs(30));
```
## Timeouts & retries
- Defaults: connect timeout 5s, request timeout 60s (non-streaming calls), 3 attempts with exponential backoff + jitter. Streaming calls leave the timeout unset unless you override it.
- Configure defaults on the client:

```rust
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
timeout: Some(std::time::Duration::from_secs(30)),
retry: Some(modelrelay::RetryConfig {
max_attempts: 5,
retry_post: true,
..Default::default()
}),
..Default::default()
})?;
```
- Override per call (async or blocking):

```rust
let opts = ProxyOptions::default()
.with_timeout(std::time::Duration::from_secs(10))
.with_retry(modelrelay::RetryConfig {
max_attempts: 1, ..Default::default()
});
```
`RetryConfig::disabled()` is a helper to turn retries off, and `retry_post` controls whether POST
requests (like `/llm/proxy`) are retried.
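For example, disabling retries for a single call might look like this (a minimal sketch reusing the per-call options shown above):

```rust
let opts = ProxyOptions::default().with_retry(modelrelay::RetryConfig::disabled());
```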
## Ergonomic chat builders
You can build and send chat requests without hand-assembling `ProxyRequest`/`ProxyOptions`.
**Async non-streaming**
```rust
use modelrelay::{ChatRequestBuilder, Client, Config};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
..Default::default()
})?;
let completion = ChatRequestBuilder::new("openai/gpt-4o")
.message("user", "Summarize the Rust ownership model in 2 sentences.")
.stop_sequences(vec!["```".into()])
.metadata([("team", "rust"), ("env", "staging")].iter().map(|(k,v)| (*k,*v)).collect())
.request_id("chat-async-1")
.send(&client.llm())
.await?;
println!("reply: {}", completion.content.join(""));
Ok(())
}
```
**Async streaming with adapter**

```rust
use modelrelay::{ChatRequestBuilder, ChatStreamAdapter, Client, Config};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
environment: Some(modelrelay::Environment::Staging),
client_header: Some("my-app/1.2.3".into()),
..Default::default()
})?;
let stream = ChatRequestBuilder::new("openai/gpt-4o-mini")
.message("user", "Stream a 2-line Rust haiku.")
.request_id("chat-stream-1")
.stream(&client.llm())
.await?;
let mut adapter = ChatStreamAdapter::new(stream);
while let Some(delta) = adapter.next_delta().await? {
print!("{delta}");
}
if let Some(usage) = adapter.final_usage() {
eprintln!("\nstop={:?} tokens={}", adapter.final_stop_reason(), usage.total_tokens);
}
Ok(())
}
```
**Blocking (non-streaming)**

```rust
use modelrelay::{BlockingClient, BlockingConfig, ChatRequestBuilder};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = BlockingClient::new(BlockingConfig {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
environment: Some(modelrelay::Environment::Sandbox),
client_header: Some("my-cli/0.9.0".into()),
..Default::default()
})?;
let completion = ChatRequestBuilder::new("openai/gpt-4o-mini")
.message("user", "Give me one fun Rust fact.")
.request_id("chat-blocking-1")
.send_blocking(&client.llm())?;
println!("reply: {}", completion.content.join(""));
Ok(())
}
```
**Blocking streaming with adapter**

```rust
use modelrelay::{BlockingClient, BlockingConfig, ChatRequestBuilder, ChatStreamAdapter};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = BlockingClient::new(BlockingConfig {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
..Default::default()
})?;
let stream = ChatRequestBuilder::new("openai/gpt-4o-mini")
.message("user", "Stream a 2-line poem about ferris the crab.")
.request_id("chat-blocking-stream-1")
.stream_blocking(&client.llm())?;
let mut adapter = ChatStreamAdapter::new(stream);
while let Some(delta) = adapter.next_delta()? {
print!("{delta}");
}
if let Some(usage) = adapter.final_usage() {
eprintln!("\nstop={:?} tokens={}", adapter.final_stop_reason(), usage.total());
}
Ok(())
}
```
## Tracing + metrics

- Enable the optional `tracing` feature for spans/events around HTTP attempts and streaming events, and use `MetricsCallbacks` to capture latency + usage without pulling in `tracing`.
- `Config::metrics` and `BlockingConfig::metrics` accept callbacks for HTTP latency, streaming first-token latency, and final token usage (request/response IDs, model/provider, method/path, and errors are included where available).
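To opt in, enable the feature in Cargo.toml. A sketch: `tracing-subscriber` is only needed because the example below initializes a subscriber, and its `env-filter` feature is required for `with_env_filter` (versions are illustrative):

```toml
[dependencies]
modelrelay = { version = "0.1.0", features = ["tracing"] }
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```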
**Async streaming with tracing + metrics**

```rust
use std::sync::Arc;
use modelrelay::{
ChatRequestBuilder, ChatStreamAdapter, Client, Config, HttpRequestMetrics, MetricsCallbacks,
StreamFirstTokenMetrics, TokenUsageMetrics,
};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
#[cfg(feature = "tracing")]
tracing_subscriber::fmt()
.with_env_filter("modelrelay=trace")
.init();
let metrics = MetricsCallbacks {
http_request: Some(Arc::new(|m: HttpRequestMetrics| {
eprintln!(
"[http] {} {} => {:?} ({:?}) req_id={:?}",
m.context.method, m.context.path, m.status, m.latency, m.context.request_id
);
})),
stream_first_token: Some(Arc::new(|m: StreamFirstTokenMetrics| {
eprintln!(
"[first-token] model={:?} {}ms req_id={:?} resp_id={:?} err={:?}",
m.context.model,
m.latency.as_millis(),
m.context.request_id,
m.context.response_id,
m.error
);
})),
usage: Some(Arc::new(|m: TokenUsageMetrics| {
eprintln!(
"[usage] path={} model={:?} total_tokens={}",
m.context.path,
m.context.model,
m.usage.total()
);
})),
};
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
metrics: Some(metrics),
..Default::default()
})?;
let stream = ChatRequestBuilder::new("openai/gpt-4o-mini")
.message("user", "Stream a single sentence about telemetry.")
.request_id("chat-metrics-async")
.stream(&client.llm())
.await?;
let mut adapter = ChatStreamAdapter::new(stream);
while let Some(delta) = adapter.next_delta().await? {
print!("{delta}");
}
if let Some(usage) = adapter.final_usage() {
eprintln!(
"\nstop={:?} total_tokens={}",
adapter.final_stop_reason(),
usage.total()
);
}
Ok(())
}
```
**Blocking streaming metrics**

```rust
use std::sync::Arc;
use modelrelay::{
BlockingClient, BlockingConfig, ChatRequestBuilder, ChatStreamAdapter, MetricsCallbacks,
StreamFirstTokenMetrics, TokenUsageMetrics,
};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let metrics = MetricsCallbacks {
stream_first_token: Some(Arc::new(|m: StreamFirstTokenMetrics| {
eprintln!(
"[blocking first-token] latency_ms={} req_id={:?}",
m.latency.as_millis(),
m.context.request_id
);
})),
usage: Some(Arc::new(|m: TokenUsageMetrics| {
eprintln!(
"[blocking usage] model={:?} total_tokens={}",
m.context.model, m.usage.total()
);
})),
..Default::default()
};
let client = BlockingClient::new(BlockingConfig {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
metrics: Some(metrics),
..Default::default()
})?;
let stream = ChatRequestBuilder::new("openai/gpt-4o-mini")
.message("user", "Stream a 2-line poem about ferris the crab.")
.request_id("chat-blocking-stream-metrics")
.stream_blocking(&client.llm())?;
let mut adapter = ChatStreamAdapter::new(stream);
while let Some(delta) = adapter.next_delta()? {
print!("{delta}");
}
if let Some(usage) = adapter.final_usage() {
eprintln!("\nstop={:?} tokens={}", adapter.final_stop_reason(), usage.total_tokens);
}
Ok(())
}
```
## Testing with mocks (offline)
Enable the `mock` feature for in-memory clients and fixtures (no network):

```toml
[dev-dependencies]
modelrelay = { path = "sdk/rust", features = ["mock", "streaming"] }
```
**Async: non-streaming + streaming**

```rust
use modelrelay::{
fixtures, ChatRequestBuilder, MockClient, MockConfig, Model, ProxyMessage, ProxyRequest,
ProxyOptions,
};
use futures_util::StreamExt;
#[tokio::test]
async fn offline_completion_and_stream() -> Result<(), modelrelay::Error> {
let client = MockClient::new(
MockConfig::default()
.with_proxy_response(fixtures::simple_proxy_response())
.with_stream_events(fixtures::simple_stream_events()),
);
let completion = client
.llm()
.proxy(
ProxyRequest::new(
Model::OpenAIGpt4oMini,
vec![ProxyMessage {
role: "user".into(),
content: "hi".into(),
}],
)?,
ProxyOptions::default(),
)
.await?;
assert_eq!(completion.content.join(""), "hello world");
let mut stream = ChatRequestBuilder::new(Model::OpenAIGpt4oMini)
.message("user", "stream me something")
.stream(&client.llm())
.await?;
let mut text = String::new();
while let Some(evt) = stream.next().await {
let evt = evt?;
if let Some(delta) = evt.text_delta {
text.push_str(&delta);
}
}
assert_eq!(text, "hello");
Ok(())
}
```
**Blocking non-streaming**

```rust
use modelrelay::{fixtures, MockClient, MockConfig, Model, ProxyMessage, ProxyRequest};
#[test]
fn offline_blocking_completion() {
let client = MockClient::new(MockConfig::default().with_proxy_response(
fixtures::simple_proxy_response(),
));
let resp = client
.blocking_llm()
.proxy(
ProxyRequest::new(
Model::OpenAIGpt4oMini,
vec![ProxyMessage {
role: "user".into(),
content: "hi".into(),
}],
)
.unwrap(),
modelrelay::ProxyOptions::default(),
)
.unwrap();
assert_eq!(resp.content.join(""), "hello world");
}
```
Fixtures live under `modelrelay::fixtures` (basic completion, streaming events, frontend token, API keys). Mock streaming helpers preserve request/response IDs where provided.
## Typed enums and stop reasons

Common provider/model IDs ship as enums (`Model`, `Provider`) with `Other(String)` fallbacks, and stop reasons use the typed `StopReason` enum so you can exhaustively match outcomes:

```rust
use modelrelay::{Model, Provider, ProxyMessage, ProxyOptions, ProxyRequest, StopReason};
let mut request = ProxyRequest::new(
Model::OpenAIGpt4oMini,
vec![ProxyMessage {
role: "user".into(),
content: "Explain enums in Rust".into(),
}],
)?;
request.provider = Some(Provider::OpenAI);
let completion = client.llm().proxy(request, ProxyOptions::default()).await?;
match completion.stop_reason {
Some(StopReason::StopSequence) => eprintln!("hit explicit stop sequence"),
Some(StopReason::Other(reason)) => eprintln!("custom stop reason: {reason}"),
_ => {}
}
eprintln!("input tokens: {}, total tokens: {}", completion.usage.input(), completion.usage.total());
## Async LLM proxy (non-streaming)

```rust
use modelrelay::{Client, Config, Model, ProxyMessage, ProxyRequest, ProxyOptions};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
..Default::default()
})?;
let request = ProxyRequest::new(
Model::OpenAIGpt4o,
vec![ProxyMessage {
role: "user".into(),
content: "Write a short greeting without code fences.".into(),
}],
)?;
let completion = client.llm().proxy(request, ProxyOptions::default()).await?;
println!(
"response {}: {} (stop={:?}, total_tokens={})",
completion.id,
completion.content.join(""),
completion.stop_reason,
completion.usage.total_tokens
);
Ok(())
}
```
## Frontend token exchange (publishable key flow)

```rust
use modelrelay::{Client, Config, FrontendTokenRequest};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let auth = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_PUBLISHABLE_KEY")?),
..Default::default()
})?;
let token = auth
.auth()
.frontend_token(FrontendTokenRequest {
user_id: "user-123".into(),
device_id: Some("device-abc".into()),
..Default::default()
})
.await?;
// Exchange succeeded; use the short-lived token for user-scoped calls.
let _client = Client::new(Config {
access_token: Some(token.token.clone()),
..Default::default()
})?;
Ok(())
}
```
## API key management (server-side)

```rust
use modelrelay::{APIKeyCreateRequest, Client, Config};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::new(Config {
api_key: Some(std::env::var("MODELRELAY_API_KEY")?),
..Default::default()
})?;
let created = client
.api_keys()
.create(APIKeyCreateRequest {
label: "rust-sdk-demo".into(),
..Default::default()
})
.await?;
println!("created key {}", created.redacted_key);
let keys = client.api_keys().list().await?;
for key in keys {
println!("{} ({})", key.redacted_key, key.kind);
}
Ok(())
}
```
## Errors

- `Error::Config` — missing/invalid base URL, key/token, or bad headers.
- `Error::Transport` — timeouts, DNS/TLS/connectivity (inspect `TransportErrorKind` and retries).
- `Error::Api` — structured API errors (`APIError` includes status/code/fields/request_id and retry metadata).
- `Error::Serialization` — JSON parsing/decoding failures.
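A sketch of handling these variants around a proxy call; the exact shape of each variant (tuple vs. struct payloads) is an assumption here, so check the crate docs before copying:

```rust
use modelrelay::{Error, ProxyOptions};

match client.llm().proxy(request, ProxyOptions::default()).await {
    Ok(completion) => println!("{}", completion.content.join("")),
    // Assumes tuple-style variants that wrap the error details.
    Err(Error::Api(api)) => eprintln!("API error: {api:?}"),
    Err(Error::Transport(transport)) => eprintln!("transport error: {transport:?}"),
    Err(other) => eprintln!("config/serialization error: {other}"),
}
```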
## Publishing

- Verify packaging before release: `cd sdk/rust && cargo package` (or `cargo publish --dry-run`).
- Publish + subtree sync + signed tag on the public repo: `just sdk-release-rust 0.1.0` (needs a crates.io token, signing key, and access to `git@github.com:modelrelay/modelrelay-rs.git`).
- Manual steps if you prefer:
  1. `cargo publish`
  2. `git subtree push --prefix sdk/rust git@github.com:modelrelay/modelrelay-rs.git main`
  3. `git fetch git@github.com:modelrelay/modelrelay-rs.git main:modelrelay-rs-main`
  4. `git tag -s v0.1.0 modelrelay-rs-main -m "modelrelay v0.1.0"`
  5. `git push git@github.com:modelrelay/modelrelay-rs.git v0.1.0`
## Environment variables

- `MODELRELAY_API_KEY` — secret key for server-to-server calls.
- `MODELRELAY_PUBLISHABLE_KEY` — publishable key for frontend token exchange.
- `MODELRELAY_BASE_URL` — override the API base URL (defaults to `https://api.modelrelay.ai/api/v1`).