nexo-llm

Multi-provider LLM client trait + concrete implementations for Anthropic / OpenAI-compat / MiniMax / Gemini / DeepSeek, with shared streaming pipeline, rate limiter, retry policies, and CircuitBreaker hardening.

This crate is part of Nexo — a multi-agent Rust framework with a NATS event bus, pluggable LLM providers (MiniMax, Anthropic, OpenAI-compat, Gemini, DeepSeek), per-agent credentials, MCP support, and channel plugins for WhatsApp, Telegram, Email, and Browser (CDP).

Main repo: https://github.com/lordmacu/nexo-rs
Runtime engine: nexo-core
Public docs: https://lordmacu.github.io/nexo-rs/

What this crate does

LlmClient trait — uniform chat, chat_stream, embed surface across providers. New providers impl the trait + register via LlmProviderFactory.
Concrete implementations:
- AnthropicClient — native Anthropic API with prompt caching (cache_control), the OAuth subscription flow (Phase 15: anthropic-cli credential reader + browser PKCE), MarkdownV2-safe rendering.
- OpenAiCompatClient — OpenAI-compat for any endpoint that speaks /v1/chat/completions. Covers MiniMax / DeepSeek / Mistral.rs / Ollama / vLLM / LM Studio / TGI in one impl.
- GeminiClient — Google Generative Language API.
- MinimaxClient — native MiniMax API path with their OAuth flow.
Shared streaming pipeline — parse_openai_sse, parse_anthropic_sse, parse_gemini_sse produce a uniform Stream<Item = StreamChunk>. record_usage_tap + stream_metrics_tap instrument every stream.
Rate limiter — per-provider token bucket so a bursty agent loop doesn't trigger upstream 429s.
Retry policies — with_retry(op, policy) with explicit rules per HTTP class:
- 429 → 5 attempts, 1s → 60s exponential, honours Retry-After.
- 5xx → 3 attempts, 2s → 30s exponential.
- 4xx (other) → no retry.
CircuitBreaker — per-LlmClient instance from nexo-resilience wraps every chat completion + token-counter call.
Tool calling — uniform ToolDef + tool-call serialisation across providers (handles the 4 different on-wire shapes).
Token counter — TokenCounter for budget estimation before sending; cascading provider used by Phase 18 context optimisation.
Streaming telemetry: nexo_llm_stream_ttft_seconds_*{provider} histogram + nexo_llm_stream_chunks_total{provider,kind} counter.

Public API

#[async_trait]
pub trait LlmClient: Send + Sync {
    fn provider_name(&self) -> &'static str;
    async fn chat(&self, req: ChatRequest) -> Result<ChatResponse, LlmError>;
    async fn chat_stream(&self, req: ChatRequest) -> Result<BoxStream<'static, Result<StreamChunk, LlmError>>, LlmError>;
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>, LlmError>;
}

pub enum AnyLlmClient {
    Anthropic(AnthropicClient),
    OpenAiCompat(OpenAiCompatClient),
    Gemini(GeminiClient),
    Minimax(MinimaxClient),
}

Install

[dependencies]
nexo-llm = "0.1"

Documentation for this crate

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE)
MIT license (LICENSE-MIT)