nexo-llm
Multi-provider LLM client trait + concrete implementations for Anthropic / OpenAI-compat / MiniMax / Gemini / DeepSeek, with shared streaming pipeline, rate limiter, retry policies, and CircuitBreaker hardening.
This crate is part of Nexo — a multi-agent Rust framework with a NATS event bus, pluggable LLM providers (MiniMax, Anthropic, OpenAI-compat, Gemini, DeepSeek), per-agent credentials, MCP support, and channel plugins for WhatsApp, Telegram, Email, and Browser (CDP).
- Main repo: https://github.com/lordmacu/nexo-rs
- Runtime engine: nexo-core
- Public docs: https://lordmacu.github.io/nexo-rs/
What this crate does
- LlmClient trait: uniform chat, chat_stream, embed surface across providers. New providers impl the trait + register via LlmProviderFactory.
- Concrete implementations:
  - AnthropicClient: native Anthropic API with prompt caching (cache_control), the OAuth subscription flow (Phase 15: anthropic-cli credential reader + browser PKCE), MarkdownV2-safe rendering.
  - OpenAiCompatClient: OpenAI-compat for any endpoint that speaks /v1/chat/completions. Covers MiniMax / DeepSeek / Mistral.rs / Ollama / vLLM / LM Studio / TGI in one impl.
  - GeminiClient: Google Generative Language API.
  - MinimaxClient: native MiniMax API path with their OAuth flow.
- Shared streaming pipeline: parse_openai_sse, parse_anthropic_sse, parse_gemini_sse produce a uniform Stream<Item = StreamChunk>. record_usage_tap + stream_metrics_tap instrument every stream.
- Rate limiter: per-provider token bucket so a bursty agent loop doesn't trigger upstream 429s.
- Retry policies: with_retry(op, policy) with explicit rules per HTTP class:
  - 429 → 5 attempts, 1s → 60s exponential, honours Retry-After.
  - 5xx → 3 attempts, 2s → 30s exponential.
  - 4xx (other) → no retry.
- CircuitBreaker: per-LlmClient instance from nexo-resilience wraps every chat completion + token-counter call.
- Tool calling: uniform ToolDef + tool-call serialisation across providers (handles the 4 different on-wire shapes).
- Token counter: TokenCounter for budget estimation before sending; cascading provider used by Phase 18 context optimisation.
- Streaming telemetry: nexo_llm_stream_ttft_seconds_*{provider} histogram + nexo_llm_stream_chunks_total{provider,kind} counter.
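The retry rules listed above can be illustrated with a small, standalone sketch. This is not the crate's code: RetryRule, rule_for_status, and backoff_delay are hypothetical names made up for this example, and the real with_retry(op, policy) signature is not shown here. The sketch also assumes that "exponential" means doubling the delay on each retry and that a Retry-After value, when honoured, is used as-is; the crate may do either differently.

```rust
use std::time::Duration;

/// Hypothetical retry rule for one class of HTTP status codes.
/// Illustrative only; not a type exported by nexo-llm.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RetryRule {
    /// Total number of attempts allowed, as listed in the README.
    pub max_attempts: u32,
    /// Delay before the first retry.
    pub base_delay: Duration,
    /// Upper bound for the computed delay.
    pub max_delay: Duration,
    /// Whether a server-provided Retry-After value replaces the computed delay.
    pub honour_retry_after: bool,
}

/// Maps a status code to the documented rule, or None when no retry applies.
pub fn rule_for_status(status: u16) -> Option<RetryRule> {
    match status {
        // 429: 5 attempts, 1s up to 60s, honours Retry-After.
        429 => Some(RetryRule {
            max_attempts: 5,
            base_delay: Duration::from_secs(1),
            max_delay: Duration::from_secs(60),
            honour_retry_after: true,
        }),
        // 5xx: 3 attempts, 2s up to 30s.
        500..=599 => Some(RetryRule {
            max_attempts: 3,
            base_delay: Duration::from_secs(2),
            max_delay: Duration::from_secs(30),
            honour_retry_after: false,
        }),
        // Other 4xx (and anything else) is not retried.
        _ => None,
    }
}

/// Delay before retry number `retry` (0-based).
/// Assumption: the delay doubles each time (base * 2^retry), capped at max_delay.
/// Assumption: an honoured Retry-After value is returned unchanged.
pub fn backoff_delay(rule: &RetryRule, retry: u32, retry_after: Option<Duration>) -> Duration {
    if rule.honour_retry_after {
        if let Some(hint) = retry_after {
            return hint;
        }
    }
    let factor = 2u32.saturating_pow(retry);
    rule.base_delay.saturating_mul(factor).min(rule.max_delay)
}
```

Under these assumptions a caller would look up the rule once per failed response, stop when rule_for_status returns None or the attempt count reaches max_attempts, and otherwise sleep for backoff_delay before trying again.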
Public API
Install
[dependencies]
nexo-llm = "0.1"
Documentation for this crate
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.