nexo-llm 0.1.6

LLM provider clients (MiniMax, OpenAI-compat, Anthropic, Gemini) with rate limiter and tool registry.
# nexo-llm

> Multi-provider LLM client trait + concrete implementations for Anthropic / OpenAI-compat / MiniMax / Gemini / DeepSeek, with shared streaming pipeline, rate limiter, retry policies, and CircuitBreaker hardening.

This crate is part of **[Nexo](https://github.com/lordmacu/nexo-rs)** — a multi-agent Rust framework with a NATS event bus, pluggable LLM providers (MiniMax, Anthropic, OpenAI-compat, Gemini, DeepSeek), per-agent credentials, MCP support, and channel plugins for WhatsApp, Telegram, Email, and Browser (CDP).

- **Main repo:** <https://github.com/lordmacu/nexo-rs>
- **Runtime engine:** [`nexo-core`](https://github.com/lordmacu/nexo-rs/tree/main/crates/core)
- **Public docs:** <https://lordmacu.github.io/nexo-rs/>

## What this crate does

- **`LlmClient` trait** — uniform `chat`, `chat_stream`,
  `embed` surface across providers. New providers impl the
  trait + register via `LlmProviderFactory`.
- **Concrete implementations**:
  - `AnthropicClient` — native Anthropic API with prompt
    caching (`cache_control`), the OAuth subscription flow
    (Phase 15: anthropic-cli credential reader + browser
    PKCE), MarkdownV2-safe rendering.
  - `OpenAiCompatClient` — OpenAI-compat for any endpoint that
    speaks `/v1/chat/completions`. Covers MiniMax / DeepSeek /
    Mistral.rs / Ollama / vLLM / LM Studio / TGI in one impl.
  - `GeminiClient` — Google Generative Language API.
  - `MinimaxClient` — native MiniMax API path with their
    OAuth flow.
- **Shared streaming pipeline**: `parse_openai_sse`,
  `parse_anthropic_sse`, `parse_gemini_sse` produce a uniform
  `Stream<Item = StreamChunk>`. `record_usage_tap` +
  `stream_metrics_tap` instrument every stream.
- **Rate limiter** — per-provider token bucket so a
  bursty agent loop doesn't trigger upstream 429s.
- **Retry policies**: `with_retry(op, policy)` with explicit
  rules per HTTP class:
  - 429 → 5 attempts, 1s → 60s exponential, honours
    `Retry-After`.
  - 5xx → 3 attempts, 2s → 30s exponential.
  - 4xx (other) → no retry.
- **CircuitBreaker** — per-`LlmClient` instance from
  [`nexo-resilience`](https://github.com/lordmacu/nexo-rs/tree/main/crates/resilience)
  wraps every chat completion + token-counter call.
- **Tool calling** — uniform `ToolDef` + tool-call
  serialisation across providers (handles the 4 different
  on-wire shapes).
- **Token counter**: `TokenCounter` for budget estimation
  before sending; cascading provider used by Phase 18
  context optimisation.
- **Streaming telemetry**:
  `nexo_llm_stream_ttft_seconds_*{provider}` histogram +
  `nexo_llm_stream_chunks_total{provider,kind}` counter.
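
The retry rules listed above can be read as a small decision table. The following is a minimal std-only sketch of that table, not the crate's implementation: `RetryPolicy`, `policy_for`, and `next_delay` are hypothetical names, the doubling factor is an assumption (the list only says "exponential"), and jitter is left out.

```rust
use std::time::Duration;

/// Hypothetical policy type for illustration; not the crate's `policy` argument.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RetryPolicy {
    max_attempts: u32,
    base: Duration,
    cap: Duration,
}

/// Map an HTTP status to the retry rule from the list; `None` means "do not retry".
fn policy_for(status: u16) -> Option<RetryPolicy> {
    match status {
        429 => Some(RetryPolicy {
            max_attempts: 5,
            base: Duration::from_secs(1),
            cap: Duration::from_secs(60),
        }),
        500..=599 => Some(RetryPolicy {
            max_attempts: 3,
            base: Duration::from_secs(2),
            cap: Duration::from_secs(30),
        }),
        _ => None,
    }
}

/// Delay to wait after failed attempt number `attempt` (1-based).
/// Returns `None` once the attempt budget is spent.
/// Assumptions: the delay doubles per attempt, and a server-provided
/// `Retry-After` value replaces the computed delay as-is.
fn next_delay(
    policy: RetryPolicy,
    attempt: u32,
    retry_after: Option<Duration>,
) -> Option<Duration> {
    if attempt >= policy.max_attempts {
        return None;
    }
    if let Some(hint) = retry_after {
        return Some(hint);
    }
    let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
    Some(policy.base.saturating_mul(factor).min(policy.cap))
}
```

A caller would look up `policy_for(status)` after a failed request, sleep for `next_delay(...)` when it returns `Some`, and surface the error when it returns `None`.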
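
For readers unfamiliar with the token-bucket technique named in the rate limiter bullet, here is a minimal std-only sketch. `TokenBucket` and its fields are hypothetical and do not mirror the crate's limiter; time is passed in explicitly so the behaviour is easy to reason about.

```rust
use std::time::Duration;

/// Hypothetical token bucket; names and fields are illustrative only.
#[derive(Debug)]
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    /// Time of the last refill, measured from an arbitrary start point.
    last_refill: Duration,
}

impl TokenBucket {
    /// Start with a full bucket so an initial burst up to `capacity` is allowed.
    fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last_refill: Duration::ZERO,
        }
    }

    /// Try to take one token at time `now`.
    /// Returns `false` when the caller should wait before sending a request.
    fn try_acquire(&mut self, now: Duration) -> bool {
        if now > self.last_refill {
            // Add the tokens earned since the last call, never exceeding capacity.
            let elapsed = (now - self.last_refill).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
            self.last_refill = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With a capacity of 2 and a refill rate of one token per second, two back-to-back calls succeed, a third is refused, and one more succeeds a second later. Holding one such bucket per provider is what keeps a bursty agent loop under the upstream limit.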
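
To illustrate the first stage of a streaming pipeline like the one described above, the sketch below frames an OpenAI-style server-sent-events body into payloads before any JSON decoding. It is an assumption-laden illustration, not `parse_openai_sse` itself: `SseFrame` and `split_sse` are hypothetical names, it works on a complete string rather than an async byte stream, and the `[DONE]` sentinel is the OpenAI-style convention.

```rust
/// One framed server-sent event, before any JSON decoding (hypothetical type).
#[derive(Debug, PartialEq)]
enum SseFrame {
    /// Raw payload of one event; multiple `data:` lines are joined with '\n'.
    Data(String),
    /// The OpenAI-style `[DONE]` end-of-stream sentinel.
    Done,
}

/// Split a complete SSE body into frames.
/// An event is dispatched only when its terminating blank line is seen.
fn split_sse(body: &str) -> Vec<SseFrame> {
    let mut frames = Vec::new();
    let mut data_lines: Vec<&str> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            // A blank line ends the current event, if one was started.
            if !data_lines.is_empty() {
                let payload = data_lines.join("\n");
                data_lines.clear();
                if payload == "[DONE]" {
                    frames.push(SseFrame::Done);
                    break;
                }
                frames.push(SseFrame::Data(payload));
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // A single leading space after the colon is not part of the value.
            data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other fields (`event:`, `id:`, `retry:`) and comment lines
        // starting with ':' are ignored in this sketch.
    }
    frames
}
```

A provider-specific parser would then decode each `Data` payload as JSON and map it to a uniform chunk type such as `StreamChunk`.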

## Public API

```rust
#[async_trait]
pub trait LlmClient: Send + Sync {
    fn provider_name(&self) -> &'static str;
    async fn chat(&self, req: ChatRequest) -> Result<ChatResponse, LlmError>;
    async fn chat_stream(
        &self,
        req: ChatRequest,
    ) -> Result<BoxStream<'static, Result<StreamChunk, LlmError>>, LlmError>;
    async fn embed(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>, LlmError>;
}

pub enum AnyLlmClient {
    Anthropic(AnthropicClient),
    OpenAiCompat(OpenAiCompatClient),
    Gemini(GeminiClient),
    Minimax(MinimaxClient),
}
```
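
An enum such as `AnyLlmClient` is commonly paired with a delegating trait impl so callers can hold any provider without a trait object. The sketch below shows that pattern with a simplified, synchronous stand-in trait. Everything in it is hypothetical: the stub types, the returned name strings, and the assumption that the enum forwards calls to its variants.

```rust
/// Simplified, synchronous stand-in for one method of the trait above.
trait ProviderName {
    fn provider_name(&self) -> &'static str;
}

// Stub clients for illustration only; the real clients hold HTTP state.
struct StubAnthropic;
struct StubGemini;

impl ProviderName for StubAnthropic {
    fn provider_name(&self) -> &'static str {
        "anthropic" // assumed name string
    }
}

impl ProviderName for StubGemini {
    fn provider_name(&self) -> &'static str {
        "gemini" // assumed name string
    }
}

/// Closed set of providers, analogous in shape to `AnyLlmClient`.
enum AnyStubClient {
    Anthropic(StubAnthropic),
    Gemini(StubGemini),
}

impl ProviderName for AnyStubClient {
    /// Forward the call to whichever concrete client the enum holds.
    fn provider_name(&self) -> &'static str {
        match self {
            AnyStubClient::Anthropic(c) => c.provider_name(),
            AnyStubClient::Gemini(c) => c.provider_name(),
        }
    }
}
```

Compared with `Box<dyn Trait>`, enum dispatch keeps the provider set closed and avoids a heap allocation, at the cost of one `match` arm per provider for every method.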

## Install

```toml
[dependencies]
nexo-llm = "0.1"
```

## Documentation for this crate

- [MiniMax M2.5](https://lordmacu.github.io/nexo-rs/llm/minimax.html)
- [Anthropic / Claude](https://lordmacu.github.io/nexo-rs/llm/anthropic.html)
- [OpenAI-compatible](https://lordmacu.github.io/nexo-rs/llm/openai.html)
- [DeepSeek](https://lordmacu.github.io/nexo-rs/llm/deepseek.html)
- [Rate limiting & retry](https://lordmacu.github.io/nexo-rs/llm/retry.html)

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](https://github.com/lordmacu/nexo-rs/blob/main/LICENSE-APACHE))
- MIT license ([LICENSE-MIT](https://github.com/lordmacu/nexo-rs/blob/main/LICENSE-MIT))

at your option.