Crate limit_llm

§limit-llm

Crates.io Docs.rs License: MIT

Multi-provider LLM client for Rust with streaming support.

Unified API for Anthropic Claude, OpenAI, z.ai, and local LLMs with built-in token tracking, state persistence, and automatic model handoff.

§Features

  • Multi-provider support: Anthropic Claude, OpenAI GPT, z.ai GLM, and local LLMs
  • Streaming responses: Async streaming with futures::Stream
  • Token tracking: SQLite-based usage tracking and cost estimation
  • State persistence: Serialize/restore conversation state with bincode
  • Model handoff: Automatic fallback between providers on failure
  • Tool calling: Full function/tool support for all compatible providers
  • Thinking mode: Extended reasoning support (Claude, z.ai)

§Quick Start

use limit_llm::{AnthropicClient, Message, Role, LlmProvider};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client from environment variable ANTHROPIC_API_KEY
    let client = AnthropicClient::new(
        std::env::var("ANTHROPIC_API_KEY")?,
        None,  // default base URL
        60,    // timeout in seconds
        "claude-sonnet-4-6-20260217",
        4096,  // max tokens
    );

    let messages = vec![
        Message {
            role: Role::User,
            content: Some("Hello, Claude!".to_string()),
            tool_calls: None,
            tool_call_id: None,
        }
    ];

    // Stream the response
    let mut stream = client.send(messages, vec![]).await;
     
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(limit_llm::ProviderResponseChunk::ContentDelta(text)) => {
                print!("{}", text);
            }
            Ok(limit_llm::ProviderResponseChunk::Done(usage)) => {
                println!("\n\nTokens: {} in, {} out",
                    usage.input_tokens, usage.output_tokens);
            }
            Err(e) => eprintln!("Error: {}", e),
            _ => {}
        }
    }

    Ok(())
}

§Providers

| Provider | Client | Streaming | Tools | Thinking |
|----------|--------|-----------|-------|----------|
| Anthropic Claude | `AnthropicClient` | ✓ | ✓ | ✓ |
| OpenAI | `OpenAiProvider` | ✓ | ✓ | |
| z.ai GLM | `ZaiProvider` | ✓ | ✓ | ✓ |
| Local/Ollama | `LocalProvider` | ✓ | | |
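Every client implements the shared `LlmProvider` trait, so application code can hold any of them behind a trait object and swap providers at runtime. A minimal self-contained sketch of that shape, using a hypothetical stand-in trait (the real `LlmProvider` is async and streaming):

```rust
// Stand-in for limit_llm's LlmProvider trait, synchronous for illustration.
trait Provider {
    fn name(&self) -> &'static str;
    fn send(&self, prompt: &str) -> String;
}

struct Claude;
struct Local;

impl Provider for Claude {
    fn name(&self) -> &'static str { "anthropic" }
    fn send(&self, prompt: &str) -> String { format!("[claude] {}", prompt) }
}

impl Provider for Local {
    fn name(&self) -> &'static str { "local" }
    fn send(&self, prompt: &str) -> String { format!("[local] {}", prompt) }
}

fn main() {
    // Pick providers at runtime; the calling code is identical either way.
    let providers: Vec<Box<dyn Provider>> = vec![Box::new(Claude), Box::new(Local)];
    for p in &providers {
        println!("{}: {}", p.name(), p.send("hello"));
    }
}
```

The same trait-object pattern is what lets the handoff and factory machinery below treat all four backends uniformly.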

§Configuration

§Environment Variables

ANTHROPIC_API_KEY=your-key       # For Claude
OPENAI_API_KEY=your-key          # For GPT
ZAI_API_KEY=your-key             # For z.ai
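An application will often select a provider from whichever key happens to be set. A hedged sketch of that selection logic; `pick_provider` and its priority order are illustrative helpers, not part of the crate:

```rust
/// Return the first provider whose API key is available, checked in a
/// fixed priority order. `get` abstracts over std::env::var so the
/// logic is easy to test. (Illustrative helper, not crate API.)
fn pick_provider(get: impl Fn(&str) -> Option<String>) -> Option<&'static str> {
    [
        ("ANTHROPIC_API_KEY", "anthropic"),
        ("OPENAI_API_KEY", "openai"),
        ("ZAI_API_KEY", "zai"),
    ]
    .into_iter()
    .find(|(var, _)| get(var).is_some())
    .map(|(_, name)| name)
}

fn main() {
    // In a real program: pick_provider(|k| std::env::var(k).ok())
    let fake_env = |k: &str| (k == "OPENAI_API_KEY").then(|| "sk-test".to_string());
    println!("{:?}", pick_provider(fake_env)); // prints Some("openai")
}
```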

§Programmatic Configuration

use limit_llm::{Config, ProviderFactory};

// Load from ~/.limit/config.toml
let config = Config::load()?;

// Create provider from config
let provider = ProviderFactory::create_provider(&config)?;

§Token Tracking

use limit_llm::TrackingDb;

let tracking = TrackingDb::new()?;

// Track a request
tracking.track_request(
    "claude-sonnet-4-6-20260217",
    100,  // input tokens
    50,   // output tokens
    0.001, // cost in USD
    1500,  // duration in ms
)?;

// Get statistics for last 7 days
let stats = tracking.get_usage_stats(7)?;
println!("Total cost: ${:.4}", stats.total_cost);
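Cost values like the `0.001` passed to `track_request` come from per-token pricing. A minimal sketch of the arithmetic; the rates used below are hypothetical placeholders, not the crate's pricing table:

```rust
/// Estimate request cost in USD from token counts and per-million-token
/// rates. Rates here are placeholders; look up real pricing per model.
fn estimate_cost(input_tokens: u64, output_tokens: u64,
                 in_per_mtok: f64, out_per_mtok: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * in_per_mtok
        + (output_tokens as f64 / 1_000_000.0) * out_per_mtok
}

fn main() {
    // 100 input + 50 output tokens at $3 / $15 per million tokens (placeholder rates)
    let cost = estimate_cost(100, 50, 3.0, 15.0);
    println!("estimated cost: ${:.6}", cost); // $0.001050
}
```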

§State Persistence

use limit_llm::{StatePersistence, Message};

let persistence = StatePersistence::new("~/.limit/state/session.bin");

// Save conversation
let messages: Vec<Message> = vec![];
persistence.save(&messages)?;

// Restore later
let restored = persistence.load()?;
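The save/load pair is a plain round trip: whatever is serialized must deserialize to an equal value. A self-contained sketch of that contract using a toy line-based format and a temp file; the crate itself uses bincode, and `save_lines`/`load_lines` are illustrative stand-ins only:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Toy stand-ins for StatePersistence::save / load: one message per line.
// The real crate serializes Vec<Message> with bincode instead.
fn save_lines(path: &Path, msgs: &[String]) -> io::Result<()> {
    fs::write(path, msgs.join("\n"))
}

fn load_lines(path: &Path) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(path)?;
    Ok(if text.is_empty() {
        Vec::new()
    } else {
        text.lines().map(str::to_string).collect()
    })
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("limit_llm_state_demo.txt");
    let msgs = vec!["user: Hello".to_string(), "assistant: Hi!".to_string()];

    save_lines(&path, &msgs)?;         // save conversation
    let restored = load_lines(&path)?; // restore later
    assert_eq!(restored, msgs);        // round trip is lossless
    fs::remove_file(&path)?;
    Ok(())
}
```

A line-based format breaks as soon as a message contains a newline, which is one reason a binary format like bincode is the better choice for real state.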

§Model Handoff

The ModelHandoff type provides token counting and message compaction for transitioning between models with different context windows:

use limit_llm::ModelHandoff;

let handoff = ModelHandoff::new();

// Count tokens in a message
let tokens = handoff.count_tokens("Hello, world!");
println!("Token count: {}", tokens);

// Compact messages to fit a target context window
// let compacted = handoff.handoff_to_model("claude-3-5-sonnet", "claude-3-5-haiku", &messages);
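The automatic-fallback behavior described under Features reduces to "try each provider in order, return the first success". A self-contained sketch of that loop, with closures standing in for provider calls (not the crate's actual handoff code, which also compacts messages between models):

```rust
/// Try each provider call in order and return the first Ok result, or
/// the last error if every provider fails. Panics if `attempts` is
/// empty. (Illustrative only.)
fn first_success<T, E>(attempts: Vec<Box<dyn Fn() -> Result<T, E>>>) -> Result<T, E> {
    let mut last_err = None;
    for attempt in &attempts {
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("at least one provider attempt required"))
}

fn main() {
    let attempts: Vec<Box<dyn Fn() -> Result<String, String>>> = vec![
        Box::new(|| Err("primary model overloaded".to_string())),
        Box::new(|| Ok("reply from fallback model".to_string())),
    ];
    println!("{}", first_success(attempts).unwrap()); // reply from fallback model
}
```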

Re-exports§

pub use client::AnthropicClient;
pub use config::BrowserConfigSection;
pub use config::Config;
pub use config::ProviderConfig;
pub use error::LlmError;
pub use handoff::ModelHandoff;
pub use local_provider::LocalProvider;
pub use openai_provider::OpenAiProvider;
pub use persistence::StatePersistence;
pub use provider_factory::ProviderFactory;
pub use providers::LlmProvider;
pub use providers::ProviderResponseChunk;
pub use tracking::TrackingDb;
pub use types::FunctionCall;
pub use types::Message;
pub use types::Response;
pub use types::Role;
pub use types::Tool;
pub use types::ToolCall;
pub use types::ToolFunction;
pub use types::Usage;
pub use zai_provider::ThinkingConfig;
pub use zai_provider::ZaiProvider;

Modules§

client
config
error
handoff
local_provider
openai_provider
persistence
provider_factory
providers
Multi-provider LLM support.
tracking
types
Core types for LLM message passing and tool definitions.
zai_provider