Crate limit_llm

§limit-llm

Crates.io Docs.rs License: MIT

Multi-provider LLM client for Rust with streaming support.

Unified API for Anthropic Claude, OpenAI, z.ai, and local LLMs with built-in token tracking, state persistence, and automatic model handoff.

§Features

  • Multi-provider support: Anthropic Claude, OpenAI GPT, z.ai GLM, and local LLMs
  • Streaming responses: Async streaming with futures::Stream
  • Token tracking: SQLite-based usage tracking and cost estimation
  • State persistence: Serialize/restore conversation state with bincode
  • Model handoff: Automatic fallback between providers on failure
  • Tool calling: Full function/tool support for all compatible providers
  • Thinking mode: Extended reasoning support (Claude, z.ai)

§Quick Start

use limit_llm::{AnthropicClient, Message, Role, LlmProvider};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client from environment variable ANTHROPIC_API_KEY
    let client = AnthropicClient::new(
        std::env::var("ANTHROPIC_API_KEY")?,
        None,  // default base URL
        60,    // timeout in seconds
        "claude-sonnet-4-6-20260217",
        4096,  // max tokens
    );

    let messages = vec![
        Message {
            role: Role::User,
            content: Some("Hello, Claude!".to_string()),
            tool_calls: None,
            tool_call_id: None,
        }
    ];

    // Stream the response
    let mut stream = client.send(messages, vec![]).await;
     
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(limit_llm::ProviderResponseChunk::ContentDelta(text)) => {
                print!("{}", text);
            }
            Ok(limit_llm::ProviderResponseChunk::Done(usage)) => {
                println!("\n\nTokens: {} in, {} out",
                    usage.input_tokens, usage.output_tokens);
            }
            Err(e) => eprintln!("Error: {}", e),
            _ => {}
        }
    }

    Ok(())
}

§Providers

| Provider | Client | Streaming | Tools | Thinking |
|----------|--------|-----------|-------|----------|
| Anthropic Claude | `AnthropicClient` | ✓ | ✓ | ✓ |
| OpenAI | `OpenAiProvider` | ✓ | ✓ | |
| z.ai GLM | `ZaiProvider` | ✓ | ✓ | ✓ |
| Local/Ollama | `LocalProvider` | ✓ | | |
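Every client implements the shared `LlmProvider` trait, so application code can hold any of them behind a trait object and swap providers at runtime. A minimal self-contained sketch of that shape, using a hypothetical stand-in trait (the real `LlmProvider` is async and streaming):

```rust
// Stand-in for limit_llm's LlmProvider trait, synchronous for illustration.
trait Provider {
    fn name(&self) -> &'static str;
    fn send(&self, prompt: &str) -> String;
}

struct Claude;
struct Local;

impl Provider for Claude {
    fn name(&self) -> &'static str { "anthropic" }
    fn send(&self, prompt: &str) -> String { format!("[claude] {}", prompt) }
}

impl Provider for Local {
    fn name(&self) -> &'static str { "local" }
    fn send(&self, prompt: &str) -> String { format!("[local] {}", prompt) }
}

fn main() {
    // Pick providers at runtime; the calling code is identical either way.
    let providers: Vec<Box<dyn Provider>> = vec![Box::new(Claude), Box::new(Local)];
    for p in &providers {
        println!("{}: {}", p.name(), p.send("hello"));
    }
}
```

The same trait-object pattern is what lets the handoff and factory machinery below treat all four backends uniformly.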

§Configuration

§Environment Variables

ANTHROPIC_API_KEY=your-key       # For Claude
OPENAI_API_KEY=your-key          # For GPT
ZAI_API_KEY=your-key             # For z.ai
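An application will often select a provider from whichever key happens to be set. A hedged sketch of that selection logic; `pick_provider` and its priority order are illustrative helpers, not part of the crate:

```rust
/// Return the first provider whose API key is available, checked in a
/// fixed priority order. `get` abstracts over std::env::var so the
/// logic is easy to test. (Illustrative helper, not crate API.)
fn pick_provider(get: impl Fn(&str) -> Option<String>) -> Option<&'static str> {
    [
        ("ANTHROPIC_API_KEY", "anthropic"),
        ("OPENAI_API_KEY", "openai"),
        ("ZAI_API_KEY", "zai"),
    ]
    .into_iter()
    .find(|(var, _)| get(var).is_some())
    .map(|(_, name)| name)
}

fn main() {
    // In a real program: pick_provider(|k| std::env::var(k).ok())
    let fake_env = |k: &str| (k == "OPENAI_API_KEY").then(|| "sk-test".to_string());
    println!("{:?}", pick_provider(fake_env)); // prints Some("openai")
}
```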

§Programmatic Configuration

use limit_llm::{Config, ProviderFactory};

// Load from ~/.limit/config.toml
let config = Config::load()?;

// Create provider from config
let provider = ProviderFactory::create_provider(&config)?;

§Token Tracking

use limit_llm::TrackingDb;

let tracking = TrackingDb::new()?;

// Track a request
tracking.track_request(
    "claude-sonnet-4-6-20260217",
    100,  // input tokens
    50,   // output tokens
    0.001, // cost in USD
    1500,  // duration in ms
)?;

// Get statistics for last 7 days
let stats = tracking.get_usage_stats(7)?;
println!("Total cost: ${:.4}", stats.total_cost);
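Cost values like the `0.001` passed to `track_request` come from per-token pricing. A minimal sketch of the arithmetic; the rates used below are hypothetical placeholders, not the crate's pricing table:

```rust
/// Estimate request cost in USD from token counts and per-million-token
/// rates. Rates here are placeholders; look up real pricing per model.
fn estimate_cost(input_tokens: u64, output_tokens: u64,
                 in_per_mtok: f64, out_per_mtok: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * in_per_mtok
        + (output_tokens as f64 / 1_000_000.0) * out_per_mtok
}

fn main() {
    // 100 input + 50 output tokens at $3 / $15 per million tokens (placeholder rates)
    let cost = estimate_cost(100, 50, 3.0, 15.0);
    println!("estimated cost: ${:.6}", cost); // $0.001050
}
```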

§State Persistence

use limit_llm::{StatePersistence, Message};

let persistence = StatePersistence::new("~/.limit/state/session.bin");

// Save conversation
let messages: Vec<Message> = vec![];
persistence.save(&messages)?;

// Restore later
let restored = persistence.load()?;
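The save/load pair is a plain round trip: whatever is serialized must deserialize to an equal value. A self-contained sketch of that contract using a toy line-based format and a temp file; the crate itself uses bincode, and `save_lines`/`load_lines` are illustrative stand-ins only:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Toy stand-ins for StatePersistence::save / load: one message per line.
// The real crate serializes Vec<Message> with bincode instead.
fn save_lines(path: &Path, msgs: &[String]) -> io::Result<()> {
    fs::write(path, msgs.join("\n"))
}

fn load_lines(path: &Path) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(path)?;
    Ok(if text.is_empty() {
        Vec::new()
    } else {
        text.lines().map(str::to_string).collect()
    })
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("limit_llm_state_demo.txt");
    let msgs = vec!["user: Hello".to_string(), "assistant: Hi!".to_string()];

    save_lines(&path, &msgs)?;         // save conversation
    let restored = load_lines(&path)?; // restore later
    assert_eq!(restored, msgs);        // round trip is lossless
    fs::remove_file(&path)?;
    Ok(())
}
```

A line-based format breaks as soon as a message contains a newline, which is one reason a binary format like bincode is the better choice for real state.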

§Model Handoff

The ModelHandoff type provides token counting and message compaction for transitioning between models with different context windows:

use limit_llm::ModelHandoff;

let handoff = ModelHandoff::new();

// Count tokens in a message
let tokens = handoff.count_tokens("Hello, world!");
println!("Token count: {}", tokens);

// Compact messages to fit a target context window
// let compacted = handoff.handoff_to_model("claude-3-5-sonnet", "claude-3-5-haiku", &messages);
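The automatic-fallback behavior described under Features reduces to "try each provider in order, return the first success". A self-contained sketch of that loop, with closures standing in for provider calls (not the crate's actual handoff code, which also compacts messages between models):

```rust
/// Try each provider call in order and return the first Ok result, or
/// the last error if every provider fails. Panics if `attempts` is
/// empty. (Illustrative only.)
fn first_success<T, E>(attempts: Vec<Box<dyn Fn() -> Result<T, E>>>) -> Result<T, E> {
    let mut last_err = None;
    for attempt in &attempts {
        match attempt() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("at least one provider attempt required"))
}

fn main() {
    let attempts: Vec<Box<dyn Fn() -> Result<String, String>>> = vec![
        Box::new(|| Err("primary model overloaded".to_string())),
        Box::new(|| Ok("reply from fallback model".to_string())),
    ];
    println!("{}", first_success(attempts).unwrap()); // reply from fallback model
}
```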

Re-exports§

pub use client::AnthropicClient;
pub use config::BrowserConfigSection;
pub use config::Config;
pub use config::ProviderConfig;
pub use error::LlmError;
pub use handoff::ModelHandoff;
pub use local_provider::LocalProvider;
pub use openai_provider::OpenAiProvider;
pub use persistence::StatePersistence;
pub use provider_factory::ProviderFactory;
pub use providers::LlmProvider;
pub use providers::ProviderResponseChunk;
pub use tracking::TrackingDb;
pub use types::FunctionCall;
pub use types::Message;
pub use types::Response;
pub use types::Role;
pub use types::Tool;
pub use types::ToolCall;
pub use types::ToolFunction;
pub use types::Usage;
pub use zai_provider::ThinkingConfig;
pub use zai_provider::ZaiProvider;

Modules§

client
config
error
handoff
local_provider
openai_provider
persistence
provider_factory
providers
Multi-provider LLM support.
tracking
types
Core types for LLM message passing and tool definitions.
zai_provider