# LLM
Note: This crate name previously belonged to another project. The current implementation represents a new and different library. The previous crate is now archived and will not receive any updates. ref: https://github.com/rustformers/llm
LLM is a Rust library that lets you use multiple LLM backends in a single project: OpenAI, Anthropic (Claude), Ollama, DeepSeek, xAI, Phind, Groq, Google and ElevenLabs. With a unified API and builder style (similar to the Stripe experience) you can easily create chat, text-completion, or speech-to-text requests without multiplying structures and crates.
## Key Features
- Multi-backend: Manage OpenAI, Anthropic, Ollama, DeepSeek, xAI, Phind, Groq and Google through a single entry point.
- Multi-step chains: Create multi-step chains with different backends at each step.
- Templates: Use templates to create complex prompts with variables.
- Builder pattern: Configure your LLM (model, temperature, max_tokens, timeouts...) with a few simple calls.
- Chat & Completions: Two unified traits (`ChatProvider` and `CompletionProvider`) to cover most use cases.
- Extensible: Easily add new backends.
- Rust-friendly: Designed with clear traits, unified error handling, and conditional compilation via features.
- Validation: Add validation to your requests to ensure the output is what you expect.
- Evaluation: Add evaluation to your requests to score the output of LLMs.
- Parallel Evaluation: Evaluate multiple LLM providers in parallel and select the best response based on scoring functions.
- Function calling: Add function calling to your requests to use tools in your LLMs.
- REST API: Serve any LLM backend as a REST API with openai standard format.
- Vision: Add vision to your requests to use images in your LLMs.
- Reasoning: Add reasoning to your requests to use reasoning in your LLMs.
- Structured Output: Request structured output from certain LLM providers based on a provided JSON schema.
- Speech to text: Transcribe audio to text
- Text to speech: Convert text to audio
- Memory: Store and retrieve conversation history with a sliding window (other strategies soon) and shared-memory support
- Agentic: Build reactive agents that can cooperate via shared memory, with configurable triggers, roles and validation.
## Use any LLM backend on your project
Simply add LLM to your `Cargo.toml`:

```toml
[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "elevenlabs"] }
```
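With the crate added, a chat request goes through the builder pattern described above. The sketch below assumes the `LLMBuilder`, `LLMBackend` and `ChatMessage` types and a tokio async runtime; exact module paths and method names may differ between versions, so check the examples directory for the authoritative version:

```rust
use llm::{
    builder::{LLMBackend, LLMBuilder},
    chat::ChatMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build an OpenAI-backed client; any other enabled backend
    // is configured the same way by swapping the LLMBackend variant.
    let llm = LLMBuilder::new()
        .backend(LLMBackend::OpenAI)
        .api_key(std::env::var("OPENAI_API_KEY")?)
        .model("gpt-4o")
        .temperature(0.7)
        .max_tokens(512)
        .build()?;

    // Send a single user message and print the reply.
    let messages = vec![ChatMessage::user().content("Hello, world!").build()];
    let response = llm.chat(&messages).await?;
    println!("{response}");
    Ok(())
}
```

Because every backend implements the same `ChatProvider` trait, switching providers only changes the builder configuration, not the call sites.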
## Use any LLM on the CLI
LLM includes a command-line tool for easily interacting with different LLM models. You can install it with:

```shell
cargo install llm
```

- Use `llm` to start an interactive chat session
- Use `llm openai:gpt-4o` to start an interactive chat session with a specific provider:model
- Use `llm set OPENAI_API_KEY your_key` to configure your API key
- Use `llm default openai:gpt-4` to set a default provider
- Use `echo "Hello World" | llm` to pipe input
- Use `llm --provider openai --model gpt-4 --temperature 0.7` for advanced options
## Serving any LLM backend as a REST API

- Use the standard messages format
- Use step chains to chain multiple LLM backends together
- Expose the chain through a REST API in OpenAI-standard format

```toml
[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "api", "elevenlabs"] }
```

More details in the api_example.
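Since the server speaks the OpenAI-standard chat format, any OpenAI-compatible client can call it. A minimal request body looks like the following (the `model` value is illustrative; actual routing depends on how you configure the chain, see api_example):

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}
```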
## More examples
| Name | Description |
|---|---|
| anthropic_example | Demonstrates integration with Anthropic's Claude model for chat completion |
| anthropic_streaming_example | Anthropic streaming chat example demonstrating real-time token generation |
| chain_example | Shows how to create multi-step prompt chains for exploring programming language features |
| deepseek_example | Basic DeepSeek chat completion example with deepseek-chat models |
| embedding_example | Basic embedding example with OpenAI's API |
| multi_backend_example | Illustrates chaining multiple LLM backends (OpenAI, Anthropic, DeepSeek) together in a single workflow |
| ollama_example | Example of using local LLMs through Ollama integration |
| openai_example | Basic OpenAI chat completion example with GPT models |
| openai_streaming_example | OpenAI streaming chat example demonstrating real-time token generation |
| phind_example | Basic Phind chat completion example with the Phind-70B model |
| validator_example | Basic validator example with Anthropic's Claude model |
| xai_example | Basic xAI chat completion example with Grok models |
| xai_streaming_example | xAI streaming chat example demonstrating real-time token generation |
| evaluation_example | Basic evaluation example with Anthropic, Phind and DeepSeek |
| evaluator_parallel_example | Evaluate multiple LLM providers in parallel |
| google_example | Basic Google Gemini chat completion example with Gemini models |
| google_streaming_example | Google streaming chat example demonstrating real-time token generation |
| google_pdf | Google Gemini chat with PDF attachment |
| google_image | Google Gemini chat with image attachment |
| google_embedding_example | Basic Google Gemini embedding example with Gemini models |
| tool_calling_example | Basic tool calling example with OpenAI |
| google_tool_calling_example | Google Gemini function calling example with complex JSON schema for meeting scheduling |
| json_schema_nested_example | Advanced example demonstrating deeply nested JSON schemas with arrays of objects and complex data structures |
| tool_json_schema_cycle_example | Complete tool calling cycle with JSON schema validation and structured responses |
| unified_tool_calling_example | Unified tool calling with a selectable provider; demonstrates multi-turn tool use and tool choice |
| deepclaude_pipeline_example | Basic DeepClaude pipeline example with DeepSeek and Claude |
| api_example | Basic API (OpenAI-standard format) example with OpenAI, Anthropic, DeepSeek and Groq |
| api_deepclaude_example | Basic API (OpenAI-standard format) example with DeepSeek and Claude |
| anthropic_vision_example | Basic vision example with Anthropic |
| openai_vision_example | Basic vision example with OpenAI |
| openai_reasoning_example | Basic reasoning example with OpenAI |
| anthropic_thinking_example | Anthropic reasoning example |
| elevenlabs_stt_example | Speech-to-text transcription example using ElevenLabs |
| elevenlabs_tts_example | Text-to-speech example using ElevenLabs |
| openai_stt_example | Speech-to-text transcription example using OpenAI |
| openai_tts_example | Text-to-speech example using OpenAI |
| tts_rodio_example | Text-to-speech with rodio playback using OpenAI |
| chain_audio_text_example | Multi-step chain combining speech-to-text and text processing |
| xai_search_chain_tts_example | Multi-step chain combining xAI search, OpenAI summarization, and ElevenLabs text-to-speech with rodio playback |
| xai_search_example | xAI search functionality with search modes, date ranges, and source filtering |
| memory_example | Automatic memory integration; the LLM remembers conversation context across calls |
| memory_share_example | Shared memory between multiple LLM providers |
| trim_strategy_example | Memory trimming strategies with automatic summarization |
| agent_builder_example | Reactive agents cooperating via shared memory, demonstrating creation of LLM agents with roles and conditions |
| openai_web_search_example | OpenAI web search functionality with location-based search context |
## Usage
Here's a basic example using OpenAI for chat completion. See the examples directory for other backends (Anthropic, Ollama, DeepSeek, xAI, Google, Phind, ElevenLabs), embedding capabilities, and more advanced use cases.