ollama-api-rs 0.2.0

An async Rust SDK for the Ollama API with OpenAI compatibility

Features

  • Async/await support
  • Easy client configuration with ModelClient::builder()
  • Streaming responses (chat and generation)
  • Full compatibility with the Ollama API
  • OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/responses)
  • Modular design with separate modules for chat, generate, embed, and model operations
  • Comprehensive error handling with custom error types
  • Complete API coverage including:
    • Chat completions with tool calling
    • Text generation
    • Embeddings (single and batch)
    • Model management (list, show, copy, delete, pull, push, create)
    • Model lifecycle (load/unload)
    • Blob management
    • Running models introspection

Installation

Add this to your Cargo.toml:

[dependencies]
ollama-api-rs = "0.2.0"

Note that the library name differs from the package name; import it in your Rust code as oai_sdk:

use oai_sdk::{ModelClient, ChatRequest, Message};

To use the local-only features (blob management, model lifecycle, running models introspection), enable the local feature:

[dependencies]
ollama-api-rs = { version = "0.2.0", features = ["local"] }

Authentication

For cloud access to ollama.com or private models, configure authentication:

let client = ModelClient::builder()
    .base_url("https://ollama.com".to_string())
    .auth_token("your-auth-token".to_string())
    .build()?;
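
Hard-coding a token in source makes it easy to leak. A minimal sketch of cleaning up a token read from the environment instead follows; the helper normalize_token and the variable name OLLAMA_API_KEY are illustrative assumptions, not part of this crate.

```rust
/// Hypothetical helper (not part of the crate): trims a raw token value and
/// treats a missing or blank value as "no token configured".
fn normalize_token(raw: Option<String>) -> Option<String> {
    raw.map(|v| v.trim().to_string()).filter(|v| !v.is_empty())
}
```

It could be called as normalize_token(std::env::var("OLLAMA_API_KEY").ok()), with the resulting value passed to auth_token() only when it is Some.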

OpenAI Compatibility

Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:

  • POST /v1/chat/completions - Chat completions
  • POST /v1/embeddings - Embeddings generation
  • POST /v1/responses - Response generation

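Point any OpenAI client at the base URL http://localhost:11434/v1/. The client requires an API key, but Ollama ignores its value. For example, with the official Python client: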

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
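
The native examples in this README use the base URL http://localhost:11434, while OpenAI clients use the same host with a /v1/ prefix. As a rough illustration of how the full endpoint URLs are composed, here is a small sketch; endpoint_url is a hypothetical helper, not a function of this crate.

```rust
/// Hypothetical helper: joins a base URL and an endpoint path with exactly
/// one slash between them, whether or not either side already has one.
fn endpoint_url(base_url: &str, path: &str) -> String {
    format!(
        "{}/{}",
        base_url.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}
```

With this, both "http://localhost:11434" and "http://localhost:11434/" combine with "/v1/chat/completions" into the same URL.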

Usage

Basic Chat Completion

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "Why is the sky blue?".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: false,
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}
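
Spelling out every optional field for each message gets repetitive. One possible way to cut the boilerplate is a small constructor, sketched below. The Message struct here is a local stand-in that mirrors the field names used above so the sketch compiles on its own; the field types are guesses, and text_message and conversation are hypothetical helpers rather than crate APIs.

```rust
/// Local stand-in mirroring the `Message` fields shown in the example above.
/// The real crate type may use different field types.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
    images: Option<Vec<String>>,
    tool_calls: Option<Vec<String>>,
    tool_name: Option<String>,
    thinking: Option<String>,
}

/// Hypothetical helper: builds a text-only message for the given role,
/// leaving every optional field unset.
fn text_message(role: &str, content: &str) -> Message {
    Message {
        role: role.to_string(),
        content: content.to_string(),
        images: None,
        tool_calls: None,
        tool_name: None,
        thinking: None,
    }
}

/// Hypothetical helper: a system prompt followed by one user message.
fn conversation(system: &str, user: &str) -> Vec<Message> {
    vec![text_message("system", system), text_message("user", user)]
}
```

The resulting vector could then be assigned to the messages field of a ChatRequest.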

Streaming Chat Responses

use oai_sdk::{ModelClient, ChatRequest, Message};
use std::io::Write; // brings flush() on stdout into scope
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "Write a short story about Rust.".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: true,
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.message.content);
                std::io::stdout().flush()?;
            }
            Err(e) => {
                eprintln!("Error: {}", e);
                break;
            }
        }
    }

    Ok(())
}
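
The loop above prints each chunk as it arrives. If the full reply is also needed afterwards (for example, to append it to the conversation history), the chunks have to be accumulated. The sketch below shows that logic on a plain iterator standing in for the async stream; collect_chunks is a hypothetical helper, and the chunk and error types are assumptions.

```rust
/// Hypothetical helper: concatenates streamed text chunks, stopping at the
/// first error. Returns the text collected so far and the error, if any.
fn collect_chunks<I, E>(chunks: I) -> (String, Option<E>)
where
    I: IntoIterator<Item = Result<String, E>>,
{
    let mut full = String::new();
    for chunk in chunks {
        match chunk {
            Ok(text) => full.push_str(&text),
            // Keep the partial text so the caller can decide what to do with it.
            Err(e) => return (full, Some(e)),
        }
    }
    (full, None)
}
```

In the async loop, the same effect comes from calling push_str on a String inside the Ok arm, next to the print! call.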

Text Generation

use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        stream: false,
        suffix: None,
        images: None,
        format: None,
        options: None,
        system: None,
        template: None,
        raw: None,
        keep_alive: None,
        context: None,
        think: None,
        width: None,
        height: None,
        steps: None,
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}

Streaming Text Generation

use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

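    let mut stream = client.generate_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.response);
                // Flush so each chunk appears immediately instead of waiting for a newline.
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => {
                eprintln!("Error: {}", e);
                break;
            }
        }
    }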

    Ok(())
}

Embeddings (Single)

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}

Batch Embeddings

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);

    Ok(())
}
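
Embedding vectors are usually compared rather than printed. A common measure is cosine similarity, sketched below for two vectors such as the two entries a batch request returns. The f32 element type is an assumption about the response, and cosine_similarity is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper: cosine similarity of two embedding vectors.
/// Returns None when the vectors differ in length, are empty, or when
/// either has zero magnitude (the similarity is undefined in those cases).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Values close to 1 indicate vectors pointing in nearly the same direction; values near 0 indicate unrelated directions.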

Legacy Embeddings

use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        options: None,
        keep_alive: None,
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);

    Ok(())
}

Tool Calling

use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "What is the weather in Tokyo?".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: false,
        format: None,
        options: None,
        keep_alive: None,
        tools: Some(tools),
        think: None,
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}", 
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }

    Ok(())
}
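
The example above only prints the tool calls the model requested; the application still has to run them. A minimal sketch of dispatching a call by name follows. It uses a HashMap of strings as a stand-in for the JSON arguments, and dispatch_tool and the canned get_current_weather implementation are hypothetical, not crate APIs.

```rust
use std::collections::HashMap;

/// Hypothetical tool implementation returning canned data.
/// A real tool would query a weather service.
fn get_current_weather(location: &str, format: &str) -> String {
    let celsius = 20.0_f64; // canned temperature for illustration
    match format {
        "fahrenheit" => format!("{}: {:.0}°F", location, celsius * 9.0 / 5.0 + 32.0),
        _ => format!("{}: {:.0}°C", location, celsius),
    }
}

/// Hypothetical dispatcher: looks up the tool by name, checks that the
/// required arguments are present, and runs it.
fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_current_weather" => {
            let location = args.get("location").ok_or("missing argument: location")?;
            let format = args.get("format").ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        other => Err(format!("unknown tool: {}", other)),
    }
}
```

The returned string would typically be sent back to the model in a follow-up message so it can produce the final answer.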

Model Management

use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // List all models
    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    // List currently running models (requires the local feature; see API Coverage)
    let running = client.list_running_models().await?;
    for model in running {
        println!("Running: {} expires at {}", model.name, model.expires_at);
    }

    // Show detailed model information
    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    // Copy a model
    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    // Delete a model
    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");

    Ok(())
}

Model Lifecycle (Load/Unload)

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Load model into memory
    let _response = client.load_model("llama3.1:8b").await?;
    println!("Model loaded into memory");

    // Unload model from memory
    let _response = client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded from memory");

    Ok(())
}

Blob Management

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let digest = "sha256:abc123...";
    
    // Check if blob exists
    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    // Push blob content
    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");

    Ok(())
}
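
The Ollama blob endpoints identify a blob by its SHA-256 digest, so a malformed digest string is worth catching before any request is sent. The sketch below checks only the shape of the string ("sha256:" followed by 64 hex characters); it does not hash any content, and is_sha256_digest is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper: returns true when `digest` has the form
/// "sha256:" followed by exactly 64 hexadecimal characters.
/// Both lowercase and uppercase hex digits are accepted.
fn is_sha256_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}
```

The placeholder digest in the example above would be rejected by this check, which is the intended behavior for a truncated value.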

API Coverage

| Ollama API Endpoint       | SDK Method                    | Module   | Feature Required |
|---------------------------|-------------------------------|----------|------------------|
| POST /api/chat            | chat(), chat_stream()         | chat     | default          |
| POST /api/generate        | generate(), generate_stream() | generate | default          |
| POST /api/embed           | embed()                       | embed    | default          |
| POST /api/embeddings      | embeddings()                  | embed    | default          |
| GET /api/tags             | list_models()                 | model    | default          |
| POST /api/show            | show_model()                  | model    | default          |
| POST /api/copy            | copy_model()                  | model    | default          |
| DELETE /api/delete        | delete_model()                | model    | default          |
| POST /api/pull            | pull_model()                  | model    | default          |
| POST /api/push            | push_model()                  | model    | default          |
| POST /api/create          | create_model()                | model    | default          |
| GET /api/ps               | list_running_models()         | model    | local            |
| GET /api/version          | get_version()                 | client   | default          |
| HEAD /api/blobs/:digest   | blob_exists()                 | client   | local            |
| POST /api/blobs/:digest   | push_blob()                   | client   | local            |
| POST /v1/chat/completions | chat_completions()            | openai   | default          |
| POST /v1/embeddings       | openai_embeddings()           | openai   | default          |
| POST /v1/responses        | responses()                   | openai   | default          |

Model Lifecycle (requires local feature)

The following methods are available when the local feature is enabled:

  • load_model() / unload_model() - Load a model into memory, or unload it from memory
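
How this crate implements these two methods is not stated here. For orientation, the Ollama API documentation describes loading as a POST /api/generate request that names a model without a prompt, and unloading as the same request with keep_alive set to 0. The sketch below builds those two request bodies; load_body and unload_body are hypothetical helpers, and the model name is assumed to need no JSON escaping.

```rust
/// Hypothetical helper: body of a generate request that only loads the
/// model. No prompt is sent, so nothing is generated.
/// Assumes `model` contains no characters that need JSON escaping.
fn load_body(model: &str) -> String {
    format!(r#"{{"model":"{}"}}"#, model)
}

/// Hypothetical helper: same request with keep_alive set to 0, which asks
/// the server to unload the model right away.
fn unload_body(model: &str) -> String {
    format!(r#"{{"model":"{}","keep_alive":0}}"#, model)
}
```

In real code a JSON library such as serde_json would be a safer way to build these bodies than string formatting.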

Modules

The crate is organized into the following modules:

  • chat - Chat completion functionality (with streaming and tool support)
  • generate - Text generation functionality (with streaming support)
  • embed - Embeddings functionality (single and batch)
  • model - Model management functionality (CRUD, pull, push)
  • openai - OpenAI-compatible endpoints (chat, embeddings, responses)
  • client - Core client functionality, blob management, and model lifecycle
  • error - Error types and handling

Examples

See the examples directory for more comprehensive examples:

  • basic_chat.rs - Simple chat interface
  • streaming_chat.rs - Streaming chat responses
  • embeddings.rs - Generating embeddings with the modern API
  • legacy_embeddings.rs - Using the legacy embeddings endpoint
  • model_management.rs - Managing models (list, show, copy, delete)
  • model_lifecycle.rs - Loading models into memory and unloading them (requires local)
  • blob_management.rs - Checking and pushing blob data (requires local)
  • tool_calling.rs - Using tool calling functionality
  • openai_compatibility.rs - Using OpenAI-compatible endpoints

Testing

Run the tests with:

cargo test

The tests include both integration tests that require a running Ollama instance and mock tests that don't.

For E2E tests against a real Ollama instance:

cargo test --test e2e_test -- --ignored

License

Apache 2.0

Author

Victor Palade <victor@cloudflavor.io>

Website: https://cloudflavor.io