ollama-api-rs 0.3.0

An async Rust SDK for the Ollama API with OpenAI compatibility

ollama-api-rs

A Rust SDK for the Ollama API with async support and OpenAI compatibility.

Features

  • Async/await support
  • Easy client configuration with ModelClient::builder()
  • Streaming responses (chat and generation)
  • Full compatibility with Ollama API
  • OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/responses)
  • Modular design with separate modules for chat, generate, embed, and model operations
  • Comprehensive error handling with custom error types
  • Convenience constructors: Message::user(), Message::assistant(), Message::system(), ChatMessage::user()
  • Complete API coverage including:
    • Chat completions with tool calling
    • Text generation
    • Embeddings (single and batch)
    • Model management (list, show, copy, delete, pull, push, create)
    • Model lifecycle (load/unload)
    • Blob management
    • Running models introspection

Installation

Add this to your Cargo.toml:

[dependencies]
ollama-api-rs = "0.3.0"

The package is named ollama-api-rs, but the library is imported in Rust code as oai_sdk:

use oai_sdk::{ModelClient, ChatRequest, Message};

For local-only features (blob management, model lifecycle, running models introspection):

[dependencies]
ollama-api-rs = { version = "0.3.0", features = ["local"] }

Authentication

For cloud access to ollama.com or private models, configure authentication:

let client = ModelClient::builder()
    .base_url("https://ollama.com")
    .auth_token("your-auth-token")
    .build()?;
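
Hard-coding a token as in the snippet above is fine for a quick test, but in practice the token usually comes from the environment. The following is a minimal sketch of that idea using only the standard library. The helper names `clean_token` and `token_from_env` are hypothetical and not part of the SDK, and the environment variable name is whatever you choose.

```rust
use std::env;

/// Trims a raw token and rejects blank values.
/// Hypothetical helper for illustration; not part of the SDK.
fn clean_token(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Reads a token from the named environment variable.
/// Returns None when the variable is unset, not valid Unicode, or blank.
fn token_from_env(var: &str) -> Option<String> {
    env::var(var).ok().and_then(|raw| clean_token(&raw))
}
```

The returned string can then be passed to `.auth_token(...)` on the builder, with a clear error raised when it is `None`.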

OpenAI Compatibility

Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:

  • POST /v1/chat/completions - Chat completions
  • POST /v1/embeddings - Embeddings generation
  • POST /v1/responses - Response generation

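Point the client at http://localhost:11434/v1/ and pass any non-empty API key; client libraries require the key, but Ollama ignores it. For example, with the OpenAI Python client: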

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
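
The OpenAI-compatible base URL is the native Ollama address with /v1/ appended. As a small sketch in Rust, assuming you already hold the native base URL as a string (the helper `openai_base_url` is hypothetical, not an SDK function):

```rust
/// Derives the OpenAI-compatible base URL from a native Ollama base URL.
/// Hypothetical helper for illustration; not part of the SDK.
fn openai_base_url(ollama_base: &str) -> String {
    // Strip any trailing slashes first so the result never contains "//v1/".
    format!("{}/v1/", ollama_base.trim_end_matches('/'))
}
```

Normalizing the trailing slash keeps the result stable whether the configured address ends in a slash or not.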

Usage

Basic Chat Completion

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Why is the sky blue?")],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}

Streaming Chat Responses

use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Write a short story about Rust.")],
        stream: true,
        ..Default::default()
    };

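    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.message.content);
                // stdout is line-buffered, so flush to show each chunk as it arrives.
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    println!(); // end the streamed line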

    Ok(())
}
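
The loop above prints each chunk as it arrives. When the whole reply is needed as one string, the chunks can be concatenated instead. The sketch below shows that accumulation on a plain iterator so it runs without a server. `Chunk` is a stand-in type, and its `done` flag mirrors the field of the same name in Ollama's streamed JSON; whether the SDK's response type exposes it under that name is an assumption.

```rust
/// Stand-in for one streamed chunk; the SDK's real response type has more fields.
struct Chunk {
    content: String,
    done: bool,
}

/// Concatenates chunk contents in order.
/// Stops after the first chunk marked `done`, and returns the first error
/// encountered instead of a partial reply.
fn collect_reply<I, E>(chunks: I) -> Result<String, E>
where
    I: IntoIterator<Item = Result<Chunk, E>>,
{
    let mut reply = String::new();
    for item in chunks {
        let chunk = item?;
        reply.push_str(&chunk.content);
        if chunk.done {
            break;
        }
    }
    Ok(reply)
}
```

With an async stream the same logic applies inside the `while let Some(result) = stream.next().await` loop.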

Text Generation

use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        ..Default::default()
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}

Streaming Text Generation

use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

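    let mut stream = client.generate_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.response);
                // stdout is line-buffered, so flush to show each chunk as it arrives.
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    println!(); // end the streamed line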

    Ok(())
}

Embeddings (Single)

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}

Batch Embeddings

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);

    Ok(())
}
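
A common next step after a batch embedding call is to compare the returned vectors. The sketch below computes cosine similarity with the standard library only. It assumes each embedding is a slice of `f32`; the element type the SDK actually returns is not stated here, so adjust as needed.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns None if the lengths differ, the inputs are empty,
/// or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `Option` rather than panicking keeps mismatched or degenerate vectors from taking down a batch comparison.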

Legacy Embeddings

use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);

    Ok(())
}

Tool Calling

use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("What is the weather in Tokyo?")],
        tools: Some(tools),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}",
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }

    Ok(())
}
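
The example above only prints the tool calls the model requests; the application still has to run them. A minimal sketch of that dispatch step follows. Everything in it is hypothetical: arguments are modeled as a string map rather than the JSON value the SDK returns, and `get_current_weather` is a stub that returns no real data.

```rust
use std::collections::HashMap;

/// Stub implementation of the `get_current_weather` tool; returns no real data.
fn get_current_weather(location: &str, format: &str) -> String {
    format!("stub weather for {location} in {format}")
}

/// Routes a tool call to a local function by name.
/// Returns an error for unknown tools or missing arguments.
fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_current_weather" => {
            let location = args.get("location").ok_or("missing argument: location")?;
            let format = args.get("format").ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}
```

In a full tool-calling loop, the returned string would typically be sent back to the model in a follow-up message so it can compose its final answer.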

OpenAI-Compatible Chat

use oai_sdk::{ModelClient, ChatCompletionsRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatCompletionsRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![ChatMessage::user("Why is the sky blue?")],
        stream: Some(false),
        ..Default::default()
    };

    let response = client.chat_completions(request).await?;
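    // `first()` avoids a panic if the server returns no choices.
    if let Some(choice) = response.choices.first() {
        println!("{}", choice.message.content);
    }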

    Ok(())
}

Model Management

use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");

    Ok(())
}

Model Lifecycle (Load/Unload)

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    client.load_model("llama3.1:8b").await?;
    println!("Model loaded into memory");

    client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded from memory");

    Ok(())
}

Blob Management

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

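    // Placeholder digest: a real one is "sha256:" followed by the hex SHA-256 of the blob.
    let digest = "sha256:abc123...";

    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    // The digest must match the SHA-256 of the uploaded bytes, or Ollama rejects the upload.
    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");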

    Ok(())
}
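
Because the digest is part of the request path, it can help to check its shape before calling the server. The sketch below validates the format only; it does not compute a hash. The helper `is_valid_sha256_digest` is hypothetical, and its insistence on lowercase hex is an assumption of this sketch rather than a documented rule of the SDK.

```rust
/// Checks that a digest has the form `sha256:` followed by exactly
/// 64 lowercase hexadecimal digits. Does not verify any content hash.
fn is_valid_sha256_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => {
            hex.len() == 64
                && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
        }
        None => false,
    }
}
```

The placeholder digest used in the example above would fail this check, which is the intended behavior for an elided value.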

API Coverage

Ollama API Endpoint        SDK Method                     Module    Feature Required
-------------------------  -----------------------------  --------  ----------------
POST /api/chat             chat(), chat_stream()          chat      default
POST /api/generate         generate(), generate_stream()  generate  default
POST /api/embed            embed()                        embed     default
POST /api/embeddings       embeddings()                   embed     default
GET /api/tags              list_models()                  model     default
POST /api/show             show_model()                   model     default
POST /api/copy             copy_model()                   model     default
DELETE /api/delete         delete_model()                 model     default
POST /api/pull             pull_model()                   model     default
POST /api/push             push_model()                   model     default
POST /api/create           create_model()                 model     default
GET /api/ps                list_running_models()          model     local
GET /api/version           get_version()                  client    default
HEAD /api/blobs/:digest    blob_exists()                  client    local
POST /api/blobs/:digest    push_blob()                    client    local
POST /v1/chat/completions  chat_completions()             openai    default
POST /v1/embeddings        openai_embeddings()            openai    default
POST /v1/responses         responses()                    openai    default

Model Lifecycle (requires local feature)

The following methods are available when the local feature is enabled:

  • load_model() / unload_model() - Load a model into memory or unload it

Modules

The crate is organized into the following modules:

  • chat - Chat completion functionality (with streaming and tool support)
  • generate - Text generation functionality (with streaming support)
  • embed - Embeddings functionality (single and batch)
  • model - Model management functionality (CRUD, pull, push)
  • openai - OpenAI-compatible endpoints (chat, embeddings, responses)
  • client - Core client functionality, blob management, and model lifecycle
  • error - Error types and handling
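
The error module's concrete types are not listed here. As a rough illustration of how a custom error type fits with the `Box<dyn std::error::Error>` return type used in the examples, here is a sketch with a hypothetical `SdkError` enum and `is_retryable` helper; the crate's real variants and names may differ.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type for illustration; the crate's real variants may differ.
#[derive(Debug)]
enum SdkError {
    /// The server answered with a non-success HTTP status.
    Http { status: u16, body: String },
    /// The response body could not be decoded.
    Decode(String),
}

impl fmt::Display for SdkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SdkError::Http { status, body } => write!(f, "HTTP {status}: {body}"),
            SdkError::Decode(msg) => write!(f, "decode error: {msg}"),
        }
    }
}

// Implementing `Error` lets `?` convert this type into `Box<dyn Error>`.
impl Error for SdkError {}

/// Returns true for server-side (5xx) HTTP failures, which are often worth retrying.
fn is_retryable(err: &SdkError) -> bool {
    matches!(err, SdkError::Http { status, .. } if (500..600).contains(status))
}
```

Matching on variants like this lets callers treat transient server failures differently from malformed responses.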

Examples

See the examples directory for more comprehensive examples:

  • basic_chat.rs - Simple chat interface
  • streaming_chat.rs - Streaming chat responses
  • embeddings.rs - Generating embeddings with the modern API
  • model_management.rs - Managing models (list, show, copy, delete)
  • model_lifecycle.rs - Loading and unloading models into memory (requires local)
  • tool_calling.rs - Using tool calling functionality
  • openai_compatibility.rs - Using OpenAI-compatible endpoints

Testing

Run the tests with:

cargo test

The tests include both integration tests that require a running Ollama instance and mock tests that don't.

For E2E tests against a real Ollama instance:

cargo test --test e2e_test -- --ignored

License

Apache 2.0

Author

Victor Palade <victor@cloudflavor.io>

Website: https://cloudflavor.io