ollama-api-rs

A Rust SDK for the Ollama API with async support and OpenAI compatibility.

Features

Async/await support
Easy client configuration with ModelClient::builder()
Streaming responses (chat and generation)
Full compatibility with the Ollama API
OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings, /v1/responses)
Modular design with separate modules for chat, generate, embed, and model operations
Comprehensive error handling with custom error types
Complete API coverage including:
- Chat completions with tool calling
- Text generation
- Embeddings (single and batch)
- Model management (list, show, copy, delete, pull, push, create)
- Model lifecycle (load/unload)
- Blob management
- Running models introspection

Installation

Add this to your Cargo.toml:

[dependencies]
ollama-api-rs = "0.1.0"

The package is published as ollama-api-rs, but its library is imported under the name oai_sdk:

use oai_sdk::{ModelClient, ChatRequest, Message};

To use the local-only features (blob management, model lifecycle, running models introspection), enable the local feature:

[dependencies]
ollama-api-rs = { version = "0.1.0", features = ["local"] }

Authentication

For cloud access to ollama.com or private models, configure authentication:

let client = ModelClient::builder()
    .base_url("https://ollama.com".to_string())
    .auth_token("your-auth-token".to_string())
    .build()?;
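
To keep the token out of source code, it can be read from the environment before being passed to auth_token(). The following is a minimal sketch using only the standard library; the variable name OLLAMA_AUTH_TOKEN and both helper functions are assumptions made for illustration and are not part of this crate.

```rust
use std::env;

/// Trims surrounding whitespace and treats a blank value as "no token".
fn normalize_token(raw: &str) -> Option<String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Reads a token from the named environment variable, if it is set and non-blank.
fn token_from_env(var: &str) -> Option<String> {
    env::var(var).ok().and_then(|value| normalize_token(&value))
}

fn main() {
    // OLLAMA_AUTH_TOKEN is a hypothetical variable name chosen for this sketch.
    match token_from_env("OLLAMA_AUTH_TOKEN") {
        Some(token) => println!("token loaded ({} characters)", token.len()),
        None => eprintln!("OLLAMA_AUTH_TOKEN is not set"),
    }
}
```

The returned String can then be handed to .auth_token(...) in the builder call shown above.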

OpenAI Compatibility

Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:

POST /v1/chat/completions - Chat completions
POST /v1/embeddings - Embeddings generation
POST /v1/responses - Response generation

Point the client at the base URL http://localhost:11434/v1/ and pass any non-empty API key. For example, with the OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
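
The three endpoints listed above all sit under the same /v1/ prefix. As a rough illustration of how the full request URLs are formed, here is a small standard-library sketch; endpoint_url is a hypothetical helper written for this example and is not a function of this crate.

```rust
/// Joins a base URL and an endpoint path with exactly one `/` between them.
fn endpoint_url(base: &str, path: &str) -> String {
    format!(
        "{}/{}",
        base.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}

fn main() {
    let base = "http://localhost:11434/v1/";
    // The endpoint paths are the ones listed in this section.
    for path in ["chat/completions", "embeddings", "responses"] {
        println!("POST {}", endpoint_url(base, path));
    }
}
```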

Usage

Basic Chat Completion

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "Why is the sky blue?".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: false,
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}

Streaming Chat Responses

use oai_sdk::{ModelClient, ChatRequest, Message};
use std::io::Write; // brings `flush()` into scope for stdout
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "Write a short story about Rust.".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: true,
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.message.content);
                std::io::stdout().flush()?;
            }
            Err(e) => {
                eprintln!("Error: {}", e);
                break;
            }
        }
    }

    Ok(())
}
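
When the complete reply is needed after streaming finishes, the fragments can be accumulated as they arrive. The sketch below shows the idea on a plain iterator so that it runs without a server; collect_chunks is a hypothetical helper, and the strings stand in for the response.message.content value of each streamed chunk.

```rust
/// Concatenates streamed text fragments, stopping at the first error.
fn collect_chunks<I, E>(chunks: I) -> Result<String, E>
where
    I: IntoIterator<Item = Result<String, E>>,
{
    let mut full = String::new();
    for chunk in chunks {
        // `?` returns early with the error, discarding the partial text.
        full.push_str(&chunk?);
    }
    Ok(full)
}

fn main() {
    // Stand-in for the content of three successive stream chunks.
    let chunks: Vec<Result<String, String>> = vec![
        Ok("Once ".to_string()),
        Ok("upon ".to_string()),
        Ok("a time".to_string()),
    ];
    match collect_chunks(chunks) {
        Ok(text) => println!("{}", text),
        Err(e) => eprintln!("stream failed: {}", e),
    }
}
```

With the real stream, the same accumulation would happen inside the while let loop, pushing each chunk's content onto a String.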

Text Generation

use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = GenerateRequest {
        model: "llama3".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        stream: false,
        suffix: None,
        images: None,
        format: None,
        options: None,
        system: None,
        template: None,
        raw: None,
        keep_alive: None,
        context: None,
        think: None,
        width: None,
        height: None,
        steps: None,
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}

Streaming Text Generation

use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = GenerateRequest {
        model: "llama3".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

    let mut stream = client.generate_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => print!("{}", response.response),
            Err(e) => {
                // Stop reading once the stream reports an error.
                eprintln!("Error: {}", e);
                break;
            }
        }
    }

    Ok(())
}

Embeddings (Single)

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}

Batch Embeddings

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);

    Ok(())
}
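
A common next step with batch embeddings is to compare two of the returned vectors. The following sketch computes cosine similarity with the standard library only. It assumes each embedding is a sequence of f32 values; the element type actually returned in response.embeddings may differ, and cosine_similarity is a helper written for this example, not part of the crate.

```rust
/// Cosine similarity of two equal-length vectors. Returns `None` if the
/// lengths differ, the vectors are empty, or either has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for two rows of `response.embeddings`.
    let first = [0.1_f32, 0.3, 0.5];
    let second = [0.2_f32, 0.1, 0.4];
    match cosine_similarity(&first, &second) {
        Some(similarity) => println!("cosine similarity: {:.3}", similarity),
        None => eprintln!("vectors are not comparable"),
    }
}
```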

Legacy Embeddings

use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        options: None,
        keep_alive: None,
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);

    Ok(())
}

Tool Calling

use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![
            Message {
                role: "user".to_string(),
                content: "What is the weather in Tokyo?".to_string(),
                images: None,
                tool_calls: None,
                tool_name: None,
                thinking: None,
            }
        ],
        stream: false,
        format: None,
        options: None,
        keep_alive: None,
        tools: Some(tools),
        think: None,
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}", 
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }

    Ok(())
}
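
Once the model returns a tool call, the application has to route it to real code. The sketch below shows one way to dispatch by tool name. It is deliberately simplified: arguments are modelled as a map of strings rather than the JSON value the SDK returns, and both get_current_weather and dispatch_tool are hypothetical functions written for this example.

```rust
use std::collections::HashMap;

/// Hypothetical local implementation of the `get_current_weather` tool.
/// A real implementation would query a weather service here.
fn get_current_weather(location: &str, format: &str) -> String {
    format!("(stub) weather for {} in {}", location, format)
}

/// Routes a tool call to a local function by name and checks that the
/// required arguments are present.
fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_current_weather" => {
            let location = args.get("location").ok_or("missing argument: location")?;
            let format = args.get("format").ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        other => Err(format!("unknown tool: {}", other)),
    }
}

fn main() {
    // Stand-in for `tool_call.function.name` and `tool_call.function.arguments`.
    let mut args = HashMap::new();
    args.insert("location".to_string(), "Tokyo".to_string());
    args.insert("format".to_string(), "celsius".to_string());

    match dispatch_tool("get_current_weather", &args) {
        Ok(result) => println!("{}", result),
        Err(e) => eprintln!("tool call failed: {}", e),
    }
}
```

The string returned by the tool would typically be sent back to the model in a follow-up message so it can compose the final answer.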

Model Management

use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // List all models
    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    // List currently running models (requires the `local` feature)
    let running = client.list_running_models().await?;
    for model in running {
        println!("Running: {} expires at {}", model.name, model.expires_at);
    }

    // Show detailed model information
    let request = ShowModelRequest {
        model: "llama3".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    // Copy a model
    let copy_req = CopyModelRequest {
        source: "llama3".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    // Delete a model
    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");

    Ok(())
}

Model Lifecycle (Load/Unload)

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Load model into memory via generate API
    let _response = client.load_model("llama3").await?;
    println!("Model loaded into memory");

    // Load model via chat API
    let _response = client.load_model_chat("llama3").await?;
    println!("Model loaded via chat API");

    // Unload model from memory via generate API
    let _response = client.unload_model("llama3").await?;
    println!("Model unloaded from memory");

    // Unload model via chat API
    let _response = client.unload_model_chat("llama3").await?;
    println!("Model unloaded via chat API");

    Ok(())
}

Blob Management

Requires the local feature: cargo add ollama-api-rs --features local

use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let digest = "sha256:abc123...";
    
    // Check if blob exists
    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    // Push blob content
    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");

    Ok(())
}
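
The digest above is abbreviated. Before calling blob_exists() or push_blob(), it can help to check that a digest string is well formed. The sketch below assumes the expected form is sha256: followed by 64 hexadecimal characters; is_sha256_digest is a hypothetical helper and not part of this crate.

```rust
/// Returns true if `digest` is `sha256:` followed by exactly 64 hex digits.
fn is_sha256_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

fn main() {
    // A syntactically valid placeholder digest, not the hash of real data.
    let candidate = format!("sha256:{}", "0".repeat(64));
    println!("{} -> {}", candidate, is_sha256_digest(&candidate));

    // The abbreviated digest from the example above is rejected.
    println!("sha256:abc123... -> {}", is_sha256_digest("sha256:abc123..."));
}
```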

API Coverage

Ollama API Endpoint	SDK Method	Module	Feature Required
`POST /api/chat`	`chat()`, `chat_stream()`	`chat`	default
`POST /api/generate`	`generate()`, `generate_stream()`	`generate`	default
`POST /api/embed`	`embed()`	`embed`	default
`POST /api/embeddings`	`embeddings()`	`embed`	default
`GET /api/tags`	`list_models()`	`model`	default
`POST /api/show`	`show_model()`	`model`	default
`POST /api/copy`	`copy_model()`	`model`	default
`DELETE /api/delete`	`delete_model()`	`model`	default
`POST /api/pull`	`pull_model()`	`model`	default
`POST /api/push`	`push_model()`	`model`	default
`POST /api/create`	`create_model()`	`model`	default
`GET /api/ps`	`list_running_models()`	`model`	`local`
`GET /api/version`	`get_version()`	`client`	default
`HEAD /api/blobs/:digest`	`blob_exists()`	`client`	`local`
`POST /api/blobs/:digest`	`push_blob()`	`client`	`local`
`POST /v1/chat/completions`	`chat_completions()`	`openai`	default
`POST /v1/embeddings`	`openai_embeddings()`	`openai`	default
`POST /v1/responses`	`responses()`	`openai`	default

Model Lifecycle (requires `local` feature)

The following methods are available when the local feature is enabled:

load_model() / unload_model() - Load/unload via generate API
load_model_chat() / unload_model_chat() - Load/unload via chat API

Modules

The crate is organized into the following modules:

chat - Chat completion functionality (with streaming and tool support)
generate - Text generation functionality (with streaming support)
embed - Embeddings functionality (single and batch)
model - Model management functionality (CRUD, pull, push)
openai - OpenAI-compatible endpoints (chat, embeddings, responses)
client - Core client functionality, blob management, and model lifecycle
error - Error types and handling

Examples

See the examples directory for more comprehensive examples:

basic_chat.rs - Simple chat interface
streaming_chat.rs - Streaming chat responses
embeddings.rs - Generating embeddings with the modern API
legacy_embeddings.rs - Using the legacy embeddings endpoint
model_management.rs - Managing models (list, show, copy, delete)
model_lifecycle.rs - Loading and unloading models into memory (requires local)
blob_management.rs - Checking and pushing blob data (requires local)
tool_calling.rs - Using tool calling functionality
openai_compatibility.rs - Using OpenAI-compatible endpoints

Testing

Run the tests with:

cargo test

The tests include both integration tests that require a running Ollama instance and mock tests that don't.

For E2E tests against a real Ollama instance:

cargo test --test e2e_test -- --ignored

License

Apache 2.0

Author

Victor Palade <victor@cloudflavor.io>

Website: https://cloudflavor.io
