ollama-api-rs 0.3.1

# ollama-api-rs

A Rust SDK for the Ollama API with async support and OpenAI compatibility.

[![Crates.io](https://img.shields.io/crates/v/ollama-api-rs)](https://crates.io/crates/ollama-api-rs)
[![Documentation](https://docs.rs/ollama-api-rs/badge.svg)](https://docs.rs/ollama-api-rs)
[![License](https://img.shields.io/crates/l/ollama-api-rs)](https://codeberg.org/cloudflavor/ollama-api-rs/src/branch/main/LICENSE)

## Features

- Async/await support
- Easy client configuration with `ModelClient::builder()`
- Streaming responses (chat and generation)
- Native Ollama REST API support (`/api/*` endpoints)
- OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/embeddings`, `/v1/responses`)
- Modular design with separate modules for chat, generate, embed, and model operations
- Comprehensive error handling with custom error types
- Convenience constructors: `Message::user()`, `Message::assistant()`, `Message::system()`, `ChatMessage::user()`
- Complete API coverage including:
  - Chat completions with tool calling
  - Text generation
  - Embeddings (single and batch)
  - Model management (list, show, copy, delete, pull, push, create)
  - Model lifecycle (load/unload)
  - Blob management
  - Running models introspection

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
ollama-api-rs = "0.3.0"
```

The library is imported under the crate name `oai_sdk`:

```rust
use oai_sdk::{ModelClient, ChatRequest, Message};
```

To use the local-only features (blob management, model lifecycle, and running-model introspection), enable the `local` feature:

```toml
[dependencies]
ollama-api-rs = { version = "0.3.0", features = ["local"] }
```
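The examples in this README also use a Tokio runtime (`#[tokio::main]`), `tokio_stream::StreamExt` in the streaming examples, and `serde_json::json!` in the tool-calling example. Your own crate needs these as direct dependencies. A typical `[dependencies]` section might look like the sketch below; the version requirements are assumptions, so adjust them to your project.

```toml
[dependencies]
ollama-api-rs = "0.3.0"
# `#[tokio::main]` needs the `macros` and `rt-multi-thread` features
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
# provides `StreamExt::next()` for the streaming examples
tokio-stream = "0.1"
# used to build tool parameter schemas with `json!`
serde_json = "1"
```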

## Authentication

For cloud access to ollama.com or private models, configure authentication:

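```rust
use oai_sdk::ModelClient;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("https://ollama.com")
        .auth_token("your-auth-token")
        .build()?;
    // use `client` as in the examples below
    Ok(())
}
```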

## OpenAI Compatibility

Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:

- `POST /v1/chat/completions` - Chat completions
- `POST /v1/embeddings` - Embeddings generation
- `POST /v1/responses` - Response generation

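Point any OpenAI client at the base URL `http://localhost:11434/v1/`. The client requires an API key, but Ollama ignores it, so any value works. For example, with the official Python library: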

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)
```

## Usage

### Basic Chat Completion

```rust
use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Why is the sky blue?")],
        ..Default::default()
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}
```
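`/api/chat` is stateless: the server does not remember earlier turns, so a conversation is continued by sending the whole message history with every request. With the SDK that would mean pushing each `Message::user(...)` and the returned `response.message` onto `messages` before the next `client.chat()` call (assuming `response.message` is a `Message`). The std-only sketch below shows just that bookkeeping; `Role`, `Turn`, `Conversation`, and the `ask` closure are hypothetical stand-ins, not SDK types.

```rust
/// Hypothetical stand-in for the SDK's message roles.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

/// Hypothetical stand-in for `Message`.
#[derive(Debug, Clone)]
struct Turn {
    role: Role,
    content: String,
}

/// Keeps the whole conversation, because `/api/chat` does not remember earlier turns.
struct Conversation {
    history: Vec<Turn>,
}

impl Conversation {
    fn new(system_prompt: &str) -> Self {
        Conversation {
            history: vec![Turn { role: Role::System, content: system_prompt.to_string() }],
        }
    }

    /// Appends the user turn, sends the *entire* history via `ask`
    /// (standing in for `client.chat()`), then records the reply.
    fn send<F>(&mut self, user_text: &str, ask: F) -> String
    where
        F: Fn(&[Turn]) -> String,
    {
        self.history.push(Turn { role: Role::User, content: user_text.to_string() });
        let reply = ask(&self.history);
        self.history.push(Turn { role: Role::Assistant, content: reply.clone() });
        reply
    }
}
```

Each call sends one more user turn than the last, plus every assistant reply so far, which is why long conversations grow the request size over time.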

### Streaming Chat Responses

```rust
use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Write a short story about Rust.")],
        stream: true,
        ..Default::default()
    };

    let mut stream = client.chat_stream(request).await?;
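    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.message.content);
                // stdout is line-buffered, so flush to show each chunk as it arrives
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    println!();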

    Ok(())
}
```
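On the wire, Ollama streams `/api/chat` and `/api/generate` responses as newline-delimited JSON: one JSON object per line, with the last object carrying `"done": true`. Network chunks do not have to end on a line boundary, so a streaming client must buffer partial lines. The std-only sketch below shows that buffering; `LineBuffer` is a hypothetical helper for illustration and not necessarily how this SDK implements `chat_stream()`.

```rust
/// Reassembles newline-delimited records from arbitrarily split byte chunks.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Feeds one network chunk and returns every line it completed.
    /// A trailing partial line stays buffered until a later chunk ends it.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Take the line including its '\n', then drop the newline.
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop();
            // Skip blank lines between records.
            if !line.is_empty() {
                lines.push(String::from_utf8_lossy(&line).into_owned());
            }
        }
        lines
    }
}
```

Buffering raw bytes rather than `String`s matters: a chunk can end in the middle of a multi-byte UTF-8 character, but the `\n` byte never occurs inside one, so converting only complete lines is always safe. Each completed line would then be parsed as JSON.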

### Text Generation

```rust
use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        ..Default::default()
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}
```
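The final (`done`) response from `/api/generate` also carries timing metadata. Per the Ollama API documentation, durations are in nanoseconds, and generation speed in tokens per second is `eval_count / eval_duration * 10^9`. Whether this SDK's `GenerateResponse` exposes `eval_count` and `eval_duration` as fields, and with which types, is an assumption here; the hypothetical helper below only shows the arithmetic.

```rust
/// Tokens per second from Ollama's `eval_count` (tokens generated) and
/// `eval_duration` (nanoseconds). Returns `None` if no time was recorded.
fn tokens_per_second(eval_count: u64, eval_duration_ns: u64) -> Option<f64> {
    if eval_duration_ns == 0 {
        return None;
    }
    const NANOS_PER_SEC: f64 = 1_000_000_000.0;
    // Same as eval_count / eval_duration * 10^9, multiplied first to limit rounding.
    Some(eval_count as f64 * NANOS_PER_SEC / eval_duration_ns as f64)
}
```

For example, 100 tokens generated with an `eval_duration` of 2,000,000,000 ns (2 s) works out to 50 tokens/s.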

### Streaming Text Generation

```rust
use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        ..Default::default()
    };

    let mut stream = client.generate_stream(request).await?;
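    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.response);
                // stdout is line-buffered, so flush to show each chunk as it arrives
                std::io::Write::flush(&mut std::io::stdout())?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    println!();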

    Ok(())
}
```

### Embeddings (Single)

```rust
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}
```

### Batch Embeddings

```rust
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);

    Ok(())
}
```
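A common next step with batch embeddings is comparing the returned vectors, for example with cosine similarity. The sketch below assumes `response.embeddings` holds one vector of `f32` per input, in input order; the element type is an assumption about this SDK, and `cosine_similarity` is a hypothetical helper, not an SDK function.

```rust
/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns `None` for mismatched lengths or a zero-magnitude vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

With the batch request above, `cosine_similarity(&response.embeddings[0], &response.embeddings[1])` would score how close "Hello, world!" and "Goodbye, world!" are; values near 1.0 mean similar direction.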

### Legacy Embeddings

Ollama has superseded `/api/embeddings` with `/api/embed`; prefer `embed()` for new code.

```rust
use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);

    Ok(())
}
```

### Tool Calling

```rust
use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("What is the weather in Tokyo?")],
        tools: Some(tools),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!("Arguments: {}",
                serde_json::to_string_pretty(&tool_call.function.arguments)?);
        }
    }

    Ok(())
}
```
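After the model returns `tool_calls`, your code runs the named function and sends the result back in a follow-up request (Ollama's chat API accepts messages with the `tool` role for this). The dispatch step is plain Rust. In the sketch below, `dispatch_tool` and `get_current_weather` are hypothetical, the weather data is canned, and a `HashMap<String, String>` stands in for the JSON `arguments` value to keep the example std-only.

```rust
use std::collections::HashMap;

/// Hypothetical local implementation of the `get_current_weather` tool declared above.
/// A real version would query a weather service; this one returns canned text.
fn get_current_weather(location: &str, unit: &str) -> String {
    let temp = if unit == "fahrenheit" { "72°F" } else { "22°C" };
    format!("{location}: {temp}, clear skies")
}

/// Routes a tool call by name and checks the arguments the schema marks as required.
/// `args` is a simplified stand-in for the JSON `arguments` object.
fn dispatch_tool(name: &str, args: &HashMap<String, String>) -> Result<String, String> {
    match name {
        "get_current_weather" => {
            let location = args.get("location").ok_or("missing argument: location")?;
            let unit = args.get("format").ok_or("missing argument: format")?;
            // Enforce the `enum` from the parameter schema.
            if unit != "celsius" && unit != "fahrenheit" {
                return Err(format!("unsupported format: {unit}"));
            }
            Ok(get_current_weather(location, unit))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning an error string instead of panicking lets you pass the failure back to the model as the tool result, so it can correct its arguments on the next turn.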

### OpenAI-Compatible Chat

```rust
use oai_sdk::{ModelClient, ChatCompletionsRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatCompletionsRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![ChatMessage::user("Why is the sky blue?")],
        stream: Some(false),
        ..Default::default()
    };

    let response = client.chat_completions(request).await?;
    println!("{}", response.choices[0].message.content);

    Ok(())
}
```

### Model Management

```rust
use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");

    Ok(())
}
```
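Model names such as `llama3.1:8b` follow Ollama's `name:tag` pattern, where the name may carry a namespace (`example/model`) and the tag defaults to `latest` when omitted. The hypothetical helper below (not an SDK function) splits a reference into name and tag. It only treats a colon after the last `/` as the tag separator, so a registry host with a port is not mistaken for a tag.

```rust
/// Splits an Ollama model reference into (name, tag), defaulting the tag to "latest".
/// Only a ':' after the last '/' counts as the tag separator, so a registry
/// host with a port (e.g. "host:5000/ns/model") is left intact.
fn split_model_ref(reference: &str) -> (&str, &str) {
    let last_segment_start = reference.rfind('/').map_or(0, |i| i + 1);
    match reference[last_segment_start..].rfind(':') {
        Some(i) => {
            let split = last_segment_start + i;
            (&reference[..split], &reference[split + 1..])
        }
        None => (reference, "latest"),
    }
}
```

This is handy when checking whether a name returned by `list_models()` refers to the same model as one you passed to `copy_model()` without a tag.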

### Model Lifecycle (Load/Unload)

Requires the `local` feature: `cargo add ollama-api-rs --features local`

```rust
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    client.load_model("llama3.1:8b").await?;
    println!("Model loaded into memory");

    client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded from memory");

    Ok(())
}
```
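Ollama's API documentation describes model residency through the `keep_alive` request parameter: `"0"` unloads the model right after the request, a negative value keeps it loaded indefinitely, and a duration string such as `"10m"` keeps it loaded that long (the server default is 5 minutes). The enum below is a hypothetical sketch of those values; it is not this SDK's API, and whether `load_model()`/`unload_model()` rely on `keep_alive` internally is an assumption.

```rust
use std::time::Duration;

/// Hypothetical model of Ollama's `keep_alive` request parameter.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeepAlive {
    /// Unload the model as soon as the request finishes.
    UnloadNow,
    /// Keep the model loaded until the server stops or it is unloaded explicitly.
    Forever,
    /// Keep the model loaded for this long after the request (whole seconds).
    KeepFor(Duration),
}

impl KeepAlive {
    /// Renders the value as a string Ollama parses as a duration.
    /// Negative durations mean "forever"; a bare "-1" string has no unit,
    /// so "-1m" is used instead.
    fn to_api_value(self) -> String {
        match self {
            KeepAlive::UnloadNow => "0".to_string(),
            KeepAlive::Forever => "-1m".to_string(),
            KeepAlive::KeepFor(d) => format!("{}s", d.as_secs()),
        }
    }
}
```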

### Blob Management

Requires the `local` feature: `cargo add ollama-api-rs --features local`

```rust
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

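    // Placeholder: this must be the real SHA-256 digest of `content`,
    // otherwise Ollama rejects the push with 400 Bad Request.
    let digest = "sha256:abc123...";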

    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");

    Ok(())
}
```
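Blob digests take the form `sha256:` followed by 64 hexadecimal characters. Producing a real digest needs a SHA-256 implementation such as the `sha2` crate. The std-only helper below, hypothetical and not part of the SDK, only checks that a string has the expected shape, which catches placeholders like the one above before a request is sent.

```rust
/// Checks that `digest` looks like an Ollama blob digest:
/// "sha256:" followed by exactly 64 hexadecimal characters.
/// This validates the format only; it does not hash anything.
fn is_valid_blob_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}
```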

## API Coverage

| Ollama API Endpoint | SDK Method | Module | Feature Required |
|---|---|---|---|
| `POST /api/chat` | `chat()`, `chat_stream()` | `chat` | default |
| `POST /api/generate` | `generate()`, `generate_stream()` | `generate` | default |
| `POST /api/embed` | `embed()` | `embed` | default |
| `POST /api/embeddings` | `embeddings()` | `embed` | default |
| `GET /api/tags` | `list_models()` | `model` | default |
| `POST /api/show` | `show_model()` | `model` | default |
| `POST /api/copy` | `copy_model()` | `model` | default |
| `DELETE /api/delete` | `delete_model()` | `model` | default |
| `POST /api/pull` | `pull_model()` | `model` | default |
| `POST /api/push` | `push_model()` | `model` | default |
| `POST /api/create` | `create_model()` | `model` | default |
| `GET /api/ps` | `list_running_models()` | `model` | `local` |
| `GET /api/version` | `get_version()` | `client` | default |
| `HEAD /api/blobs/:digest` | `blob_exists()` | `client` | `local` |
| `POST /api/blobs/:digest` | `push_blob()` | `client` | `local` |
| `POST /v1/chat/completions` | `chat_completions()` | `openai` | default |
| `POST /v1/embeddings` | `openai_embeddings()` | `openai` | default |
| `POST /v1/responses` | `responses()` | `openai` | default |

### Model Lifecycle (requires `local` feature)

The following methods are available when the `local` feature is enabled:

- `load_model()` / `unload_model()` - Load a model into memory or unload it

## Modules

The crate is organized into the following modules:

- `chat` - Chat completion functionality (with streaming and tool support)
- `generate` - Text generation functionality (with streaming support)
- `embed` - Embeddings functionality (single and batch)
- `model` - Model management functionality (CRUD, pull, push)
- `openai` - OpenAI-compatible endpoints (chat, embeddings, responses)
- `client` - Core client functionality, blob management, and model lifecycle
- `error` - Error types and handling

## Examples

See the [examples](./examples) directory for more comprehensive examples:

- `basic_chat.rs` - Simple chat interface
- `streaming_chat.rs` - Streaming chat responses
- `embeddings.rs` - Generating embeddings with the modern API
- `model_management.rs` - Managing models (list, show, copy, delete)
- `model_lifecycle.rs` - Loading and unloading models into memory (requires `local`)
- `tool_calling.rs` - Using tool calling functionality
- `openai_compatibility.rs` - Using OpenAI-compatible endpoints

## Testing

Run the tests with:

```bash
cargo test
```

The tests include both integration tests that require a running Ollama instance and mock tests that don't.

For E2E tests against a real Ollama instance:

```bash
cargo test --test e2e_test -- --ignored
```

## License

Apache 2.0

## Author

Victor Palade <victor@cloudflavor.io>

Website: https://cloudflavor.io