Crate oai_sdk

§Library Name Note

This library is published as ollama-api-rs on crates.io, but it is imported in code as oai_sdk. Write use oai_sdk::{ModelClient, ChatRequest, Message};

§Features

Async/await support - Built on top of Tokio for efficient async operations
Easy configuration - Simple client setup with ModelClient::builder()
Streaming responses - Real-time streaming for both chat and generation
Full Ollama API compatibility - Complete coverage of all Ollama API endpoints
Modular design - Separate modules for chat, generate, embed, and model operations
Comprehensive error handling - Custom error types with detailed context
Tool calling - Support for function/tool calling in chat completions
Structured outputs - JSON schema validation support for responses
Model lifecycle management - Load/unload models programmatically
Blob management - Push and check model blobs
Batch embeddings - Efficient batch processing for embeddings

§Examples

§Basic Chat Completion

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Why is the sky blue?")],
        stream: false,
        ..Default::default()
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);

    Ok(())
}

§Streaming Chat

use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("Write a story about Rust")],
        stream: true,
        ..Default::default()
    };

    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => print!("{}", response.message.content),
            Err(e) => eprintln!("Error: {}", e),
        }
    }

    Ok(())
}
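
When the complete reply is needed after streaming finishes, the chunks can be accumulated as they arrive. The sketch below uses only the standard library, with a plain iterator of Result<String, String> standing in for the stream's items. collect_chunks is a hypothetical helper, not part of oai_sdk, and unlike the example above it stops at the first error instead of logging it and continuing.

```rust
/// Concatenates text chunks in order, stopping at the first error.
/// `collect_chunks` is a hypothetical helper, not part of oai_sdk.
fn collect_chunks<I>(chunks: I) -> Result<String, String>
where
    I: IntoIterator<Item = Result<String, String>>,
{
    let mut full = String::new();
    for chunk in chunks {
        // `?` returns early with the error of the first failed chunk.
        full.push_str(&chunk?);
    }
    Ok(full)
}

fn main() {
    // Stand-in for the `content` of each streamed response.
    let chunks: Vec<Result<String, String>> = vec![
        Ok("Hello".to_string()),
        Ok(", ".to_string()),
        Ok("Rust".to_string()),
    ];
    match collect_chunks(chunks) {
        Ok(reply) => println!("{reply}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}
```

With the real stream, the same loop body would push response.message.content onto the buffer inside the while let loop.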

§Text Generation

use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = GenerateRequest {
        model: "llama3.1:8b".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        ..Default::default()
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);

    Ok(())
}

§Embeddings

use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true),
        ..Default::default()
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);

    Ok(())
}
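
Embedding vectors are typically compared with cosine similarity. The following is a minimal standard-library sketch, assuming each embedding is available as a slice of f32 (the element type of response.embeddings is an assumption here). cosine_similarity is a hypothetical helper, not part of oai_sdk.

```rust
/// Cosine similarity of two vectors.
/// Returns None when the lengths differ or either vector has zero magnitude.
/// Hypothetical helper, not part of oai_sdk.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // Dot product and Euclidean norms.
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Toy vectors standing in for two embeddings.
    let a = [1.0_f32, 0.0];
    let b = [0.0_f32, 1.0];
    println!("same direction: {:?}", cosine_similarity(&a, &a));
    println!("orthogonal:     {:?}", cosine_similarity(&a, &b));
}
```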

§Tool Calling

use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let tools = vec![
        Tool {
            tool_type: "function".to_string(),
            function: ToolFunction {
                name: "get_current_weather".to_string(),
                description: "Get the current weather for a location".to_string(),
                parameters: json!({
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The location to get the weather for"
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        }
                    },
                    "required": ["location", "format"]
                }),
            }
        }
    ];

    let request = ChatRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![Message::user("What is the weather in Tokyo?")],
        tools: Some(tools),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
        }
    }

    Ok(())
}
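
After the model returns a tool call, the application still has to run the matching function itself. The sketch below shows one way to dispatch by name using only the standard library. PendingCall, make_call, dispatch, and the local get_current_weather are all hypothetical stand-ins; the crate's own ToolCall type and its argument representation may be shaped differently.

```rust
use std::collections::HashMap;

/// Stand-in for a tool call: a function name plus string arguments.
/// The crate's own ToolCall type may differ.
struct PendingCall {
    name: String,
    arguments: HashMap<String, String>,
}

/// Builds a PendingCall from string pairs (illustration only).
fn make_call(name: &str, args: &[(&str, &str)]) -> PendingCall {
    PendingCall {
        name: name.to_string(),
        arguments: args.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

/// Hypothetical local implementation of the tool; returns placeholder text.
fn get_current_weather(location: &str, format: &str) -> String {
    format!("placeholder weather for {location} ({format})")
}

/// Routes a call to the matching local function and validates its arguments.
fn dispatch(call: &PendingCall) -> Result<String, String> {
    match call.name.as_str() {
        "get_current_weather" => {
            let location = call.arguments.get("location").ok_or("missing argument: location")?;
            let format = call.arguments.get("format").ok_or("missing argument: format")?;
            Ok(get_current_weather(location, format))
        }
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    let call = make_call("get_current_weather", &[("location", "Tokyo"), ("format", "celsius")]);
    match dispatch(&call) {
        Ok(output) => println!("{output}"),
        Err(e) => eprintln!("Error: {e}"),
    }
}
```

Returning an error for unknown names and missing arguments keeps a malformed tool call from silently producing an empty result.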

§Thinking Mode

Models with thinking capabilities emit reasoning traces in a separate field. Enable it by setting think on the request.

use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatRequest {
        model: "qwen3".to_string(),
        messages: vec![Message::user("How many times does the letter r appear in strawberry?")],
        think: Some(true.into()),
        ..Default::default()
    };

    let response = client.chat(request).await?;
    if let Some(thinking) = response.message.thinking {
        println!("Thinking:\n{}", thinking);
    }
    println!("Answer:\n{}", response.message.content);

    Ok(())
}

String levels are also accepted by models that support them:

let _request = ChatRequest {
    model: "gpt-oss".to_string(),
    messages: vec![Message::user("Tell me about Canada.")],
    think: Some("medium".into()),
    ..Default::default()
};
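
Both true.into() and "medium".into() can target the same field when the field's type implements From<bool> and From<&str>. The sketch below illustrates that pattern with a stand-in enum named Think; it is an assumption about the general shape, not the actual definition of the crate's ThinkLevel.

```rust
/// Stand-in for a think-level type. The crate's ThinkLevel may be
/// defined differently; this only illustrates the `.into()` pattern.
#[derive(Debug, PartialEq)]
enum Think {
    Enabled(bool),
    Level(String),
}

impl From<bool> for Think {
    fn from(enabled: bool) -> Self {
        Think::Enabled(enabled)
    }
}

impl From<&str> for Think {
    fn from(level: &str) -> Self {
        Think::Level(level.to_string())
    }
}

fn main() {
    // The target type is known from the annotation, so `.into()` picks
    // the matching From implementation for each argument type.
    let on: Option<Think> = Some(true.into());
    let medium: Option<Think> = Some("medium".into());
    println!("{on:?} {medium:?}");
}
```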

§Model Management

use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let models = client.list_models().await?;
    for model in models {
        println!("Model: {}", model.name);
    }

    let request = ShowModelRequest {
        model: "llama3.1:8b".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    let copy_req = CopyModelRequest {
        source: "llama3.1:8b".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;

    Ok(())
}

§OpenAI-Compatible Endpoints

use oai_sdk::{ModelClient, ChatCompletionsRequest, ChatMessage};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    let request = ChatCompletionsRequest {
        model: "llama3.1:8b".to_string(),
        messages: vec![ChatMessage::user("Why is the sky blue?")],
        stream: Some(false),
        ..Default::default()
    };

    let response = client.chat_completions(request).await?;
    println!("{}", response.choices[0].message.content);

    Ok(())
}

§Model Lifecycle (requires `local` feature)

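use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .build()?;

    // Load the model into memory before first use.
    client.load_model("llama3.1:8b").await?;
    println!("Model loaded");

    // Unload it again to free memory.
    client.unload_model("llama3.1:8b").await?;
    println!("Model unloaded");

    Ok(())
}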

§Web Search & Fetch

use oai_sdk::{ModelClient, WebSearchRequest, WebFetchRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434")
        .cloud_url("https://ollama.com")
        .auth_token("your-api-key".to_string())
        .build()?;

    // Search the web
    let search = client.web_search(WebSearchRequest {
        query: "Rust programming language".to_string(),
        max_results: Some(3),
    }).await?;

    for result in &search.results {
        println!("{}: {}", result.title, result.url);
    }

    // Fetch a web page
    let page = client.web_fetch(WebFetchRequest {
        url: "https://www.rust-lang.org".to_string(),
    }).await?;

    println!("Title: {}", page.title);
    println!("Links: {}", page.links.len());

    Ok(())
}

§API Modules

chat - Chat completion with streaming and tool support
generate - Text generation with streaming support
embed - Single and batch embeddings
model - Model management (CRUD, pull, push, running models)
openai - OpenAI-compatible endpoints (chat, embeddings, responses)
web - Web search and fetch endpoints
client - Core client, blob management, model lifecycle
error - Error types and handling

Structs§

ChatCompletionsRequest: Request for OpenAI-compatible chat completions.
ChatCompletionsResponse: Response for OpenAI-compatible chat completions.
ChatMessage: A chat message for the OpenAI-compatible endpoints.
ChatRequest: Request for chat completion.
ChatResponse: Response for chat completion.
CopyModelRequest: Request for copying a model.
CreateModelRequest: Request for creating a model.
DeleteModelRequest: Request for deleting a model.
EmbedRequest: Request for embeddings.
EmbedResponse: Response for embeddings.
EmbeddingsRequest: Request for legacy embeddings.
EmbeddingsResponse: Response for legacy embeddings.
GenerateRequest: Request for text generation.
GenerateResponse: Response for text generation.
ListModelsResponse: Response for listing models.
ListRunningModelsResponse: Response for listing running models.
Message: A message in a chat.
ModelClient: A client for interacting with the Ollama API.
ModelClientBuilder: A builder for creating a ModelClient.
ModelDetails: Details about a model.
ModelInfo: Information about a model.
OpenAIEmbedding: An embedding vector from the OpenAI-compatible embeddings endpoint.
OpenAIEmbeddingsRequest: Request for OpenAI-compatible embeddings.
OpenAIEmbeddingsResponse: Response for OpenAI-compatible embeddings.
PullModelRequest: Request for pulling a model.
PushModelRequest: Request for pushing a model.
ResponsesRequest: Request for the OpenAI-compatible responses endpoint.
ResponsesResponse: Response for the OpenAI-compatible responses endpoint.
RunningModel: A running model.
ShowModelRequest: Request for showing model information.
ShowModelResponse: Response for showing model information.
StatusResponse: Status response for streaming operations.
Tool: A tool that can be used by the model.
ToolCall: A tool call.
ToolCallFunction: A tool call function.
ToolFunction: A tool function.
VersionResponse: Response for version information.
WebFetchRequest: Request for web fetch.
WebFetchResponse: Response for web fetch.
WebSearchRequest: Request for web search.
WebSearchResponse: Response for web search.
WebSearchResult: A single result from a web search.

Enums§

EmbedInput: Input for embeddings.
Format: Format for the response.
License: License information.
OllamaError: Errors that can occur when using the Ollama client.
OpenAIEmbeddingsInput: Input for OpenAI-compatible embeddings.
ThinkLevel: Controls the reasoning level for models that support thinking.

Type Aliases§

Result: Result type alias for Ollama client operations.
