ollama-api-rs
A Rust SDK for the Ollama API with async support and OpenAI compatibility.

Features
- Async/await support
- Easy client configuration with `ModelClient::builder()`
- Streaming responses (chat and generation)
- Full compatibility with the Ollama API
- OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/embeddings`, `/v1/responses`)
- Modular design with separate modules for chat, generate, embed, and model operations
- Comprehensive error handling with custom error types
- Complete API coverage, including:
  - Chat completions with tool calling
  - Text generation
  - Embeddings (single and batch)
  - Model management (list, show, copy, delete, pull, push, create)
  - Model lifecycle (load/unload), with the `local` feature
  - Blob management, with the `local` feature
  - Running models introspection, with the `local` feature
Installation
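Add this to your `Cargo.toml`:
```toml
[dependencies]
ollama-api-rs = "0.1.0"
```
The crate is imported under the name `oai_sdk`:
```rust
use oai_sdk::{ModelClient, ChatRequest, Message};
```
The examples below also depend on `tokio` (with the `macros` and `rt-multi-thread` features), `tokio-stream`, and, for tool calling, `serde_json`.

For local-only features (blob management, model lifecycle, running models introspection), enable the `local` feature:
```toml
[dependencies]
ollama-api-rs = { version = "0.1.0", features = ["local"] }
```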
Authentication
For cloud access to ollama.com or private models, configure authentication:
```rust
let client = ModelClient::builder()
    .base_url("https://ollama.com".to_string())
    .auth_token("your-auth-token".to_string())
    .build()?;
```
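Hard-coding the token as above is fine for a quick test, but it is usually better to read it from the environment and fall back to local, unauthenticated access when it is missing. A minimal std-only sketch; `normalize_token` is a hypothetical helper and the `OLLAMA_API_KEY` variable name is an assumption, not something this crate reads on its own:

```rust
/// Normalize a raw token value: trim whitespace and treat an empty value as
/// "no token", so local runs without credentials still work.
fn normalize_token(raw: Option<String>) -> Option<String> {
    raw.map(|t| t.trim().to_string()).filter(|t| !t.is_empty())
}

fn main() {
    // Assumed variable name; use whatever your deployment sets.
    let token = normalize_token(std::env::var("OLLAMA_API_KEY").ok());
    match token {
        // With the SDK, this is where `.auth_token(token)` would be passed to the builder.
        Some(_) => println!("token found; configure the https://ollama.com client"),
        None => println!("no token; fall back to http://localhost:11434"),
    }
}
```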
OpenAI Compatibility
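Ollama provides OpenAI-compatible endpoints that work with standard OpenAI client libraries:
- `POST /v1/chat/completions`: chat completions
- `POST /v1/embeddings`: embeddings generation
- `POST /v1/responses`: response generation

Point the client at the base URL `http://localhost:11434/v1/`. The client library requires an API key, but Ollama ignores it, so any value works. For example, with the official Python client:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama",  # required by the client, ignored by Ollama
)
```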
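OpenAI-style clients build each request URL by appending the endpoint path to the base URL, so a missing or doubled `/` is an easy way to end up at the wrong path. A std-only sketch of the joining rule; `endpoint_url` is a hypothetical helper, not part of this SDK:

```rust
/// Join a base URL and an endpoint path with exactly one `/` between them.
fn endpoint_url(base: &str, path: &str) -> String {
    format!(
        "{}/{}",
        base.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}

fn main() {
    let base = "http://localhost:11434/v1/";
    // The three OpenAI-compatible endpoints listed above.
    for path in ["chat/completions", "embeddings", "responses"] {
        println!("POST {}", endpoint_url(base, path));
    }
}
```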
Usage
Basic Chat Completion
```rust
use oai_sdk::{ModelClient, ChatRequest, Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to a local Ollama server on its default port.
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: "Why is the sky blue?".to_string(),
            images: None,
            tool_calls: None,
            tool_name: None,
            thinking: None,
        }],
        stream: false, // return a single, complete response
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let response = client.chat(request).await?;
    println!("{}", response.message.content);
    Ok(())
}
```
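`/api/chat` does not remember earlier turns: each request must carry the whole conversation so far, including the model's previous replies. A minimal sketch of keeping that history; the `Message` struct here is a local stand-in reduced to two fields (not the SDK's type), and `Conversation` is a hypothetical helper:

```rust
/// Stand-in for the SDK's `Message`, cut down to the two fields this sketch
/// needs; the real struct also carries `images`, `tool_calls`, `tool_name`,
/// and `thinking`.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Hypothetical helper that accumulates the full history sent on every call.
#[derive(Debug, Default)]
struct Conversation {
    messages: Vec<Message>,
}

impl Conversation {
    fn push(&mut self, role: &str, content: &str) {
        self.messages.push(Message {
            role: role.to_string(),
            content: content.to_string(),
        });
    }

    fn user(&mut self, content: &str) {
        self.push("user", content);
    }

    fn assistant(&mut self, content: &str) {
        self.push("assistant", content);
    }
}

fn main() {
    let mut convo = Conversation::default();
    convo.user("Why is the sky blue?");
    // After `client.chat(...)` returns, record the reply so the next turn
    // sees it; this string stands in for `response.message.content`.
    convo.assistant("<reply from the model>");
    convo.user("Does the same effect explain red sunsets?");
    // Each turn, `convo.messages` would be cloned into the request's `messages`.
    for m in &convo.messages {
        println!("{}: {}", m.role, m.content);
    }
}
```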
Streaming Chat Responses
```rust
use std::io::Write; // brings `flush()` into scope
use oai_sdk::{ModelClient, ChatRequest, Message};
use tokio_stream::StreamExt; // provides `next()` on streams

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: "Write a short story about Rust.".to_string(),
            images: None,
            tool_calls: None,
            tool_name: None,
            thinking: None,
        }],
        stream: true, // receive the reply incrementally
        format: None,
        options: None,
        keep_alive: None,
        tools: None,
        think: None,
    };

    let mut stream = client.chat_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                // Each chunk carries the next fragment of the reply.
                print!("{}", response.message.content);
                std::io::stdout().flush()?;
            }
            Err(e) => {
                eprintln!("Error: {}", e);
                break;
            }
        }
    }
    Ok(())
}
```
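Ollama's streaming endpoints send newline-delimited JSON: one object per line, ending with an object whose `done` field is `true`. Network chunks do not necessarily end on line boundaries, so a client has to buffer partial lines before parsing. `chat_stream()` presumably handles this internally; the std-only sketch below, built around a hypothetical `LineBuffer` type, shows the idea:

```rust
/// Accumulates raw bytes from a streaming HTTP body and yields complete
/// newline-terminated lines (one JSON object each in Ollama's stream format).
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Feed one network chunk; return every line that this chunk completed.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Remove the line plus its '\n' from the buffer, then drop the terminator.
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop();
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate CRLF line endings
            }
            if !line.is_empty() {
                // Only complete lines are decoded, so split UTF-8 sequences are safe.
                lines.push(String::from_utf8_lossy(&line).into_owned());
            }
        }
        lines
    }
}

fn main() {
    let mut buf = LineBuffer::default();
    // The chunk boundaries deliberately split the second object.
    let chunks: [&[u8]; 3] = [b"{\"done\":false}\n{\"do", b"ne\":", b"true}\n"];
    for chunk in chunks {
        for line in buf.push(chunk) {
            println!("complete object: {line}");
        }
    }
}
```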
Text Generation
```rust
use oai_sdk::{ModelClient, GenerateRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = GenerateRequest {
        model: "llama3".to_string(),
        prompt: "Why is the sky blue?".to_string(),
        stream: false,
        // Optional parameters left unset.
        suffix: None,
        images: None,
        format: None,
        options: None,
        system: None,
        template: None,
        raw: None,
        keep_alive: None,
        context: None,
        think: None,
        width: None,
        height: None,
        steps: None,
    };

    let response = client.generate(request).await?;
    println!("{}", response.response);
    Ok(())
}
```
Streaming Text Generation
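```rust
use std::io::Write; // brings `flush()` into scope
use oai_sdk::{ModelClient, GenerateRequest};
use tokio_stream::StreamExt; // provides `next()` on streams

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Same fields as the non-streaming example, with `stream: true`.
    let request = GenerateRequest {
        model: "llama3".to_string(),
        prompt: "Write a haiku about Rust".to_string(),
        stream: true,
        suffix: None,
        images: None,
        format: None,
        options: None,
        system: None,
        template: None,
        raw: None,
        keep_alive: None,
        context: None,
        think: None,
        width: None,
        height: None,
        steps: None,
    };

    let mut stream = client.generate_stream(request).await?;
    while let Some(result) = stream.next().await {
        match result {
            Ok(response) => {
                print!("{}", response.response);
                std::io::stdout().flush()?;
            }
            Err(e) => eprintln!("Error: {}", e),
        }
    }
    Ok(())
}
```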
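The loop above prints errors and keeps reading. When the whole reply is needed as one `String` and any error should abort, the synchronous form of the same idea is a single `collect`, because `Result<String, E>` implements `FromIterator<Result<String, E>>` and stops at the first `Err`. A std-only sketch; `collect_text` is a hypothetical helper, and the `Vec` of results stands in for the `response.response` fragments that `generate_stream` would yield:

```rust
/// Concatenate streamed text fragments, stopping at the first error.
fn collect_text<I, E>(chunks: I) -> Result<String, E>
where
    I: IntoIterator<Item = Result<String, E>>,
{
    chunks.into_iter().collect()
}

fn main() {
    // Stand-in fragments; a real stream would produce these one at a time.
    let fragments: Vec<Result<String, String>> = vec![
        Ok("Borrow checker hums,".to_string()),
        Ok(" lifetimes align.".to_string()),
    ];
    match collect_text(fragments) {
        Ok(text) => println!("{text}"),
        Err(e) => eprintln!("stream failed: {e}"),
    }
}
```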
Embeddings (Single)
```rust
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Single("Hello, world!".to_string()),
        truncate: Some(true), // truncate inputs that exceed the context length
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Embeddings: {:?}", response.embeddings);
    Ok(())
}
```
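Each embedding is a vector of floats, and a common way to compare two texts is the cosine similarity of their vectors (1.0 means the vectors point the same way). A std-only sketch; `cosine_similarity` is a hypothetical helper, and it assumes `f32` elements since the element type of `response.embeddings` is not shown above:

```rust
/// Cosine similarity between two embedding vectors.
/// Returns `None` for empty or mismatched lengths, or a zero-length vector.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

fn main() {
    // Stand-ins for two rows of `response.embeddings`.
    let first = [0.1_f32, 0.3, 0.5];
    let second = [0.2_f32, 0.1, 0.4];
    if let Some(score) = cosine_similarity(&first, &second) {
        println!("similarity: {score:.3}");
    }
}
```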
Batch Embeddings
```rust
use oai_sdk::{ModelClient, EmbedRequest, EmbedInput};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // One request, one embedding per input string.
    let request = EmbedRequest {
        model: "llama3:8b".to_string(),
        input: EmbedInput::Multiple(vec![
            "Hello, world!".to_string(),
            "Goodbye, world!".to_string(),
        ]),
        truncate: Some(true),
        options: None,
        keep_alive: None,
        dimensions: None,
    };

    let response = client.embed(request).await?;
    println!("Batch embeddings: {:?}", response.embeddings);
    Ok(())
}
```
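For a large corpus it can help to split the inputs across several `EmbedInput::Multiple` requests instead of sending everything at once. A std-only sketch; `into_batches` is a hypothetical helper, and the batch size is your own choice, not a documented Ollama limit:

```rust
/// Split inputs into batches of at most `max_batch` items; each batch could
/// become the `Vec<String>` inside one `EmbedInput::Multiple(...)` request.
fn into_batches(inputs: &[String], max_batch: usize) -> Vec<Vec<String>> {
    assert!(max_batch > 0, "batch size must be positive");
    inputs.chunks(max_batch).map(|chunk| chunk.to_vec()).collect()
}

fn main() {
    let docs: Vec<String> = (0..5).map(|i| format!("document {i}")).collect();
    for (n, batch) in into_batches(&docs, 2).iter().enumerate() {
        println!("request {n}: {} inputs", batch.len());
    }
}
```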
Legacy Embeddings
```rust
use oai_sdk::{ModelClient, EmbeddingsRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // The legacy endpoint takes a single `prompt` and returns one `embedding`.
    let request = EmbeddingsRequest {
        model: "llama3:8b".to_string(),
        prompt: "Hello, world!".to_string(),
        truncate: Some(true),
        options: None,
        keep_alive: None,
    };

    let response = client.embeddings(request).await?;
    println!("Legacy embedding: {:?}", response.embedding);
    Ok(())
}
```
Tool Calling
```rust
use oai_sdk::{ModelClient, ChatRequest, Message, Tool, ToolFunction};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Describe the tool; `parameters` is a JSON Schema for its arguments.
    let tools = vec![Tool {
        tool_type: "function".to_string(),
        function: ToolFunction {
            name: "get_current_weather".to_string(),
            description: "Get the current weather for a location".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The location to get the weather for"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location", "format"]
            }),
        },
    }];

    let request = ChatRequest {
        model: "llama3".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: "What is the weather in Tokyo?".to_string(),
            images: None,
            tool_calls: None,
            tool_name: None,
            thinking: None,
        }],
        stream: false,
        format: None,
        options: None,
        keep_alive: None,
        tools: Some(tools),
        think: None,
    };

    let response = client.chat(request).await?;
    // When the model decides to use a tool, the reply carries tool calls.
    if let Some(tool_calls) = response.message.tool_calls {
        for tool_call in tool_calls {
            println!("Tool call: {}", tool_call.function.name);
            println!(
                "Arguments: {}",
                serde_json::to_string_pretty(&tool_call.function.arguments)?
            );
        }
    }
    Ok(())
}
```
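The example stops after printing the tool calls. To finish the round trip, your code runs the named function and sends its output back to the model, presumably as a `Message` with role `"tool"` and `tool_name` set, which matches the fields on `Message`. A std-only dispatch sketch; `ToolCall` is a local stand-in that uses a string map instead of the SDK's JSON arguments, and `get_current_weather` is a hypothetical stub:

```rust
use std::collections::HashMap;

/// Stand-in for a tool call returned by the model.
struct ToolCall {
    name: String,
    arguments: HashMap<String, String>,
}

type ToolFn = fn(&HashMap<String, String>) -> Result<String, String>;

/// Hypothetical local implementation of the tool declared above.
fn get_current_weather(args: &HashMap<String, String>) -> Result<String, String> {
    let location = args.get("location").ok_or("missing `location`")?;
    let format = args.get("format").map(String::as_str).unwrap_or("celsius");
    // A real tool would query a weather service here.
    Ok(format!("stub weather for {location} in {format}"))
}

/// Run one tool call and return `(tool_name, content)` for the reply message.
fn dispatch(registry: &HashMap<&str, ToolFn>, call: &ToolCall) -> Result<(String, String), String> {
    let tool = registry
        .get(call.name.as_str())
        .ok_or_else(|| format!("unknown tool: {}", call.name))?;
    let content = tool(&call.arguments)?;
    Ok((call.name.clone(), content))
}

fn main() {
    let mut registry: HashMap<&str, ToolFn> = HashMap::new();
    registry.insert("get_current_weather", get_current_weather);

    let call = ToolCall {
        name: "get_current_weather".to_string(),
        arguments: HashMap::from([("location".to_string(), "Tokyo".to_string())]),
    };
    match dispatch(&registry, &call) {
        // Send back as Message { role: "tool", content, tool_name: Some(name), .. }.
        Ok((name, content)) => println!("{name}: {content}"),
        Err(e) => eprintln!("tool error: {e}"),
    }
}
```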
Model Management
Note that `list_running_models()` requires the `local` feature (see API Coverage below).
```rust
use oai_sdk::{ModelClient, ShowModelRequest, CopyModelRequest, DeleteModelRequest};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // List locally available models.
    let models = client.list_models().await?;
    for model in models {
        println!("Model: {} ({})", model.name, model.details.parameter_size);
    }

    // List models currently loaded in memory (requires the `local` feature).
    let running = client.list_running_models().await?;
    for model in running {
        println!("Running: {} expires at {}", model.name, model.expires_at);
    }

    // Show detailed information about one model.
    let request = ShowModelRequest {
        model: "llama3".to_string(),
        verbose: Some(true),
    };
    let info = client.show_model(request).await?;
    println!("Model info: {:?}", info);

    // Copy a model under a new name, then delete the copy.
    let copy_req = CopyModelRequest {
        source: "llama3".to_string(),
        destination: "llama3-backup".to_string(),
    };
    client.copy_model(copy_req).await?;
    println!("Model copied successfully");

    let delete_req = DeleteModelRequest {
        model: "llama3-backup".to_string(),
    };
    client.delete_model(delete_req).await?;
    println!("Model deleted successfully");
    Ok(())
}
```
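Ollama model references have the form `name:tag`, and the tag defaults to `latest`, so `llama3` and `llama3:latest` name the same model even though the strings differ. That matters when, for example, checking whether a model appears in the output of `list_models()`. A std-only sketch; `split_model_ref` and `same_model` are hypothetical helpers:

```rust
/// Split a model reference such as `llama3`, `llama3:8b`, or
/// `namespace/model:tag` into (name, tag), defaulting the tag to `latest`.
fn split_model_ref(reference: &str) -> (&str, &str) {
    // Only look for ':' after the last '/', so a registry host with a port
    // (e.g. `host:5000/model`) is not mistaken for a tag separator.
    let name_start = reference.rfind('/').map_or(0, |i| i + 1);
    match reference[name_start..].rfind(':') {
        Some(i) => {
            let split = name_start + i;
            (&reference[..split], &reference[split + 1..])
        }
        None => (reference, "latest"),
    }
}

/// True if two references point at the same name and tag.
fn same_model(a: &str, b: &str) -> bool {
    split_model_ref(a) == split_model_ref(b)
}

fn main() {
    for r in ["llama3", "llama3:8b", "llama3-backup"] {
        let (name, tag) = split_model_ref(r);
        println!("{r} -> name={name}, tag={tag}");
    }
    println!("llama3 == llama3:latest? {}", same_model("llama3", "llama3:latest"));
}
```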
Model Lifecycle (Load/Unload)
Requires the `local` feature (`cargo add ollama-api-rs --features local`).
```rust
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Load or unload through the generate API...
    let _response = client.load_model("llama3").await?;
    println!("Model loaded into memory");

    // ...or through the chat API; both approaches are shown here.
    let _response = client.load_model_chat("llama3").await?;
    println!("Model loaded via chat API");

    let _response = client.unload_model("llama3").await?;
    println!("Model unloaded from memory");

    let _response = client.unload_model_chat("llama3").await?;
    println!("Model unloaded via chat API");
    Ok(())
}
```
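Ollama's API documentation describes loading as a generate request with no prompt (or a chat request with an empty `messages` array), and unloading as the same request with `keep_alive` set to 0. The methods above presumably wrap that convention. A std-only sketch of the request bodies; `Api` and `lifecycle_body` are hypothetical, and the model name is assumed to need no JSON escaping:

```rust
/// Which endpoint the lifecycle request targets.
#[derive(Clone, Copy)]
enum Api {
    Generate, // POST /api/generate
    Chat,     // POST /api/chat
}

/// Build the JSON body that loads (`unload == false`) or unloads a model.
fn lifecycle_body(api: Api, model: &str, unload: bool) -> String {
    let mut fields = vec![format!("\"model\":\"{model}\"")];
    if matches!(api, Api::Chat) {
        fields.push("\"messages\":[]".to_string()); // empty history, no generation
    }
    if unload {
        fields.push("\"keep_alive\":0".to_string()); // evict from memory immediately
    }
    format!("{{{}}}", fields.join(","))
}

fn main() {
    for (api, path) in [(Api::Generate, "/api/generate"), (Api::Chat, "/api/chat")] {
        println!("load:   POST {path} {}", lifecycle_body(api, "llama3", false));
        println!("unload: POST {path} {}", lifecycle_body(api, "llama3", true));
    }
}
```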
Blob Management
Requires the `local` feature (`cargo add ollama-api-rs --features local`).
```rust
use oai_sdk::ModelClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = ModelClient::builder()
        .base_url("http://localhost:11434".to_string())
        .build()?;

    // Placeholder: the digest must be the SHA-256 of the exact bytes pushed,
    // otherwise the server rejects the upload.
    let digest = "sha256:abc123...";
    let exists = client.blob_exists(digest).await?;
    println!("Blob exists: {}", exists);

    let content = b"model blob content";
    client.push_blob(digest, content).await?;
    println!("Blob pushed successfully");
    Ok(())
}
```
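Because the digest in the example is a truncated placeholder, a quick format check before calling `blob_exists` or `push_blob` can catch mistakes early. A std-only sketch; `is_valid_digest` is hypothetical, and the lowercase-hex requirement is an assumption based on how SHA-256 digests are normally written. Computing the digest itself needs a SHA-256 implementation, which the standard library does not provide.

```rust
/// True if `digest` looks like `sha256:` followed by 64 lowercase hex digits.
fn is_valid_digest(digest: &str) -> bool {
    match digest.strip_prefix("sha256:") {
        Some(hex) => {
            hex.len() == 64
                && hex
                    .bytes()
                    .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
        }
        None => false,
    }
}

fn main() {
    let placeholder = "sha256:abc123..."; // the elided placeholder from the example
    let well_formed = format!("sha256:{}", "0".repeat(64));
    println!("{placeholder}: {}", is_valid_digest(placeholder));
    println!("{well_formed}: {}", is_valid_digest(&well_formed));
}
```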
API Coverage
| Ollama API Endpoint | SDK Method | Module | Feature Required |
|---|---|---|---|
| `POST /api/chat` | `chat()`, `chat_stream()` | `chat` | default |
| `POST /api/generate` | `generate()`, `generate_stream()` | `generate` | default |
| `POST /api/embed` | `embed()` | `embed` | default |
| `POST /api/embeddings` | `embeddings()` | `embed` | default |
| `GET /api/tags` | `list_models()` | `model` | default |
| `POST /api/show` | `show_model()` | `model` | default |
| `POST /api/copy` | `copy_model()` | `model` | default |
| `DELETE /api/delete` | `delete_model()` | `model` | default |
| `POST /api/pull` | `pull_model()` | `model` | default |
| `POST /api/push` | `push_model()` | `model` | default |
| `POST /api/create` | `create_model()` | `model` | default |
| `GET /api/ps` | `list_running_models()` | `model` | `local` |
| `GET /api/version` | `get_version()` | `client` | default |
| `HEAD /api/blobs/:digest` | `blob_exists()` | `client` | `local` |
| `POST /api/blobs/:digest` | `push_blob()` | `client` | `local` |
| `POST /v1/chat/completions` | `chat_completions()` | `openai` | default |
| `POST /v1/embeddings` | `openai_embeddings()` | `openai` | default |
| `POST /v1/responses` | `responses()` | `openai` | default |
Model Lifecycle (requires the `local` feature)
The following methods are available when the `local` feature is enabled:
- `load_model()` / `unload_model()`: load or unload a model via the generate API
- `load_model_chat()` / `unload_model_chat()`: load or unload a model via the chat API
Modules
The crate is organized into the following modules:
- `chat`: chat completions (with streaming and tool support)
- `generate`: text generation (with streaming support)
- `embed`: embeddings (single and batch)
- `model`: model management (CRUD, pull, push)
- `openai`: OpenAI-compatible endpoints (chat, embeddings, responses)
- `client`: core client functionality, blob management, and model lifecycle
- `error`: error types and handling
Examples
See the `examples` directory for more comprehensive examples:
- `basic_chat.rs`: simple chat interface
- `streaming_chat.rs`: streaming chat responses
- `embeddings.rs`: generating embeddings with the modern API
- `legacy_embeddings.rs`: using the legacy embeddings endpoint
- `model_management.rs`: managing models (list, show, copy, delete)
- `model_lifecycle.rs`: loading and unloading models into memory (requires `local`)
- `blob_management.rs`: checking and pushing blob data (requires `local`)
- `tool_calling.rs`: using tool calling
- `openai_compatibility.rs`: using the OpenAI-compatible endpoints
Testing
Run the tests with:
```sh
cargo test
```
The suite includes mock tests that need no server and integration tests that require a running Ollama instance.

The end-to-end tests against a real Ollama instance are marked as ignored; run them explicitly with:
```sh
cargo test --test e2e_test -- --ignored
```
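For your own application code, one way to get the same kind of server-free tests is to put a trait between your logic and the client, then substitute a canned implementation in tests. A minimal sketch; `Completer`, `MockCompleter`, and `summarize` are hypothetical names, and a real implementation wrapping the async `ModelClient` would need an async method or a blocking adapter, which this synchronous sketch omits:

```rust
/// Seam between application logic and the model backend.
trait Completer {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Test double that returns a fixed reply and never touches the network.
struct MockCompleter {
    reply: String,
}

impl Completer for MockCompleter {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        Ok(self.reply.clone())
    }
}

/// Example application logic under test: builds a prompt, trims the answer.
fn summarize(model: &dyn Completer, text: &str) -> Result<String, String> {
    let prompt = format!("Summarize in one sentence: {text}");
    model.complete(&prompt).map(|s| s.trim().to_string())
}

fn main() {
    let mock = MockCompleter {
        reply: "  Rust is fast.  ".to_string(),
    };
    println!("{:?}", summarize(&mock, "a long article about Rust"));
}
```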
License
Apache 2.0
Author
Victor Palade <victor@cloudflavor.io>
Website: https://cloudflavor.io