oxide-agent
A type-safe, high-performance Rust crate for building agentic systems on top of locally-hosted LLMs via Ollama. Designed with the principle that AI agent infrastructure should be as safe and ergonomic as the language it's written in.
Why oxide-agent over thin wrappers like ollama-rs?
- Safety: input sanitization and sandboxed tool execution at the crate level.
- Concurrency: tokio-native; run multi-agent swarms without a GIL bottleneck.
- Ergonomics: idiomatic Rust APIs, not a ported Python library.
- Testability: the OllamaClient trait makes full unit testing possible without a running server.
Installation
Requires Rust 1.75+ and an Ollama server running on localhost:11434.
[dependencies]
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent" }
Optional features:
# Sandboxed WASM tool execution via wasmtime
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["wasm-tools"] }
# High-throughput batch embedding via a Mojo-compiled binary
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["mojo-interop"] }
Features
HTTP Client
oxide_agent::client provides the OllamaClient trait — a mockable async abstraction over any Ollama-compatible backend — and HttpOllamaClient, the production implementation backed by reqwest.
Covered endpoints:
| Method | Endpoint | Description |
|---|---|---|
| generate | POST /api/generate | Single-turn text completion |
| chat | POST /api/chat | Multi-turn chat, including tool/function calling |
| embed | POST /api/embed | Dense vector embeddings (single or batch) |
| list_models | GET /api/tags | List models available on the server |
| stream_generate | POST /api/generate | Token-by-token streaming completion |
| stream_chat | POST /api/chat | Token-by-token streaming chat |
Because OllamaClient is a trait, the entire stack is testable with an in-process mock — no running Ollama server required. See examples/basic_chat.rs.
Streaming
stream_generate and stream_chat use a zero-copy NDJSON pipeline:
- bytes_stream() from reqwest forwards Bytes chunks without re-allocation.
- tokio_util::io::StreamReader adapts the byte stream to AsyncRead.
- FramedRead + LinesCodec splits on \n, yielding one deserialized token per line.
Memory overhead is constant regardless of output length. See examples/streaming.rs.
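The line-framing step can be illustrated without any async machinery. The sketch below is a synchronous, std-only analog of the FramedRead + LinesCodec stage, not the crate's actual pipeline: `ndjson_frames` is a hypothetical helper that yields one NDJSON line at a time from any buffered reader, so only the current line is held in memory. The sample field names in any input you feed it (`response`, `done`) are assumed from Ollama's streaming format.

```rust
use std::io::{self, BufRead};

/// Yields one non-empty NDJSON frame (a single line) at a time.
/// Hypothetical helper: a blocking stand-in for FramedRead + LinesCodec.
pub fn ndjson_frames<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<String>> {
    reader.lines().filter(|line| match line {
        // Skip blank keep-alive lines; they carry no JSON object.
        Ok(text) => !text.trim().is_empty(),
        // Keep I/O errors so the caller can react to them.
        Err(_) => true,
    })
}
```

Each yielded string would then be handed to a JSON deserializer; in the real pipeline that step and the framing both run on the async stream.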
Session Management
oxide_agent::session::Session is a stateful multi-turn conversation manager. It automatically:
- Appends every user message and assistant reply to the history.
- Estimates the token count (1 token ≈ 4 UTF-8 characters).
- Triggers context compression when history exceeds max_tokens × compression_threshold.
Compression strategies:
| Strategy | Behaviour |
|---|---|
| TruncateOldest (default) | Drops the oldest non-system messages until under budget |
| Summarize { model } | Asks Ollama to summarise the oldest half of history into a compact system message, then discards the originals |
SessionConfig fields:
| Field | Default | Description |
|---|---|---|
| max_tokens | 8_000 | Soft token budget for the message history |
| compression_threshold | 0.80 | Fraction of budget at which compression triggers |
| compression_strategy | TruncateOldest | Strategy applied when threshold is crossed |
See examples/session.rs.
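To make the trigger arithmetic concrete: with the defaults, compression starts once the estimated history exceeds 8_000 × 0.80 = 6_400 tokens. The following is a minimal sketch of the estimation heuristic and the TruncateOldest behaviour, not the crate's code. The types `Role` and `Message` and all function names are hypothetical, the estimate rounds up (an assumption), and "under budget" is interpreted here as "no longer above the trigger point" (also an assumption).

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Heuristic from the text: 1 token ≈ 4 characters (rounded up in this sketch).
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

pub fn history_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| estimate_tokens(&m.content)).sum()
}

/// True once the history exceeds max_tokens * threshold.
pub fn needs_compression(history: &[Message], max_tokens: usize, threshold: f64) -> bool {
    history_tokens(history) as f64 > max_tokens as f64 * threshold
}

/// Drop the oldest non-system message until the history is no longer above the trigger point.
pub fn truncate_oldest(history: &mut Vec<Message>, max_tokens: usize, threshold: f64) {
    while needs_compression(history, max_tokens, threshold) {
        match history.iter().position(|m| m.role != Role::System) {
            Some(index) => {
                history.remove(index);
            }
            // Only system messages remain, so nothing more can be dropped.
            None => break,
        }
    }
}
```

System messages are never removed in this sketch, which matches the "oldest non-system messages" wording in the strategy table.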
Tool Calling
oxide_agent::tools provides ToolRegistry for registering async handler functions and dispatching tool calls returned by the model, and ToolBuilder for constructing JSON Schema tool definitions without boilerplate.
ToolBuilder parameter types:
| Method | JSON Schema type |
|---|---|
| string_param(name, desc, required) | "string" |
| number_param(name, desc, required) | "number" |
| bool_param(name, desc, required) | "boolean" |
The full agentic loop — register → attach definitions → detect tool calls → dispatch → feed results back → final answer — is covered in examples/tool_calling.rs.
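The register and dispatch steps of that loop can be sketched in isolation. The code below is a simplified, synchronous illustration and does not reproduce the ToolRegistry API: `SimpleRegistry`, `ToolCall`, and the `Handler` alias are hypothetical names, handlers take raw JSON arguments as a string, and errors are plain strings rather than OxideError.

```rust
use std::collections::HashMap;

/// A tool call as the model might return it: a tool name plus raw JSON arguments.
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

/// Hypothetical handler type; the real registry uses async handlers.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

#[derive(Default)]
pub struct SimpleRegistry {
    handlers: HashMap<String, Handler>,
}

impl SimpleRegistry {
    /// Register step: associate a tool name with its handler.
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Dispatch step: look up the handler by name and run it on the raw arguments.
    pub fn dispatch(&self, call: &ToolCall) -> Result<String, String> {
        match self.handlers.get(&call.name) {
            Some(handler) => handler(&call.arguments),
            None => Err(format!("unknown tool: {}", call.name)),
        }
    }
}
```

In an agentic loop, the string returned by `dispatch` would be appended to the conversation as a tool message before asking the model for its final answer.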
Retrieval-Augmented Generation
oxide_agent::rag::VectorStore is an in-memory vector store using Ollama's /api/embed endpoint and cosine similarity for nearest-neighbour search.
- add_text(text, metadata): embed a single string and store it.
- add_file(path): read a UTF-8 file, chunk by \n\n (paragraph), and store each chunk with source and chunk metadata.
- query(question, top_k): return Vec<SearchResult> ranked by cosine similarity.
No external database is required; the store is backed by a Vec<Document> in process memory. See examples/rag.rs.
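The two mechanical pieces described above, paragraph chunking and cosine-similarity ranking, fit in a few lines. This is an illustrative sketch rather than the VectorStore implementation: the function names are hypothetical, embeddings are plain `Vec<f32>` values supplied by the caller, and returning 0.0 for mismatched or zero-length vectors is a choice made here.

```rust
/// Split text into paragraph chunks on blank lines, skipping empty chunks.
pub fn chunk_paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .collect()
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 for mismatched lengths or zero-norm vectors (a choice made in this sketch).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Indices and scores of the k stored vectors most similar to the query, best first.
pub fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(index, vector)| (index, cosine_similarity(query, vector)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this is O(n) per query, which is usually acceptable for a store that lives entirely in process memory.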
WASM Tool Isolation
Requires: features = ["wasm-tools"]
oxide_agent::wasm::WasmTool loads and executes tool logic from .wasm binaries via wasmtime. Each tool runs in a fully sandboxed environment with no access to the host filesystem, network, or process state.
ABI contract for the .wasm module:
| Export | Signature | Responsibility |
|---|---|---|
| memory | linear memory | Default memory export |
| alloc | (size: i32) -> i32 | Allocate size bytes, return pointer |
| tool_call | (ptr: i32, len: i32) -> i32 | Receive JSON args, return pointer to null-terminated JSON result |
When the wasm-tools feature is disabled, WasmTool::from_file returns a descriptive Err pointing to the feature flag. See examples/wasm_tool.rs.
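The host side of this ABI comes down to two memory operations: copying the JSON arguments to the pointer returned by alloc, and reading the null-terminated result at the pointer returned by tool_call. The sketch below models guest linear memory as a plain byte slice so that it stays independent of wasmtime; `write_args` and `read_result` are hypothetical helpers, not functions from the crate.

```rust
/// Copy JSON argument bytes into guest memory at `ptr`
/// (the pointer the guest's `alloc` export is assumed to have returned).
pub fn write_args(memory: &mut [u8], ptr: usize, args: &[u8]) -> Result<(), String> {
    let end = ptr.checked_add(args.len()).ok_or("pointer overflow")?;
    let slot = memory
        .get_mut(ptr..end)
        .ok_or("arguments fall outside linear memory")?;
    slot.copy_from_slice(args);
    Ok(())
}

/// Read the null-terminated JSON result starting at `ptr`
/// (the pointer the guest's `tool_call` export is assumed to have returned).
pub fn read_result(memory: &[u8], ptr: usize) -> Result<&str, String> {
    let tail = memory
        .get(ptr..)
        .ok_or("result pointer outside linear memory")?;
    // The terminator marks the end of the result; it is not part of the JSON.
    let len = tail
        .iter()
        .position(|&byte| byte == 0)
        .ok_or("result is not null-terminated")?;
    std::str::from_utf8(&tail[..len]).map_err(|e| e.to_string())
}
```

Bounds-checking every pointer on the host matters because the guest is untrusted: a bad pointer should surface as an error, never as a panic or an out-of-bounds read.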
Mojo Interoperability
Requires: features = ["mojo-interop"]
oxide_agent::mojo::MojoEmbedder offloads batch embedding to an external Mojo-compiled binary via a JSON line protocol over stdin/stdout.
Protocol:
stdin → {"texts": ["hello", "world"]}
stdout ← {"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]}
Useful when a compiled Mojo kernel outperforms the Ollama HTTP round-trip for your embedding workload, or when you need to embed very large corpora in batch. See examples/mojo_embedder.rs.
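The line protocol can be exercised without a Mojo binary by writing to and reading from generic streams. The sketch below is std-only and hypothetical throughout (`json_escape`, `request_line`, and `exchange` are not crate functions). It builds the request by hand and returns the raw response line; an actual bridge would more likely use a JSON library and pipe the stdin and stdout of a child process started with std::process::Command.

```rust
use std::io::{self, BufRead, Write};

/// Escape a string for use inside a JSON string literal.
fn json_escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the single-line request body: {"texts": ["...", "..."]}
pub fn request_line(texts: &[&str]) -> String {
    let items: Vec<String> = texts
        .iter()
        .map(|text| format!("\"{}\"", json_escape(text)))
        .collect();
    format!("{{\"texts\": [{}]}}", items.join(", "))
}

/// Send one request line and read one response line back.
/// `stdin` and `stdout` stand for the child process's pipes.
pub fn exchange<W: Write, R: BufRead>(
    stdin: &mut W,
    stdout: &mut R,
    texts: &[&str],
) -> io::Result<String> {
    writeln!(stdin, "{}", request_line(texts))?;
    stdin.flush()?; // make sure the child sees the full line
    let mut response = String::new();
    stdout.read_line(&mut response)?;
    Ok(response.trim_end().to_string())
}
```

Keeping each request and response on a single line is what lets both sides frame messages with nothing more than a newline.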
Error Handling
All fallible operations return Result<_, OxideError>:
| Variant | Cause |
|---|---|
| OxideError::Http(reqwest::Error) | Network or transport failure |
| OxideError::ApiError(status, body) | Non-2xx response from Ollama |
| OxideError::Serde(serde_json::Error) | JSON parse failure |
| OxideError::Other(String) | All other errors (WASM, Mojo, tool dispatch) |
OxideError implements std::error::Error and integrates with anyhow and thiserror.
Architecture
oxide-agent/
├── src/
│ ├── client/ # OllamaClient trait + HttpOllamaClient (reqwest + NDJSON streaming)
│ ├── types/ # Shared request/response structs (Generate, Chat, Embed, Tags)
│ ├── session/ # Session: multi-turn history, token estimation, compression
│ ├── tools/ # ToolRegistry, ToolBuilder, async dispatch
│ ├── rag/ # VectorStore: cosine similarity, add_text, add_file, query
│ ├── wasm/ # WasmTool: wasmtime sandbox (feature: wasm-tools)
│ ├── mojo/ # MojoEmbedder: subprocess bridge (feature: mojo-interop)
│ └── error.rs # OxideError unified error type
├── oxide-agent-macros/ # Procedural macro crate (proc-macro2, quote, syn)
└── examples/ # Runnable examples — see examples/README.md
The OllamaClient trait is the central seam of the codebase. Every subsystem — Session, VectorStore, ToolRegistry — accepts Arc<dyn OllamaClient>, meaning the entire stack is testable with an in-process mock and no Ollama server required.
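The seam pattern itself is small enough to show in full. The sketch below uses a simplified, synchronous stand-in trait, since the real OllamaClient is async and has more methods; `ChatBackend`, `EchoBackend`, and `MiniSession` are hypothetical names chosen for illustration only.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for the OllamaClient trait.
pub trait ChatBackend: Send + Sync {
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

/// In-process mock that echoes the prompt back, so no server is needed.
pub struct EchoBackend;

impl ChatBackend for EchoBackend {
    fn chat(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

/// A subsystem that depends only on the trait object, never on a concrete client.
pub struct MiniSession {
    backend: Arc<dyn ChatBackend>,
    history: Vec<String>,
}

impl MiniSession {
    pub fn new(backend: Arc<dyn ChatBackend>) -> Self {
        Self { backend, history: Vec::new() }
    }

    /// Send a user message, then record both the message and the reply.
    pub fn send(&mut self, user: &str) -> Result<String, String> {
        let reply = self.backend.chat(user)?;
        self.history.push(user.to_string());
        self.history.push(reply.clone());
        Ok(reply)
    }

    pub fn history(&self) -> &[String] {
        &self.history
    }
}
```

Swapping `EchoBackend` for an HTTP-backed implementation requires no change to `MiniSession`, which is the property that makes the trait a useful test seam.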
Examples
Runnable, self-contained examples live in examples/. Each file maps to one feature area:
| Example | Description | Feature flag |
|---|---|---|
| basic_chat.rs | Single-turn chat and model listing | — |
| streaming.rs | Zero-copy token-by-token streaming | — |
| session.rs | Multi-turn session with compression strategies | — |
| tool_calling.rs | Full agentic tool-call loop | — |
| rag.rs | Vector store indexing and RAG-pattern chat | — |
| wasm_tool.rs | Sandboxed WASM tool execution | wasm-tools |
| mojo_embedder.rs | Batch embedding via Mojo subprocess | mojo-interop |
See examples/README.md for detailed walkthroughs, protocol documentation, and configuration references for each example.
Testing
The test suite runs entirely without a live Ollama server by implementing OllamaClient on lightweight in-process structs.
Notable test patterns used across the codebase:
- MockOllamaClient in client: verifies streaming chunk ordering and buffered response identity.
- EchoClient in session: echoes user messages back to validate history tracking, system prompt placement, and truncation under configurable token budgets.
- FakeEmbedClient in rag: returns deterministic embeddings based on input character values to verify cosine similarity ranking without a real embedding model.
- Stub guards in wasm and mojo: assert that feature-gated code returns helpful errors with feature flag names when called without the feature enabled.
Contributing
All PRs should include unit tests. Because OllamaClient is a trait, mocking is straightforward — see the existing test modules for patterns. Run cargo clippy and cargo fmt before opening a PR.
Issues and feature requests: github.com/TheJagpreet/oxide-agent/issues
License
MIT