oxide-agent

A type-safe, high-performance Rust crate for building agentic systems on top of locally hosted LLMs via Ollama. Designed with the principle that AI agent infrastructure should be as safe and ergonomic as the language it's written in.

Why oxide-agent over thin wrappers like ollama-rs?

Safety — input sanitization and sandboxed tool execution at the crate level.

Concurrency — tokio-native; run multi-agent swarms without a GIL bottleneck.

Ergonomics — idiomatic Rust APIs, not a ported Python library.

Testability — the OllamaClient trait makes full unit testing possible without a running server.

Installation
Features
Error Handling
Architecture
Examples
Testing
Contributing
License

Installation

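Requires Rust 1.75+. Running the examples, or any real workload, also needs an Ollama server reachable at localhost:11434 (Ollama's default port); the test suite does not.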

[dependencies]
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent" }

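Optional features (use either line below in place of the plain dependency, or enable both with features = ["wasm-tools", "mojo-interop"]):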

# Sandboxed WASM tool execution via wasmtime
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["wasm-tools"] }

# High-throughput batch embedding via a Mojo-compiled binary
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["mojo-interop"] }

Features

HTTP Client

oxide_agent::client provides the OllamaClient trait — a mockable async abstraction over any Ollama-compatible backend — and HttpOllamaClient, the production implementation backed by reqwest.

Covered endpoints:

Method	Endpoint	Description
`generate`	`POST /api/generate`	Single-turn text completion
`chat`	`POST /api/chat`	Multi-turn chat, including tool/function calling
`embed`	`POST /api/embed`	Dense vector embeddings (single or batch)
`list_models`	`GET /api/tags`	List models available on the server
`stream_generate`	`POST /api/generate`	Token-by-token streaming completion
`stream_chat`	`POST /api/chat`	Token-by-token streaming chat

Because OllamaClient is a trait, the entire stack is testable with an in-process mock — no running Ollama server required. See examples/basic_chat.rs.
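Here is a minimal sketch of that pattern. It assumes a simplified synchronous trait: ChatBackend, EchoBackend, and Assistant are hypothetical names, and the real OllamaClient is async with its own method set. The code under test holds an Arc<dyn ...> and never names the concrete client, so a test can inject an in-process fake.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for OllamaClient (synchronous, std only).
pub trait ChatBackend: Send + Sync {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// In-process fake: answers deterministically, no server or network involved.
pub struct EchoBackend;

impl ChatBackend for EchoBackend {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String> {
        Ok(format!("{model} says: {prompt}"))
    }
}

/// Code under test only ever sees the trait object, never a concrete client.
pub struct Assistant {
    client: Arc<dyn ChatBackend>,
    model: String,
}

impl Assistant {
    pub fn new(client: Arc<dyn ChatBackend>, model: &str) -> Self {
        Self { client, model: model.to_string() }
    }

    pub fn ask(&self, prompt: &str) -> Result<String, String> {
        self.client.chat(&self.model, prompt)
    }
}
```

In production, the same Arc slot would hold the HTTP-backed client instead of the fake.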

Streaming

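stream_generate and stream_chat parse Ollama's newline-delimited JSON (NDJSON) stream incrementally instead of buffering the whole response:

bytes_stream() from reqwest yields reference-counted Bytes chunks without re-allocating them.
tokio_util::io::StreamReader adapts the byte stream to AsyncRead.
FramedRead + LinesCodec splits the stream on \n; each complete line is deserialized into one response chunk (usually a single token).

Memory overhead is bounded by the longest single line and does not grow with output length. See examples/streaming.rs.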
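The core of that pipeline is splitting an arbitrarily chunked byte stream into whole lines. The std-only sketch below reimplements that step by hand for illustration only; the crate itself relies on LinesCodec, and LineBuffer is a hypothetical name. A chunk may end mid-line, so the trailing partial line stays buffered until its \n arrives.

```rust
/// Hypothetical line splitter doing the job LinesCodec does inside FramedRead:
/// accept arbitrary byte chunks, emit only complete `\n`-terminated lines.
#[derive(Default)]
pub struct LineBuffer {
    buf: Vec<u8>,
}

impl LineBuffer {
    /// Feed one network chunk; returns every line completed by this chunk.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Drain complete lines from the front; a partial trailing line stays buffered.
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..pos]);
            let text = text.trim_end_matches('\r');
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }
}
```

The byte 0x0A never occurs inside a multi-byte UTF-8 sequence. As a result, decoding only complete lines also avoids splitting a character across chunk boundaries.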

Session Management

oxide_agent::session::Session is a stateful multi-turn conversation manager. It automatically:

Appends every user message and assistant reply to the history.
Estimates the token count (1 token ≈ 4 UTF-8 characters).
Triggers context compression when the estimated history exceeds max_tokens × compression_threshold (8,000 × 0.80 = 6,400 tokens with the defaults).
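Here is a minimal sketch of the estimate and trigger described above. It assumes the estimate counts Unicode scalar values, rounds up per message, and sums across messages. estimate_tokens and should_compress are hypothetical helpers, not the crate's API. With the default SessionConfig, compression starts once the history passes 6,400 estimated tokens, roughly 25,600 characters.

```rust
/// Hypothetical helper: 1 token ≈ 4 characters, rounded up.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical helper: true once the summed estimate exceeds
/// max_tokens × compression_threshold.
pub fn should_compress(history: &[&str], max_tokens: usize, compression_threshold: f64) -> bool {
    let used: usize = history.iter().map(|m| estimate_tokens(m)).sum();
    used as f64 > max_tokens as f64 * compression_threshold
}
```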

Compression strategies:

Strategy	Behaviour
`TruncateOldest` (default)	Drops the oldest non-system messages until under budget
`Summarize { model }`	Asks Ollama to summarise the oldest half of history into a compact system message, then discards the originals
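TruncateOldest can be pictured as the loop below. This is a sketch, not the crate's code: Role, Message, truncate_oldest, and the budget parameter are hypothetical, and the loop reuses the 4-characters-per-token estimate. System messages are skipped, so the system prompt always survives.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Same heuristic as the session estimate: 1 token ≈ 4 characters, rounded up.
fn estimated_tokens(m: &Message) -> usize {
    (m.content.chars().count() + 3) / 4
}

/// Drop the oldest non-system messages until the estimate fits `budget`.
/// If only system messages remain, stop even if still over budget.
pub fn truncate_oldest(history: &mut Vec<Message>, budget: usize) {
    let mut total: usize = history.iter().map(estimated_tokens).sum();
    while total > budget {
        match history.iter().position(|m| m.role != Role::System) {
            Some(i) => total -= estimated_tokens(&history.remove(i)),
            None => break,
        }
    }
}
```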

SessionConfig fields:

Field	Default	Description
`max_tokens`	`8_000`	Soft token budget for the message history
`compression_threshold`	`0.80`	Fraction of budget at which compression triggers
`compression_strategy`	`TruncateOldest`	Strategy applied when threshold is crossed

See examples/session.rs.

Tool Calling

oxide_agent::tools provides ToolRegistry for registering async handler functions and dispatching tool calls returned by the model, and ToolBuilder for constructing JSON Schema tool definitions without boilerplate.

ToolBuilder parameter types:

Method	JSON Schema type
`string_param(name, desc, required)`	`"string"`
`number_param(name, desc, required)`	`"number"`
`bool_param(name, desc, required)`	`"boolean"`
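The helpers generate the parameters object that Ollama expects inside a tool definition: an object schema with one typed property per parameter, plus a required list. The std-only builder below shows that shape. ParamSchema is hypothetical and is not the crate's ToolBuilder. Its escaping is deliberately minimal; a real implementation would serialize with serde_json.

```rust
/// Hypothetical std-only builder for the `parameters` object of a tool definition.
#[derive(Default)]
pub struct ParamSchema {
    properties: Vec<String>,
    required: Vec<String>,
}

/// Minimal JSON string quoting: escapes only backslashes and double quotes.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

impl ParamSchema {
    pub fn new() -> Self {
        Self::default()
    }

    fn param(mut self, name: &str, json_type: &str, desc: &str, required: bool) -> Self {
        self.properties.push(format!(
            "{}:{{\"type\":{},\"description\":{}}}",
            quote(name),
            quote(json_type),
            quote(desc)
        ));
        if required {
            self.required.push(quote(name));
        }
        self
    }

    pub fn string_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "string", desc, required)
    }

    pub fn number_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "number", desc, required)
    }

    pub fn bool_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "boolean", desc, required)
    }

    /// Assemble {"type":"object","properties":{...},"required":[...]}.
    pub fn build(&self) -> String {
        format!(
            "{{\"type\":\"object\",\"properties\":{{{}}},\"required\":[{}]}}",
            self.properties.join(","),
            self.required.join(",")
        )
    }
}
```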

The full agentic loop — register → attach definitions → detect tool calls → dispatch → feed results back → final answer — is covered in examples/tool_calling.rs.
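The dispatch step of that loop is essentially a name-to-handler lookup. The following is a synchronous, std-only sketch: Registry, Handler, register, and dispatch are hypothetical, and the crate's ToolRegistry uses async handlers instead. Returning unknown-tool errors as values lets the loop report them back to the model instead of aborting.

```rust
use std::collections::HashMap;

/// Handler receives the raw JSON arguments the model produced.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Hypothetical synchronous registry mapping tool names to handlers.
#[derive(Default)]
pub struct Registry {
    handlers: HashMap<String, Handler>,
}

impl Registry {
    pub fn register<F>(&mut self, name: &str, f: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(f));
    }

    /// Run the handler for one tool call; unknown names become an error value.
    pub fn dispatch(&self, name: &str, args_json: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(args_json),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}
```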

Retrieval-Augmented Generation

oxide_agent::rag::VectorStore is an in-memory vector store using Ollama's /api/embed endpoint and cosine similarity for nearest-neighbour search.

add_text(text, metadata) — embed a single string and store it.
add_file(path) — read a UTF-8 file, chunk by \n\n (paragraph), store each chunk with source and chunk metadata.
query(question, top_k) — return Vec<SearchResult> ranked by cosine similarity.

No external database is required; the store is backed by a Vec<Document> in process memory. See examples/rag.rs.
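The indexing and query steps reduce to paragraph chunking plus an exhaustive cosine-similarity scan. This std-only sketch assumes the embeddings are already computed (the crate gets them from /api/embed). chunk_paragraphs, cosine, and top_k are hypothetical helpers, and trimming each chunk is an assumption.

```rust
/// Split text into paragraph chunks on "\n\n", trimming and dropping empty pieces.
pub fn chunk_paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n").map(str::trim).filter(|c| !c.is_empty()).collect()
}

/// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 if either vector is zero.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Brute-force nearest neighbours: score every stored vector, keep the best `k`.
pub fn top_k<'a>(query: &[f32], docs: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(text, vec)| (text.as_str(), cosine(query, vec)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // highest similarity first
    scored.truncate(k);
    scored
}
```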

WASM Tool Isolation

Requires: features = ["wasm-tools"]

oxide_agent::wasm::WasmTool loads and executes tool logic from .wasm binaries via wasmtime. Each tool runs in a fully sandboxed environment with no access to the host filesystem, network, or process state.

ABI contract for the .wasm module:

Export	Signature	Responsibility
`memory`	linear memory	Default memory export
`alloc`	`(size: i32) -> i32`	Allocate `size` bytes, return pointer
`tool_call`	`(ptr: i32, len: i32) -> i32`	Receive JSON args, return pointer to null-terminated JSON result
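On the host side, the last step of the contract is reading the result string back out of linear memory. In this std-only sketch, a plain byte slice stands in for the module's exported memory (with wasmtime, Memory::data returns such a slice), and read_result is a hypothetical helper. It bounds-checks the pointer and scans only up to the end of memory. A module that forgets the terminator therefore produces an error rather than an out-of-bounds read.

```rust
/// Read the null-terminated UTF-8 result that `tool_call` left in linear memory.
/// `memory` stands in for the exported memory bytes; `ptr` is tool_call's return value.
pub fn read_result(memory: &[u8], ptr: i32) -> Result<String, String> {
    let start = usize::try_from(ptr).map_err(|_| format!("negative pointer {ptr}"))?;
    let tail = memory
        .get(start..)
        .ok_or_else(|| format!("pointer {ptr} out of bounds"))?;
    // Look for the terminator only within memory; never read past the end.
    let len = tail
        .iter()
        .position(|&b| b == 0)
        .ok_or("result is not null-terminated")?;
    String::from_utf8(tail[..len].to_vec()).map_err(|e| e.to_string())
}
```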

When the wasm-tools feature is disabled, WasmTool::from_file returns a descriptive Err pointing to the feature flag. See examples/wasm_tool.rs.
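The stub-guard pattern can be sketched with two cfg-gated definitions of the same function. load_tool and ToolHandle are hypothetical names standing in for WasmTool::from_file. When the crate is built without the feature, only the error-returning stub is compiled, so callers keep one signature either way.

```rust
/// Placeholder for a loaded tool (hypothetical).
#[derive(Debug)]
pub struct ToolHandle;

/// Feature enabled: a real loader would read and instantiate the module here.
#[cfg(feature = "wasm-tools")]
pub fn load_tool(path: &str) -> Result<ToolHandle, String> {
    let _ = path; // placeholder only; no wasmtime call in this sketch
    Ok(ToolHandle)
}

/// Feature disabled: same signature, but fail with an error that names the flag.
#[cfg(not(feature = "wasm-tools"))]
pub fn load_tool(path: &str) -> Result<ToolHandle, String> {
    Err(format!(
        "cannot load {path}: WASM support is not compiled in; \
         enable it with features = [\"wasm-tools\"]"
    ))
}
```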

Mojo Interoperability

Requires: features = ["mojo-interop"]

oxide_agent::mojo::MojoEmbedder offloads batch embedding to an external Mojo-compiled binary via a JSON line protocol over stdin/stdout.

Protocol:

stdin  → {"texts": ["hello", "world"]}
stdout ← {"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]}

Useful when a compiled Mojo kernel outperforms the Ollama HTTP round-trip for your embedding workload, or when you need to embed very large corpora in batch. See examples/mojo_embedder.rs.
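Here is a minimal sketch of the host side of such a bridge. It assumes the child process stays alive and answers exactly one line per request. LineProcess is a hypothetical name, not MojoEmbedder's API. Compact (non-pretty-printed) JSON fits on one line because newlines inside strings are escaped, which is what makes line framing safe here.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Child, ChildStdin, ChildStdout, Command, Stdio};

/// Hypothetical long-lived bridge: one request line in, one response line out.
pub struct LineProcess {
    child: Child,
    stdin: ChildStdin,
    stdout: BufReader<ChildStdout>,
}

impl LineProcess {
    pub fn spawn(program: &str, args: &[&str]) -> io::Result<Self> {
        let mut child = Command::new(program)
            .args(args)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()?;
        let stdin = child.stdin.take().expect("stdin was piped");
        let stdout = BufReader::new(child.stdout.take().expect("stdout was piped"));
        Ok(Self { child, stdin, stdout })
    }

    /// Write one request line, then block until one response line arrives.
    pub fn request(&mut self, line: &str) -> io::Result<String> {
        writeln!(self.stdin, "{line}")?;
        self.stdin.flush()?;
        let mut reply = String::new();
        if self.stdout.read_line(&mut reply)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "child closed stdout"));
        }
        Ok(reply.trim_end().to_string())
    }
}

impl Drop for LineProcess {
    fn drop(&mut self) {
        // Best effort: stop the child and reap it so it does not linger.
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}
```

In practice you would build the request and parse the reply into Vec<Vec<f32>> with a JSON library such as serde_json. Any line-echoing program (such as cat on Unix) is enough to exercise the plumbing without a Mojo binary.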

Error Handling

All fallible operations return Result<_, OxideError>:

Variant	Cause
`OxideError::Http(reqwest::Error)`	Network or transport failure
`OxideError::ApiError(status, body)`	Non-2xx response from Ollama
`OxideError::Serde(serde_json::Error)`	JSON parse failure
`OxideError::Other(String)`	All other errors (WASM, Mojo, tool dispatch)

OxideError implements std::error::Error, so it can be propagated into anyhow::Result with ? or wrapped as a #[from] source in your own thiserror-derived error types.
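A common way to consume these variants is a retry policy. The sketch below uses a hypothetical, dependency-free stand-in (OxideErrorSketch) with the same variant names. It replaces the reqwest::Error and serde_json::Error payloads with Strings and assumes the status is a u16. With the real type you would match on OxideError directly.

```rust
use std::fmt;

/// Hypothetical std-only stand-in for OxideError (payloads simplified to Strings).
#[derive(Debug)]
pub enum OxideErrorSketch {
    Http(String),
    ApiError(u16, String),
    Serde(String),
    Other(String),
}

impl fmt::Display for OxideErrorSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Http(e) => write!(f, "transport error: {e}"),
            Self::ApiError(status, body) => write!(f, "Ollama returned {status}: {body}"),
            Self::Serde(e) => write!(f, "invalid JSON: {e}"),
            Self::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for OxideErrorSketch {}

/// Example policy: retry transport failures and 5xx responses, fail fast otherwise.
pub fn is_retryable(err: &OxideErrorSketch) -> bool {
    match err {
        OxideErrorSketch::Http(_) => true,
        OxideErrorSketch::ApiError(status, _) => (500..600).contains(status),
        _ => false,
    }
}
```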

Architecture

oxide-agent/
├── src/
│   ├── client/     # OllamaClient trait + HttpOllamaClient (reqwest + NDJSON streaming)
│   ├── types/      # Shared request/response structs (Generate, Chat, Embed, Tags)
│   ├── session/    # Session: multi-turn history, token estimation, compression
│   ├── tools/      # ToolRegistry, ToolBuilder, async dispatch
│   ├── rag/        # VectorStore: cosine similarity, add_text, add_file, query
│   ├── wasm/       # WasmTool: wasmtime sandbox (feature: wasm-tools)
│   ├── mojo/       # MojoEmbedder: subprocess bridge (feature: mojo-interop)
│   └── error.rs    # OxideError unified error type
├── oxide-agent-macros/   # Procedural macro crate (proc-macro2, quote, syn)
└── examples/       # Runnable examples — see examples/README.md

The OllamaClient trait is the central seam of the codebase. Every subsystem — Session, VectorStore, ToolRegistry — accepts Arc<dyn OllamaClient>, meaning the entire stack is testable with an in-process mock and no Ollama server required.

Examples

Runnable, self-contained examples live in examples/. Each file maps to one feature area:

Example	Description	Feature flag
basic_chat.rs	Single-turn chat and model listing	—
streaming.rs	Incremental token-by-token streaming	—
session.rs	Multi-turn session with compression strategies	—
tool_calling.rs	Full agentic tool-call loop	—
rag.rs	Vector store indexing and RAG-pattern chat	—
wasm_tool.rs	Sandboxed WASM tool execution	`wasm-tools`
mojo_embedder.rs	Batch embedding via Mojo subprocess	`mojo-interop`

See examples/README.md for detailed walkthroughs, protocol documentation, and configuration references for each example.

Testing

The test suite runs entirely without a live Ollama server by implementing OllamaClient on lightweight in-process structs:

cargo test
cargo test --features wasm-tools
cargo test --features mojo-interop

Notable test patterns used across the codebase:

MockOllamaClient in client — verifies streaming chunk ordering and buffered response identity.
EchoClient in session — echoes user messages back to validate history tracking, system prompt placement, and truncation under configurable token budgets.
FakeEmbedClient in rag — returns deterministic embeddings based on input character values to verify cosine similarity ranking without a real embedding model.
Stub guards in wasm and mojo — assert that feature-gated code returns helpful errors with feature flag names when called without the feature enabled.
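A FakeEmbedClient-style embedding can be as simple as a character histogram. This sketch is hypothetical (fake_embed, dot, and DIM are not the crate's test code). Each character increments a bucket chosen by its code point, and the result is normalized to unit length. Identical texts therefore embed identically, and for non-empty inputs the dot product equals cosine similarity.

```rust
/// Number of histogram buckets (hypothetical; any small size works).
pub const DIM: usize = 8;

/// Deterministic fake embedding: count characters into buckets by code point,
/// then scale to unit length. The empty string maps to the zero vector.
pub fn fake_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for c in text.chars() {
        v[(c as u32 as usize) % DIM] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Dot product; equals cosine similarity when both vectors are unit length.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```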

Contributing

All PRs should include unit tests. Because OllamaClient is a trait, mocking is straightforward — see the existing test modules for patterns. Run cargo clippy and cargo fmt before opening a PR.

Issues and feature requests: github.com/TheJagpreet/oxide-agent/issues

License

MIT

oxide-agent 0.1.0