oxide-agent 0.1.0

oxide-agent

A type-safe, high-performance Rust crate for building agentic systems on top of locally hosted LLMs via Ollama. It is designed on the principle that AI agent infrastructure should be as safe and ergonomic as the language it is written in.

Why oxide-agent over thin wrappers like ollama-rs?

  • Safety — input sanitization and sandboxed tool execution at the crate level.
  • Concurrency — tokio-native; run multi-agent swarms without a GIL bottleneck.
  • Ergonomics — idiomatic Rust APIs, not a ported Python library.
  • Testability — the OllamaClient trait makes full unit testing possible without a running server.

Installation

Requires Rust 1.75+ and an Ollama server running on localhost:11434.

[dependencies]
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent" }

Optional features:

# Sandboxed WASM tool execution via wasmtime
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["wasm-tools"] }

# High-throughput batch embedding via a Mojo-compiled binary
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["mojo-interop"] }

Features

HTTP Client

oxide_agent::client provides the OllamaClient trait — a mockable async abstraction over any Ollama-compatible backend — and HttpOllamaClient, the production implementation backed by reqwest.

Covered endpoints:

| Method | Endpoint | Description |
| --- | --- | --- |
| generate | POST /api/generate | Single-turn text completion |
| chat | POST /api/chat | Multi-turn chat, including tool/function calling |
| embed | POST /api/embed | Dense vector embeddings (single or batch) |
| list_models | GET /api/tags | List models available on the server |
| stream_generate | POST /api/generate | Token-by-token streaming completion |
| stream_chat | POST /api/chat | Token-by-token streaming chat |

Because OllamaClient is a trait, the entire stack is testable with an in-process mock — no running Ollama server required. See examples/basic_chat.rs.
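The testing seam described above can be illustrated with a simplified stand-in. The crate's real trait is async and its exact signatures are not reproduced here, so ChatBackend, EchoBackend, and ask are hypothetical names used only for this sketch:

```rust
use std::sync::Arc;

/// Hypothetical, synchronous stand-in for an `OllamaClient`-style trait.
/// The crate's real trait is async; this only illustrates the seam.
trait ChatBackend: Send + Sync {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// In-process mock: echoes the prompt instead of calling a server.
struct EchoBackend;

impl ChatBackend for EchoBackend {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String> {
        Ok(format!("[{model}] {prompt}"))
    }
}

/// Application code depends on the trait object, not on a concrete client,
/// so a test can hand it the mock and production code the HTTP client.
fn ask(backend: Arc<dyn ChatBackend>, question: &str) -> Result<String, String> {
    backend.chat("mock-model", question)
}
```

Calling ask(Arc::new(EchoBackend), "hello") returns the echoed prompt without any network traffic, which is the property the in-process mocks rely on.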


Streaming

stream_generate and stream_chat use a zero-copy NDJSON pipeline:

  • bytes_stream() from reqwest forwards Bytes chunks without re-allocation.
  • tokio_util::io::StreamReader adapts the byte stream to AsyncRead.
  • FramedRead + LinesCodec splits on \n, yielding one line at a time; each line is then deserialized into a single response chunk.

Memory overhead is constant regardless of output length. See examples/streaming.rs.
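The framing step is the part that keeps memory bounded: only the incomplete tail of the stream is buffered. A minimal, std-only sketch of that idea follows. LineFramer is a hypothetical helper, not part of the crate, which is described as using FramedRead + LinesCodec for this job:

```rust
/// Accumulates raw byte chunks and yields complete newline-delimited lines.
/// Hypothetical helper that models what a line codec does for NDJSON.
#[derive(Default)]
struct LineFramer {
    /// Bytes received so far that do not yet end in '\n'.
    pending: Vec<u8>,
}

impl LineFramer {
    /// Feed one chunk; returns every line completed by this chunk.
    /// Blank lines are skipped (an assumption of this sketch).
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            // Remove the line plus its trailing '\n' from the buffer.
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..line.len() - 1]);
            let text = text.trim_end_matches('\r');
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }
}
```

A JSON object split across two network chunks is returned only once its terminating newline arrives, so a downstream parser never sees a partial line.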


Session Management

oxide_agent::session::Session is a stateful multi-turn conversation manager. It automatically:

  • Appends every user message and assistant reply to the history.
  • Estimates the token count (1 token ≈ 4 UTF-8 characters).
  • Triggers context compression when history exceeds max_tokens × compression_threshold.
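The estimate and the trigger condition above amount to a few lines of arithmetic. The sketch below assumes the estimate rounds up and that "exceeds" means strictly greater than; both function names are illustrative rather than the crate's API:

```rust
/// Rough token estimate: 1 token per 4 characters.
/// Rounding up is an assumption of this sketch.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// True when the estimated history size exceeds
/// max_tokens * compression_threshold.
fn needs_compression(history: &[&str], max_tokens: usize, compression_threshold: f64) -> bool {
    let used: usize = history.iter().map(|m| estimate_tokens(m)).sum();
    used as f64 > max_tokens as f64 * compression_threshold
}
```

With a budget of 8_000 tokens and a threshold of 0.80, compression would trigger once the history passes an estimated 6,400 tokens, or roughly 25,600 characters.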

Compression strategies:

| Strategy | Behaviour |
| --- | --- |
| TruncateOldest (default) | Drops the oldest non-system messages until under budget |
| Summarize { model } | Asks Ollama to summarise the oldest half of history into a compact system message, then discards the originals |
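The TruncateOldest behaviour can be sketched as follows. The Message struct, the role strings, and the exact meaning of "budget" are assumptions made for illustration; the crate's own types may differ:

```rust
/// Hypothetical message type for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Rough estimate: 1 token per 4 characters, rounded up (assumption).
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn total_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| estimate_tokens(&m.content)).sum()
}

/// Drops the oldest non-system messages until the estimate fits the budget.
fn truncate_oldest(history: &mut Vec<Message>, budget: usize) {
    while total_tokens(history) > budget {
        // Find the oldest message that is not a system prompt.
        match history.iter().position(|m| m.role != "system") {
            Some(idx) => {
                history.remove(idx);
            }
            // Only system messages remain; nothing more can be dropped.
            None => break,
        }
    }
}
```

System messages are never removed in this sketch, so a system prompt larger than the budget leaves the history over budget rather than looping forever.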

SessionConfig fields:

| Field | Default | Description |
| --- | --- | --- |
| max_tokens | 8_000 | Soft token budget for the message history |
| compression_threshold | 0.80 | Fraction of budget at which compression triggers |
| compression_strategy | TruncateOldest | Strategy applied when threshold is crossed |

See examples/session.rs.


Tool Calling

oxide_agent::tools provides ToolRegistry for registering async handler functions and dispatching tool calls returned by the model, and ToolBuilder for constructing JSON Schema tool definitions without boilerplate.

ToolBuilder parameter types:

| Method | JSON Schema type |
| --- | --- |
| string_param(name, desc, required) | "string" |
| number_param(name, desc, required) | "number" |
| bool_param(name, desc, required) | "boolean" |

The full agentic loop — register → attach definitions → detect tool calls → dispatch → feed results back → final answer — is covered in examples/tool_calling.rs.
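The register and dispatch steps of that loop can be sketched with a name-to-handler map. This is a simplified, synchronous model: MiniRegistry and its methods are hypothetical, handlers take and return raw strings, and the crate's ToolRegistry is described as async:

```rust
use std::collections::HashMap;

/// Hypothetical synchronous handler type; takes JSON args as a string.
type Handler = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Minimal stand-in for a tool registry: name -> handler.
#[derive(Default)]
struct MiniRegistry {
    handlers: HashMap<String, Handler>,
}

impl MiniRegistry {
    /// Register a handler under the tool name the model will call.
    fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + Send + Sync + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Dispatch a tool call returned by the model; unknown names are errors.
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(args),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}
```

In an agentic loop, the string returned by dispatch would be appended to the conversation as a tool message before asking the model for its final answer.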


Retrieval-Augmented Generation

oxide_agent::rag::VectorStore is an in-memory vector store using Ollama's /api/embed endpoint and cosine similarity for nearest-neighbour search.

  • add_text(text, metadata) — embed a single string and store it.
  • add_file(path) — read a UTF-8 file, chunk by \n\n (paragraph), store each chunk with source and chunk metadata.
  • query(question, top_k) — return Vec<SearchResult> ranked by cosine similarity.

No external database is required; the store is backed by a Vec<Document> in process memory. See examples/rag.rs.
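The two mechanical pieces of the store, paragraph chunking and cosine-similarity ranking, can be sketched without any dependencies. The function names below are illustrative, and trimming whitespace or skipping empty chunks is an assumption rather than documented behaviour:

```rust
/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or zero-length/zero-norm inputs (a choice made for this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns (document index, score) pairs for the k most similar documents,
/// best match first.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

/// Splits text into paragraph chunks on blank lines ("\n\n").
fn chunk_paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .collect()
}
```

This is a linear scan over every stored vector, which is adequate for an in-memory Vec of documents but scales with the size of the corpus.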


WASM Tool Isolation

Requires: features = ["wasm-tools"]

oxide_agent::wasm::WasmTool loads and executes tool logic from .wasm binaries via wasmtime. Each tool runs in a fully sandboxed environment with no access to the host filesystem, network, or process state.

ABI contract for the .wasm module:

| Export | Signature | Responsibility |
| --- | --- | --- |
| memory | linear memory | Default memory export |
| alloc | (size: i32) -> i32 | Allocate size bytes, return pointer |
| tool_call | (ptr: i32, len: i32) -> i32 | Receive JSON args, return pointer to null-terminated JSON result |

When the wasm-tools feature is disabled, WasmTool::from_file returns a descriptive Err pointing to the feature flag. See examples/wasm_tool.rs.
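On the host side, the ABI contract implies two memory operations: copying the JSON arguments to the pointer returned by alloc, and reading a null-terminated string from the pointer returned by tool_call. The sketch below models linear memory as a plain byte slice and does not use wasmtime; write_args and read_result are hypothetical helpers:

```rust
/// Copies JSON args into "linear memory" at the pointer returned by `alloc`.
/// `memory` is a plain byte slice standing in for the module's memory export.
fn write_args(memory: &mut [u8], ptr: i32, json: &str) -> Result<(), String> {
    if ptr < 0 {
        return Err(format!("negative pointer: {ptr}"));
    }
    let start = ptr as usize;
    let end = start.checked_add(json.len()).ok_or("length overflow")?;
    let dst = memory
        .get_mut(start..end)
        .ok_or_else(|| format!("write of {} bytes at {ptr} is out of bounds", json.len()))?;
    dst.copy_from_slice(json.as_bytes());
    Ok(())
}

/// Reads the null-terminated UTF-8 result starting at the pointer
/// returned by `tool_call`.
fn read_result(memory: &[u8], ptr: i32) -> Result<&str, String> {
    if ptr < 0 {
        return Err(format!("negative pointer: {ptr}"));
    }
    let tail = memory
        .get(ptr as usize..)
        .ok_or_else(|| format!("pointer {ptr} is out of bounds"))?;
    // The contract says the result is null-terminated; fail if it is not.
    let len = tail.iter().position(|&b| b == 0).ok_or("missing null terminator")?;
    std::str::from_utf8(&tail[..len]).map_err(|e| e.to_string())
}
```

Bounds checks matter here because the pointers come from untrusted guest code: a bad pointer should produce an error, not a panic in the host.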


Mojo Interoperability

Requires: features = ["mojo-interop"]

oxide_agent::mojo::MojoEmbedder offloads batch embedding to an external Mojo-compiled binary via a JSON line protocol over stdin/stdout.

Protocol:

stdin  → {"texts": ["hello", "world"]}
stdout ← {"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]}

Useful when a compiled Mojo kernel outperforms the Ollama HTTP round-trip for your embedding workload, or when you need to embed very large corpora in batch. See examples/mojo_embedder.rs.
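The line protocol reduces to writing one JSON line and reading one JSON line back. The sketch below is generic over any writer and buffered reader, so it can be exercised with in-memory buffers as well as a child process's pipes. All names are hypothetical, the JSON is built by hand only to avoid dependencies (a real implementation would more likely use serde_json), and the reply is returned as an unparsed line:

```rust
use std::io::{BufRead, Write};

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the single-line request: {"texts": ["...", "..."]}
fn encode_request(texts: &[&str]) -> String {
    let items: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", escape_json(t)))
        .collect();
    format!("{{\"texts\": [{}]}}", items.join(", "))
}

/// Writes one request line, then reads one response line.
fn exchange<W: Write, R: BufRead>(
    mut stdin: W,
    mut stdout: R,
    texts: &[&str],
) -> std::io::Result<String> {
    writeln!(stdin, "{}", encode_request(texts))?;
    stdin.flush()?;
    let mut line = String::new();
    stdout.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

Escaping newlines inside each text is what keeps the request on a single line, which a line-delimited protocol depends on.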


Error Handling

All fallible operations return Result<_, OxideError>:

| Variant | Cause |
| --- | --- |
| OxideError::Http(reqwest::Error) | Network or transport failure |
| OxideError::ApiError(status, body) | Non-2xx response from Ollama |
| OxideError::Serde(serde_json::Error) | JSON parse failure |
| OxideError::Other(String) | All other errors (WASM, Mojo, tool dispatch) |

OxideError implements std::error::Error, so it works with anyhow (for example, via ? in functions returning anyhow::Result) and can be wrapped by thiserror-derived error types.
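To show how an error type of this shape propagates, here is a dependency-free stand-in. SketchError mirrors the variant names in the table but is not the crate's type: the Http and Serde payloads are plain strings instead of reqwest::Error and serde_json::Error, and the u16 status type and the message format are assumptions:

```rust
use std::error::Error;
use std::fmt;

/// Stand-in mirroring the documented variant names (payload types assumed).
#[derive(Debug)]
enum SketchError {
    Http(String),
    ApiError(u16, String),
    Serde(String),
    Other(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Http(msg) => write!(f, "HTTP error: {}", msg),
            SketchError::ApiError(status, body) => write!(f, "API error {}: {}", status, body),
            SketchError::Serde(msg) => write!(f, "JSON error: {}", msg),
            SketchError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

impl Error for SketchError {}

/// Maps a non-2xx status to the ApiError variant.
fn check_status(status: u16, body: &str) -> Result<(), SketchError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        Err(SketchError::ApiError(status, body.to_string()))
    }
}

/// Because the type implements std::error::Error, `?` converts it into a
/// boxed trait object (the same mechanism anyhow builds on).
fn run(status: u16) -> Result<(), Box<dyn Error>> {
    check_status(status, "model not found")?;
    Ok(())
}
```

Callers that need to react to a specific failure can match on the variant directly, while application code can let ? erase the type.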


Architecture

oxide-agent/
├── src/
│   ├── client/     # OllamaClient trait + HttpOllamaClient (reqwest + NDJSON streaming)
│   ├── types/      # Shared request/response structs (Generate, Chat, Embed, Tags)
│   ├── session/    # Session: multi-turn history, token estimation, compression
│   ├── tools/      # ToolRegistry, ToolBuilder, async dispatch
│   ├── rag/        # VectorStore: cosine similarity, add_text, add_file, query
│   ├── wasm/       # WasmTool: wasmtime sandbox (feature: wasm-tools)
│   ├── mojo/       # MojoEmbedder: subprocess bridge (feature: mojo-interop)
│   └── error.rs    # OxideError unified error type
├── oxide-agent-macros/   # Procedural macro crate (proc-macro2, quote, syn)
└── examples/       # Runnable examples — see examples/README.md

The OllamaClient trait is the central seam of the codebase. Every subsystem — Session, VectorStore, ToolRegistry — accepts Arc<dyn OllamaClient>, meaning the entire stack is testable with an in-process mock and no Ollama server required.


Examples

Runnable, self-contained examples live in examples/. Each file maps to one feature area:

| Example | Description | Feature flag |
| --- | --- | --- |
| basic_chat.rs | Single-turn chat and model listing | |
| streaming.rs | Zero-copy token-by-token streaming | |
| session.rs | Multi-turn session with compression strategies | |
| tool_calling.rs | Full agentic tool-call loop | |
| rag.rs | Vector store indexing and RAG-pattern chat | |
| wasm_tool.rs | Sandboxed WASM tool execution | wasm-tools |
| mojo_embedder.rs | Batch embedding via Mojo subprocess | mojo-interop |

See examples/README.md for detailed walkthroughs, protocol documentation, and configuration references for each example.


Testing

The test suite runs entirely without a live Ollama server by implementing OllamaClient on lightweight in-process structs:

cargo test
cargo test --features wasm-tools
cargo test --features mojo-interop

Notable test patterns used across the codebase:

  • MockOllamaClient in client — verifies streaming chunk ordering and buffered response identity.
  • EchoClient in session — echoes user messages back to validate history tracking, system prompt placement, and truncation under configurable token budgets.
  • FakeEmbedClient in rag — returns deterministic embeddings based on input character values to verify cosine similarity ranking without a real embedding model.
  • Stub guards in wasm and mojo — assert that feature-gated code returns helpful errors with feature flag names when called without the feature enabled.

Contributing

All PRs should include unit tests. Because OllamaClient is a trait, mocking is straightforward — see the existing test modules for patterns. Run cargo clippy and cargo fmt before opening a PR.

Issues and feature requests: github.com/TheJagpreet/oxide-agent/issues


License

MIT