# oxide-agent
A type-safe, high-performance Rust crate for building agentic systems on top of locally-hosted LLMs via [Ollama](https://ollama.com). Designed with the principle that AI agent infrastructure should be as safe and ergonomic as the language it's written in.
> **Why oxide-agent over thin wrappers like `ollama-rs`?**
>
> - **Safety** — input sanitization and sandboxed tool execution at the crate level.
> - **Concurrency** — `tokio`-native; run multi-agent swarms without a GIL bottleneck.
> - **Ergonomics** — idiomatic Rust APIs, not a ported Python library.
> - **Testability** — the `OllamaClient` trait makes full unit testing possible without a running server.
---
## Table of Contents
- [Installation](#installation)
- [Features](#features)
- [HTTP Client](#http-client)
- [Streaming](#streaming)
- [Session Management](#session-management)
- [Tool Calling](#tool-calling)
- [Retrieval-Augmented Generation](#retrieval-augmented-generation)
- [WASM Tool Isolation](#wasm-tool-isolation)
- [Mojo Interoperability](#mojo-interoperability)
- [Error Handling](#error-handling)
- [Architecture](#architecture)
- [Examples](#examples)
- [Testing](#testing)
- [Contributing](#contributing)
- [License](#license)
---
## Installation
Requires Rust 1.75+ and an [Ollama](https://ollama.com) server running on `localhost:11434`.
```toml
[dependencies]
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent" }
```
**Optional features:**
```toml
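# Pick ONE of the entries below; a TOML table cannot define the same key twice.
# Sandboxed WASM tool execution via wasmtime
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["wasm-tools"] }
# High-throughput batch embedding via a Mojo-compiled binary
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["mojo-interop"] }
# To enable both, list them together: features = ["wasm-tools", "mojo-interop"]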
```
---
## Features
### HTTP Client
`oxide_agent::client` provides the `OllamaClient` trait — a mockable async abstraction over any Ollama-compatible backend — and `HttpOllamaClient`, the production implementation backed by `reqwest`.
**Covered endpoints:**
| Method | Endpoint | Description |
|---|---|---|
| `generate` | `POST /api/generate` | Single-turn text completion |
| `chat` | `POST /api/chat` | Multi-turn chat, including tool/function calling |
| `embed` | `POST /api/embed` | Dense vector embeddings (single or batch) |
| `list_models` | `GET /api/tags` | List models available on the server |
| `stream_generate` | `POST /api/generate` | Token-by-token streaming completion |
| `stream_chat` | `POST /api/chat` | Token-by-token streaming chat |
Because `OllamaClient` is a trait, the entire stack is testable with an in-process mock — no running Ollama server required. See [examples/basic_chat.rs](examples/basic_chat.rs).
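As a rough illustration of why a trait seam makes this possible, the sketch below uses a simplified, synchronous stand-in. The names `ChatBackend`, `CannedBackend`, and `ask` are hypothetical and invented for this example; the real `OllamaClient` trait is async and its exact signatures are not reproduced here.

```rust
use std::cell::RefCell;

/// Hypothetical, synchronous stand-in for an Ollama-style client trait.
/// (The real `OllamaClient` is async; this only illustrates the seam.)
pub trait ChatBackend {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// In-process mock: returns a canned reply and records every prompt it sees.
pub struct CannedBackend {
    pub reply: String,
    pub seen: RefCell<Vec<String>>,
}

impl CannedBackend {
    pub fn new(reply: &str) -> Self {
        Self {
            reply: reply.to_string(),
            seen: RefCell::new(Vec::new()),
        }
    }
}

impl ChatBackend for CannedBackend {
    fn chat(&self, _model: &str, prompt: &str) -> Result<String, String> {
        self.seen.borrow_mut().push(prompt.to_string());
        Ok(self.reply.clone())
    }
}

/// Code under test depends only on the trait, never on a concrete HTTP client.
pub fn ask(backend: &dyn ChatBackend, question: &str) -> Result<String, String> {
    let answer = backend.chat("any-model", question)?;
    Ok(answer.trim().to_string())
}
```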
---
### Streaming
`stream_generate` and `stream_chat` use a zero-copy NDJSON pipeline:
- `bytes_stream()` from `reqwest` forwards `Bytes` chunks without re-allocation.
- `tokio_util::io::StreamReader` adapts the byte stream to `AsyncRead`.
- `FramedRead` + `LinesCodec` splits on `\n`, yielding one deserialized token per line.
Memory overhead stays constant regardless of total output length, because only the line currently being decoded is buffered. See [examples/streaming.rs](examples/streaming.rs).
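To make the framing step concrete, here is a minimal synchronous sketch of splitting chunked bytes into newline-delimited records. `LineSplitter` is a hypothetical stand-in written with the standard library only; the crate itself uses `FramedRead` with `LinesCodec` as described above, and unlike that pipeline this sketch copies each completed line into a `String`. JSON decoding of each line is left out.

```rust
/// Accumulates arbitrary byte chunks and yields complete `\n`-terminated lines.
/// Hypothetical synchronous stand-in for the `FramedRead` + `LinesCodec` stage.
#[derive(Default)]
pub struct LineSplitter {
    buf: Vec<u8>,
}

impl LineSplitter {
    /// Feed one network chunk; returns every line completed by this chunk.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line (including its '\n') from the front of the buffer.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..pos]);
            let text = text.trim_end_matches('\r');
            // Blank lines carry no NDJSON record, so they are skipped.
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }

    /// Number of bytes still waiting for their terminating newline.
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```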
---
### Session Management
`oxide_agent::session::Session` is a stateful multi-turn conversation manager. It automatically:
- Appends every user message and assistant reply to the history.
- Estimates the token count (1 token ≈ 4 UTF-8 characters).
- Triggers context compression when history exceeds `max_tokens × compression_threshold`.
**Compression strategies:**
| Strategy | Behaviour |
|---|---|
| `TruncateOldest` *(default)* | Drops the oldest non-system messages until under budget |
| `Summarize { model }` | Asks Ollama to summarise the oldest half of history into a compact system message, then discards the originals |
**`SessionConfig` fields:**
| Field | Default | Description |
|---|---|---|
| `max_tokens` | `8_000` | Soft token budget for the message history |
| `compression_threshold` | `0.80` | Fraction of budget at which compression triggers |
| `compression_strategy` | `TruncateOldest` | Strategy applied when threshold is crossed |
See [examples/session.rs](examples/session.rs).
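With the defaults above, compression triggers once the estimate exceeds 8,000 × 0.80 = 6,400 tokens. The sketch below works through that arithmetic and a `TruncateOldest`-style pass. `Message`, `Budget`, and the free functions are hypothetical names for illustration, not the crate's types, and counting `char`s (rather than bytes) is an assumption about how "UTF-8 characters" is measured.

```rust
/// Rough estimate: one token per 4 characters, rounded up (assumed rounding).
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Hypothetical message type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Hypothetical mirror of the two numeric `SessionConfig` fields.
pub struct Budget {
    pub max_tokens: usize,
    pub compression_threshold: f64,
}

impl Default for Budget {
    fn default() -> Self {
        Self { max_tokens: 8_000, compression_threshold: 0.80 }
    }
}

impl Budget {
    /// True when history exceeds `max_tokens * compression_threshold`.
    pub fn needs_compression(&self, total_tokens: usize) -> bool {
        total_tokens as f64 > self.max_tokens as f64 * self.compression_threshold
    }
}

pub fn total_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| estimate_tokens(&m.content)).sum()
}

/// Drop the oldest non-system messages until the estimate fits within `limit`.
pub fn truncate_oldest(history: &mut Vec<Message>, limit: usize) {
    while total_tokens(history) > limit {
        match history.iter().position(|m| m.role != "system") {
            Some(i) => {
                history.remove(i);
            }
            None => break, // only system messages remain
        }
    }
}
```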
---
### Tool Calling
`oxide_agent::tools` provides `ToolRegistry` for registering async handler functions and dispatching tool calls returned by the model, and `ToolBuilder` for constructing JSON Schema tool definitions without boilerplate.
**`ToolBuilder` parameter types:**
| Method | JSON Schema type |
|---|---|
| `string_param(name, desc, required)` | `"string"` |
| `number_param(name, desc, required)` | `"number"` |
| `bool_param(name, desc, required)` | `"boolean"` |
The full agentic loop — register → attach definitions → detect tool calls → dispatch → feed results back → final answer — is covered in [examples/tool_calling.rs](examples/tool_calling.rs).
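The dispatch step of that loop can be sketched with a name-to-handler map. Everything here (`MiniRegistry`, `ToolCall`, `run_tool_calls`, and the synchronous string-in/string-out handler shape) is a hypothetical simplification for illustration; the crate's `ToolRegistry` uses async handlers and its real API may differ.

```rust
use std::collections::HashMap;

/// Hypothetical handler shape: raw JSON arguments in, JSON result out.
type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// Minimal stand-in for a tool registry (not the crate's `ToolRegistry`).
#[derive(Default)]
pub struct MiniRegistry {
    handlers: HashMap<String, Handler>,
}

impl MiniRegistry {
    /// Register a handler under the tool name the model will refer to.
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Look up the handler by name and run it with the given arguments.
    pub fn dispatch(&self, name: &str, args_json: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(args_json),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}

/// A tool call as a model might return it: a name plus raw JSON arguments.
pub struct ToolCall {
    pub name: String,
    pub args_json: String,
}

/// Run every requested call in order; each result would then be appended
/// to the conversation as a tool message before asking the model again.
pub fn run_tool_calls(reg: &MiniRegistry, calls: &[ToolCall]) -> Vec<Result<String, String>> {
    calls
        .iter()
        .map(|call| reg.dispatch(&call.name, &call.args_json))
        .collect()
}
```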
---
### Retrieval-Augmented Generation
`oxide_agent::rag::VectorStore` is an in-memory vector store that embeds text through Ollama's `/api/embed` endpoint and ranks stored documents by cosine similarity for nearest-neighbour search.
- `add_text(text, metadata)` — embed a single string and store it.
- `add_file(path)` — read a UTF-8 file, chunk by `\n\n` (paragraph), store each chunk with `source` and `chunk` metadata.
- `query(question, top_k)` — return `Vec<SearchResult>` ranked by cosine similarity.
No external database is required; the store is backed by a `Vec<Document>` in process memory. See [examples/rag.rs](examples/rag.rs).
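The ranking step can be sketched in a few lines. `Doc`, `chunk_paragraphs`, and `top_k` below are hypothetical names for illustration and do not reproduce the crate's `Document` or `SearchResult` types; embeddings are assumed to be supplied by the caller rather than fetched from Ollama.

```rust
/// Cosine similarity of two vectors; returns 0.0 for mismatched or zero-length input.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Split text into paragraph chunks on blank lines, skipping empty chunks.
pub fn chunk_paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .collect()
}

/// Hypothetical stored document: text plus a precomputed embedding.
pub struct Doc {
    pub text: String,
    pub embedding: Vec<f32>,
}

/// Return the `k` documents most similar to `query`, best match first.
pub fn top_k<'a>(docs: &'a [Doc], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|d| (d.text.as_str(), cosine_similarity(&d.embedding, query)))
        .collect();
    // Sort by descending score; `total_cmp` gives a total order over floats.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```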
---
### WASM Tool Isolation
> Requires: `features = ["wasm-tools"]`
`oxide_agent::wasm::WasmTool` loads and executes tool logic from `.wasm` binaries via [wasmtime](https://wasmtime.dev/). Each tool runs in a fully sandboxed environment with no access to the host filesystem, network, or process state.
**ABI contract for the `.wasm` module:**
| Export | Signature | Description |
|---|---|---|
| `memory` | linear memory | Default memory export |
| `alloc` | `(size: i32) -> i32` | Allocate `size` bytes, return pointer |
| `tool_call` | `(ptr: i32, len: i32) -> i32` | Receive JSON args, return pointer to null-terminated JSON result |
When the `wasm-tools` feature is disabled, `WasmTool::from_file` returns a descriptive `Err` pointing to the feature flag. See [examples/wasm_tool.rs](examples/wasm_tool.rs).
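On the host side, the contract above implies a sequence along the lines of: call `alloc(len)`, copy the JSON arguments into linear memory at the returned pointer, call `tool_call(ptr, len)`, then read a null-terminated string at the pointer it returns. The sketch below covers only the two memory steps, with linear memory modelled as a plain byte slice. `write_bytes` and `read_c_string` are hypothetical helpers, not part of the crate or of wasmtime, and the wasmtime calls themselves are omitted.

```rust
/// Copy `data` into `memory` at guest pointer `ptr`, with bounds checking.
pub fn write_bytes(memory: &mut [u8], ptr: i32, data: &[u8]) -> Result<(), String> {
    if ptr < 0 {
        return Err(format!("negative pointer: {ptr}"));
    }
    let start = ptr as usize;
    let end = start.checked_add(data.len()).ok_or("length overflow")?;
    let dst = memory
        .get_mut(start..end)
        .ok_or_else(|| format!("range {start}..{end} out of bounds"))?;
    dst.copy_from_slice(data);
    Ok(())
}

/// Read the null-terminated UTF-8 string that starts at guest pointer `ptr`.
pub fn read_c_string(memory: &[u8], ptr: i32) -> Result<&str, String> {
    if ptr < 0 {
        return Err(format!("negative pointer: {ptr}"));
    }
    let tail = memory
        .get(ptr as usize..)
        .ok_or_else(|| format!("pointer {ptr} out of bounds"))?;
    // The result ends at the first zero byte; a missing terminator is an error.
    let len = tail
        .iter()
        .position(|&b| b == 0)
        .ok_or("missing null terminator")?;
    std::str::from_utf8(&tail[..len]).map_err(|e| e.to_string())
}
```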
---
### Mojo Interoperability
> Requires: `features = ["mojo-interop"]`
`oxide_agent::mojo::MojoEmbedder` offloads batch embedding to an external [Mojo](https://www.modular.com/mojo)-compiled binary via a JSON line protocol over stdin/stdout.
**Protocol:**
```
stdin → {"texts": ["hello", "world"]}
stdout ← {"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]}
```
Useful when a compiled Mojo kernel outperforms the Ollama HTTP round-trip for your embedding workload, or when you need to embed very large corpora in batch. See [examples/mojo_embedder.rs](examples/mojo_embedder.rs).
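One request/response exchange of this line protocol might look like the sketch below. `build_request` and `round_trip` are hypothetical helpers written with the standard library only; they are not `MojoEmbedder`'s API, and the response is returned as a raw line rather than parsed. In a real setup the writer and reader would typically be a child process's piped stdin and a `BufReader` over its stdout, obtained from `std::process::Command`.

```rust
use std::io::{self, BufRead, Write};

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the single-line request body: {"texts": ["...", "..."]}
pub fn build_request(texts: &[&str]) -> String {
    let items: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", json_escape(t)))
        .collect();
    format!("{{\"texts\": [{}]}}", items.join(", "))
}

/// Write one request line, flush, then read exactly one response line.
pub fn round_trip<W: Write, R: BufRead>(
    mut to_child: W,
    mut from_child: R,
    texts: &[&str],
) -> io::Result<String> {
    writeln!(to_child, "{}", build_request(texts))?;
    to_child.flush()?;
    let mut line = String::new();
    if from_child.read_line(&mut line)? == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "embedder closed stdout",
        ));
    }
    Ok(line.trim_end().to_string())
}
```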
---
## Error Handling
All fallible operations return `Result<_, OxideError>`:
| Variant | Cause |
|---|---|
| `OxideError::Http(reqwest::Error)` | Network or transport failure |
| `OxideError::ApiError(status, body)` | Non-2xx response from Ollama |
| `OxideError::Serde(serde_json::Error)` | JSON parse failure |
| `OxideError::Other(String)` | All other errors (WASM, Mojo, tool dispatch) |
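`OxideError` implements `std::error::Error`, so it converts into `anyhow::Error` through the `?` operator and can be used as a `#[source]` or `#[from]` field in `thiserror`-derived error types.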
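Callers will usually want to branch on the variant, for example to decide whether a failed request is worth retrying. The sketch below uses `SketchError`, a hypothetical dependency-free mirror of the variants in the table above: transport and JSON errors are carried as plain strings, and the `ApiError` field types (`u16`, `String`) are assumptions. The retry policy is one possible choice, not something the crate prescribes.

```rust
use std::fmt;

/// Hypothetical mirror of the `OxideError` variants, for illustration only.
#[derive(Debug)]
pub enum SketchError {
    Http(String),
    ApiError(u16, String),
    Serde(String),
    Other(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Http(e) => write!(f, "http error: {e}"),
            SketchError::ApiError(status, body) => write!(f, "api error {status}: {body}"),
            SketchError::Serde(e) => write!(f, "json error: {e}"),
            SketchError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Assumed policy: retry transport failures, HTTP 429, and 5xx responses;
/// treat parse failures and everything else as permanent.
pub fn is_retryable(err: &SketchError) -> bool {
    match err {
        SketchError::Http(_) => true,
        SketchError::ApiError(status, _) => *status == 429 || *status >= 500,
        SketchError::Serde(_) | SketchError::Other(_) => false,
    }
}
```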
---
## Architecture
```
oxide-agent/
├── src/
│ ├── client/ # OllamaClient trait + HttpOllamaClient (reqwest + NDJSON streaming)
│ ├── types/ # Shared request/response structs (Generate, Chat, Embed, Tags)
│ ├── session/ # Session: multi-turn history, token estimation, compression
│ ├── tools/ # ToolRegistry, ToolBuilder, async dispatch
│ ├── rag/ # VectorStore: cosine similarity, add_text, add_file, query
│ ├── wasm/ # WasmTool: wasmtime sandbox (feature: wasm-tools)
│ ├── mojo/ # MojoEmbedder: subprocess bridge (feature: mojo-interop)
│ └── error.rs # OxideError unified error type
├── oxide-agent-macros/ # Procedural macro crate (proc-macro2, quote, syn)
└── examples/ # Runnable examples — see examples/README.md
```
The `OllamaClient` trait is the central seam of the codebase. Every subsystem (`Session`, `VectorStore`, `ToolRegistry`) accepts `Arc<dyn OllamaClient>`, so the entire stack can be tested against an in-process mock with no Ollama server running.
---
## Examples
Runnable, self-contained examples live in [`examples/`](examples/). Each file maps to one feature area:
| Example | Description | Required feature |
|---|---|---|
| [basic_chat.rs](examples/basic_chat.rs) | Single-turn chat and model listing | — |
| [streaming.rs](examples/streaming.rs) | Zero-copy token-by-token streaming | — |
| [session.rs](examples/session.rs) | Multi-turn session with compression strategies | — |
| [tool_calling.rs](examples/tool_calling.rs) | Full agentic tool-call loop | — |
| [rag.rs](examples/rag.rs) | Vector store indexing and RAG-pattern chat | — |
| [wasm_tool.rs](examples/wasm_tool.rs) | Sandboxed WASM tool execution | `wasm-tools` |
| [mojo_embedder.rs](examples/mojo_embedder.rs) | Batch embedding via Mojo subprocess | `mojo-interop` |
See [examples/README.md](examples/README.md) for detailed walkthroughs, protocol documentation, and configuration references for each example.
---
## Testing
The test suite runs entirely without a live Ollama server by implementing `OllamaClient` on lightweight in-process structs:
```bash
cargo test
cargo test --features wasm-tools
cargo test --features mojo-interop
```
Notable test patterns used across the codebase:
- **`MockOllamaClient`** in `client` — verifies streaming chunk ordering and buffered response identity.
- **`EchoClient`** in `session` — echoes user messages back to validate history tracking, system prompt placement, and truncation under configurable token budgets.
- **`FakeEmbedClient`** in `rag` — returns deterministic embeddings based on input character values to verify cosine similarity ranking without a real embedding model.
- **Stub guards** in `wasm` and `mojo` — assert that feature-gated code returns helpful errors with feature flag names when called without the feature enabled.
---
## Contributing
All PRs should include unit tests. Because `OllamaClient` is a trait, mocking is straightforward — see the existing test modules for patterns. Run `cargo clippy` and `cargo fmt` before opening a PR.
Issues and feature requests: [github.com/TheJagpreet/oxide-agent/issues](https://github.com/TheJagpreet/oxide-agent/issues)
---
## License
MIT