oxide-agent 0.1.0

# oxide-agent

A type-safe, high-performance Rust crate for building agentic systems on top of locally-hosted LLMs via [Ollama](https://ollama.com). Designed with the principle that AI agent infrastructure should be as safe and ergonomic as the language it's written in.

> **Why oxide-agent over thin wrappers like `ollama-rs`?**
>
> - **Safety**: input sanitization and sandboxed tool execution at the crate level.
> - **Concurrency**: `tokio`-native; run multi-agent swarms without a GIL bottleneck.
> - **Ergonomics**: idiomatic Rust APIs, not a ported Python library.
> - **Testability**: the `OllamaClient` trait makes full unit testing possible without a running server.

---

## Table of Contents

- [Installation](#installation)
- [Features](#features)
  - [HTTP Client](#http-client)
  - [Streaming](#streaming)
  - [Session Management](#session-management)
  - [Tool Calling](#tool-calling)
  - [Retrieval-Augmented Generation](#retrieval-augmented-generation)
  - [WASM Tool Isolation](#wasm-tool-isolation)
  - [Mojo Interoperability](#mojo-interoperability)
- [Error Handling](#error-handling)
- [Architecture](#architecture)
- [Examples](#examples)
- [Testing](#testing)
- [Contributing](#contributing)
- [License](#license)

---

## Installation

Requires Rust 1.75+ and an [Ollama](https://ollama.com) server running on `localhost:11434`.

```toml
[dependencies]
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent" }
```

**Optional features:**

```toml
# Sandboxed WASM tool execution via wasmtime
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["wasm-tools"] }

# High-throughput batch embedding via a Mojo-compiled binary
oxide-agent = { git = "https://github.com/TheJagpreet/oxide-agent", features = ["mojo-interop"] }
```

---

## Features

### HTTP Client

`oxide_agent::client` provides the `OllamaClient` trait — a mockable async abstraction over any Ollama-compatible backend — and `HttpOllamaClient`, the production implementation backed by `reqwest`.

**Covered endpoints:**

| Method | Endpoint | Description |
|---|---|---|
| `generate` | `POST /api/generate` | Single-turn text completion |
| `chat` | `POST /api/chat` | Multi-turn chat, including tool/function calling |
| `embed` | `POST /api/embed` | Dense vector embeddings (single or batch) |
| `list_models` | `GET /api/tags` | List models available on the server |
| `stream_generate` | `POST /api/generate` | Token-by-token streaming completion |
| `stream_chat` | `POST /api/chat` | Token-by-token streaming chat |

Because `OllamaClient` is a trait, the entire stack is testable with an in-process mock — no running Ollama server required. See [examples/basic_chat.rs](examples/basic_chat.rs).

---

### Streaming

`stream_generate` and `stream_chat` use a zero-copy NDJSON pipeline:

- `bytes_stream()` from `reqwest` forwards `Bytes` chunks without re-allocation.
- `tokio_util::io::StreamReader` adapts the byte stream to `AsyncRead`.
- `FramedRead` + `LinesCodec` splits the stream on `\n`, and each complete line is deserialized into one streamed token.

Memory overhead is constant regardless of output length. See [examples/streaming.rs](examples/streaming.rs).
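The framing step can be illustrated without any async machinery. The sketch below is a simplified, standard-library stand-in for `FramedRead` + `LinesCodec`, not the crate's code: `LineFramer` is a hypothetical name, and it only shows how chunk boundaries that fall mid-line are handled.

```rust
/// Hypothetical stand-in for `FramedRead` + `LinesCodec`: buffers raw byte
/// chunks and yields only complete, newline-terminated lines.
#[derive(Default)]
pub struct LineFramer {
    buf: Vec<u8>,
}

impl LineFramer {
    /// Feed one network chunk; returns every line completed by this chunk.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line and its trailing '\n' from the buffer.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..pos])
                .trim_end_matches('\r')
                .to_string();
            if !text.is_empty() {
                lines.push(text);
            }
        }
        lines
    }

    /// Bytes still waiting for a terminating newline.
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```

After each `push`, the buffer holds only the unfinished tail of the current line, so buffered memory tracks line length rather than total output length.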

---

### Session Management

`oxide_agent::session::Session` is a stateful multi-turn conversation manager. It automatically:

- Appends every user message and assistant reply to the history.
- Estimates the token count (1 token ≈ 4 UTF-8 characters).
- Triggers context compression when history exceeds `max_tokens × compression_threshold`.

**Compression strategies:**

| Strategy | Behaviour |
|---|---|
| `TruncateOldest` *(default)* | Drops the oldest non-system messages until under budget |
| `Summarize { model }` | Asks Ollama to summarise the oldest half of history into a compact system message, then discards the originals |
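As a rough illustration of `TruncateOldest`, the sketch below applies the 4-characters-per-token heuristic and drops the oldest non-system messages. The `Message` struct, the function names, and the round-up behaviour are assumptions made for this example; the crate's internals may differ.

```rust
/// Hypothetical message type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Heuristic from the text: 1 token is roughly 4 characters.
/// Rounding up is an assumption of this sketch.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Estimated token count of the whole history.
pub fn history_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| estimate_tokens(&m.content)).sum()
}

/// Drop the oldest non-system messages until the estimate fits `budget`.
pub fn truncate_oldest(history: &mut Vec<Message>, budget: usize) {
    while history_tokens(history) > budget {
        match history.iter().position(|m| m.role != "system") {
            Some(idx) => {
                history.remove(idx);
            }
            // Only system messages remain; nothing more can be dropped.
            None => break,
        }
    }
}
```

System messages are never removed, so a long system prompt can keep the history above budget even after every other message is gone.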

**`SessionConfig` fields:**

| Field | Default | Description |
|---|---|---|
| `max_tokens` | `8_000` | Soft token budget for the message history |
| `compression_threshold` | `0.80` | Fraction of budget at which compression triggers |
| `compression_strategy` | `TruncateOldest` | Strategy applied when threshold is crossed |
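With the defaults above, compression would trigger once the estimate passes 8,000 × 0.80 = 6,400 tokens. The sketch below mirrors the table as a plain struct; the field names come from the table, while the field types, the `CompressionStrategy` enum name, and the helper methods are assumptions for illustration.

```rust
/// Assumed shape of the strategy enum; variant names follow the table above.
#[derive(Debug, Clone, PartialEq)]
pub enum CompressionStrategy {
    TruncateOldest,
    Summarize { model: String },
}

/// Sketch of the configuration; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionConfig {
    pub max_tokens: usize,
    pub compression_threshold: f64,
    pub compression_strategy: CompressionStrategy,
}

impl Default for SessionConfig {
    fn default() -> Self {
        Self {
            max_tokens: 8_000,
            compression_threshold: 0.80,
            compression_strategy: CompressionStrategy::TruncateOldest,
        }
    }
}

impl SessionConfig {
    /// Token estimate above which compression triggers (hypothetical helper).
    pub fn trigger_tokens(&self) -> usize {
        (self.max_tokens as f64 * self.compression_threshold) as usize
    }

    /// True once the history estimate exceeds the trigger point.
    pub fn should_compress(&self, estimated_tokens: usize) -> bool {
        estimated_tokens > self.trigger_tokens()
    }
}
```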

See [examples/session.rs](examples/session.rs).

---

### Tool Calling

`oxide_agent::tools` provides `ToolRegistry` for registering async handler functions and dispatching tool calls returned by the model, and `ToolBuilder` for constructing JSON Schema tool definitions without boilerplate.

**`ToolBuilder` parameter types:**

| Method | JSON Schema type |
|---|---|
| `string_param(name, desc, required)` | `"string"` |
| `number_param(name, desc, required)` | `"number"` |
| `bool_param(name, desc, required)` | `"boolean"` |
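To show what such a definition looks like on the wire, here is a hand-rolled sketch that renders the function-tool JSON shape accepted by Ollama's chat endpoint. `ToolSketch` and its constructor are hypothetical and are not the crate's `ToolBuilder`; only the three `*_param` method shapes come from the table above. It assumes names and descriptions contain no characters that need JSON escaping; real code would use a JSON library.

```rust
/// Hypothetical builder (not the crate's `ToolBuilder`) that renders an
/// Ollama-style function tool definition as a JSON string.
pub struct ToolSketch {
    name: String,
    description: String,
    // (parameter name, JSON Schema type, description, required)
    params: Vec<(String, &'static str, String, bool)>,
}

impl ToolSketch {
    pub fn new(name: &str, description: &str) -> Self {
        Self { name: name.into(), description: description.into(), params: Vec::new() }
    }

    fn param(mut self, name: &str, ty: &'static str, desc: &str, required: bool) -> Self {
        self.params.push((name.into(), ty, desc.into(), required));
        self
    }

    pub fn string_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "string", desc, required)
    }

    pub fn number_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "number", desc, required)
    }

    pub fn bool_param(self, name: &str, desc: &str, required: bool) -> Self {
        self.param(name, "boolean", desc, required)
    }

    /// Render the definition; inputs are assumed to need no JSON escaping.
    pub fn build(&self) -> String {
        let props: Vec<String> = self
            .params
            .iter()
            .map(|(n, ty, d, _)| format!(r#""{n}":{{"type":"{ty}","description":"{d}"}}"#))
            .collect();
        let required: Vec<String> = self
            .params
            .iter()
            .filter(|p| p.3)
            .map(|p| format!(r#""{}""#, p.0))
            .collect();
        format!(
            r#"{{"type":"function","function":{{"name":"{}","description":"{}","parameters":{{"type":"object","properties":{{{}}},"required":[{}]}}}}}}"#,
            self.name,
            self.description,
            props.join(","),
            required.join(",")
        )
    }
}
```

Only parameters flagged `required` end up in the schema's `required` array; optional ones appear under `properties` alone.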

The full agentic loop — register → attach definitions → detect tool calls → dispatch → feed results back → final answer — is covered in [examples/tool_calling.rs](examples/tool_calling.rs).
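The loop itself can be sketched in a few lines. The version below is synchronous and uses only the standard library; `RegistrySketch`, `ModelTurn`, and `run_loop` are hypothetical names, the model is replaced by a closure, and the crate's real `ToolRegistry` uses async handlers.

```rust
use std::collections::HashMap;

type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// Hypothetical, synchronous stand-in for a tool registry.
#[derive(Default)]
pub struct RegistrySketch {
    handlers: HashMap<String, Handler>,
}

impl RegistrySketch {
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Look up the handler by name and run it on the JSON arguments.
    pub fn dispatch(&self, name: &str, args_json: &str) -> Result<String, String> {
        match self.handlers.get(name) {
            Some(handler) => handler(args_json),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}

/// One model turn: either a final answer or a request to call a tool.
pub enum ModelTurn {
    Final(String),
    ToolCall { name: String, args_json: String },
}

/// Detect tool calls, dispatch them, feed results back, stop on a final answer.
pub fn run_loop(
    mut model: impl FnMut(&[String]) -> ModelTurn,
    registry: &RegistrySketch,
    max_steps: usize,
) -> Result<String, String> {
    let mut transcript: Vec<String> = Vec::new();
    for _ in 0..max_steps {
        match model(&transcript) {
            ModelTurn::Final(answer) => return Ok(answer),
            ModelTurn::ToolCall { name, args_json } => {
                let result = registry.dispatch(&name, &args_json)?;
                transcript.push(format!("tool:{name} -> {result}"));
            }
        }
    }
    Err("step limit reached without a final answer".to_string())
}
```

The `max_steps` cap is a design choice of this sketch: it keeps a model that never stops requesting tools from looping forever.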

---

### Retrieval-Augmented Generation

`oxide_agent::rag::VectorStore` is an in-memory vector store using Ollama's `/api/embed` endpoint and cosine similarity for nearest-neighbour search.

- `add_text(text, metadata)` — embed a single string and store it.
- `add_file(path)` — read a UTF-8 file, chunk by `\n\n` (paragraph), store each chunk with `source` and `chunk` metadata.
- `query(question, top_k)` — return `Vec<SearchResult>` ranked by cosine similarity.

No external database is required; the store is backed by a `Vec<Document>` in process memory. See [examples/rag.rs](examples/rag.rs).
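The two core operations, paragraph chunking and cosine-similarity ranking, are small enough to sketch directly. The function names and the `(String, Vec<f32>)` document shape below are assumptions for illustration, and embeddings are passed in rather than fetched from `/api/embed`.

```rust
/// Split text into paragraph chunks on blank lines, dropping empty chunks.
pub fn chunk_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .map(String::from)
        .collect()
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Rank stored (text, embedding) pairs against a query embedding.
pub fn top_k<'a>(
    query: &[f32],
    docs: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = docs
        .iter()
        .map(|(text, emb)| (text.as_str(), cosine_similarity(query, emb)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this costs one similarity computation per stored document on every query, which is usually fine for in-memory stores of modest size.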

---

### WASM Tool Isolation

> Requires: `features = ["wasm-tools"]`

`oxide_agent::wasm::WasmTool` loads and executes tool logic from `.wasm` binaries via [wasmtime](https://wasmtime.dev/). Each tool runs in a fully sandboxed environment with no access to the host filesystem, network, or process state.

**ABI contract for the `.wasm` module:**

| Export | Signature | Responsibility |
|---|---|---|
| `memory` | linear memory | Default memory export |
| `alloc` | `(size: i32) -> i32` | Allocate `size` bytes, return pointer |
| `tool_call` | `(ptr: i32, len: i32) -> i32` | Receive JSON args, return pointer to null-terminated JSON result |

When the `wasm-tools` feature is disabled, `WasmTool::from_file` returns a descriptive `Err` pointing to the feature flag. See [examples/wasm_tool.rs](examples/wasm_tool.rs).
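The call sequence implied by the ABI table (allocate, copy arguments in, call, read until the NUL byte) can be simulated without wasmtime. In the sketch below a `Vec<u8>` stands in for the module's linear memory and `i32` values are offsets into it; `GuestSketch` and `call_tool` are hypothetical names, and the echo behaviour is invented for the example.

```rust
/// Simulated guest module: a byte vector stands in for linear memory.
pub struct GuestSketch {
    memory: Vec<u8>,
}

impl GuestSketch {
    pub fn new() -> Self {
        Self { memory: Vec::new() }
    }

    /// `alloc(size) -> ptr`: bump allocator returning an offset into memory.
    pub fn alloc(&mut self, size: i32) -> i32 {
        let ptr = self.memory.len() as i32;
        // Newly reserved bytes are zero-filled.
        self.memory.resize(self.memory.len() + size.max(0) as usize, 0);
        ptr
    }

    /// `tool_call(ptr, len) -> ptr`: read JSON args, write a
    /// null-terminated JSON result, return its offset.
    pub fn tool_call(&mut self, ptr: i32, len: i32) -> i32 {
        let args =
            String::from_utf8_lossy(&self.memory[ptr as usize..(ptr + len) as usize]).into_owned();
        let result = format!("{{\"echo\":{args}}}");
        // One extra byte, left at zero by `alloc`, serves as the terminator.
        let out = self.alloc(result.len() as i32 + 1);
        self.memory[out as usize..out as usize + result.len()].copy_from_slice(result.as_bytes());
        out
    }
}

/// Host side of the contract: alloc, copy args in, call, read until NUL.
pub fn call_tool(guest: &mut GuestSketch, args_json: &str) -> String {
    let len = args_json.len() as i32;
    let ptr = guest.alloc(len);
    guest.memory[ptr as usize..(ptr + len) as usize].copy_from_slice(args_json.as_bytes());
    let out = guest.tool_call(ptr, len) as usize;
    let end = guest.memory[out..]
        .iter()
        .position(|&b| b == 0)
        .map_or(guest.memory.len(), |n| out + n);
    String::from_utf8_lossy(&guest.memory[out..end]).into_owned()
}
```

Because the result is null-terminated instead of length-prefixed, the host has to scan for the terminator, and a result containing a raw zero byte would be cut short.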

---

### Mojo Interoperability

> Requires: `features = ["mojo-interop"]`

`oxide_agent::mojo::MojoEmbedder` offloads batch embedding to an external [Mojo](https://www.modular.com/mojo)-compiled binary via a JSON line protocol over stdin/stdout.

**Protocol:**

```
stdin  → {"texts": ["hello", "world"]}
stdout ← {"embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]}
```

Useful when a compiled Mojo kernel outperforms the Ollama HTTP round-trip for your embedding workload, or when you need to embed very large corpora in batch. See [examples/mojo_embedder.rs](examples/mojo_embedder.rs).
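Both directions of the line protocol are sketched below using only the standard library. `encode_request` and `parse_response` are hypothetical helpers; the parser is a minimal scanner for the fixed response shape shown above, and production code would more likely use a JSON library such as `serde_json`.

```rust
/// Escape a string for embedding in JSON (quotes, backslashes, control chars).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build the single request line written to the child's stdin.
pub fn encode_request(texts: &[&str]) -> String {
    let items: Vec<String> = texts.iter().map(|t| json_string(t)).collect();
    format!("{{\"texts\": [{}]}}\n", items.join(", "))
}

/// Parse one response line of the shape {"embeddings": [[f, ...], ...]}.
pub fn parse_response(line: &str) -> Option<Vec<Vec<f32>>> {
    let key = line.find("\"embeddings\"")?;
    let mut rows: Vec<Vec<f32>> = Vec::new();
    let mut token = String::new();
    let mut depth = 0usize;
    for c in line[key..].chars().skip_while(|&c| c != '[') {
        match c {
            '[' => {
                depth += 1;
                if depth == 2 {
                    rows.push(Vec::new());
                }
            }
            ',' | ']' => {
                // A separator inside an inner array ends one number.
                if depth == 2 && !token.trim().is_empty() {
                    rows.last_mut()?.push(token.trim().parse::<f32>().ok()?);
                }
                token.clear();
                if c == ']' {
                    depth = depth.checked_sub(1)?;
                    if depth == 0 {
                        return Some(rows);
                    }
                }
            }
            _ if depth == 2 => token.push(c),
            _ => {}
        }
    }
    None // input ended before the outer array closed
}
```

Escaping newlines inside each text matters here: a raw `\n` in the payload would split one request across two protocol lines.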

---

## Error Handling

All fallible operations return `Result<_, OxideError>`:

| Variant | Cause |
|---|---|
| `OxideError::Http(reqwest::Error)` | Network or transport failure |
| `OxideError::ApiError(status, body)` | Non-2xx response from Ollama |
| `OxideError::Serde(serde_json::Error)` | JSON parse failure |
| `OxideError::Other(String)` | All other errors (WASM, Mojo, tool dispatch) |

`OxideError` implements `std::error::Error`, so it propagates with `?` in functions returning `anyhow::Result` and can be wrapped as a source error in `thiserror`-derived enums.
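The sketch below shows how callers might branch on the variants. It mirrors the four variants in a standalone enum so that it compiles without the crate: `std::io::Error` and `std::num::ParseIntError` stand in for `reqwest::Error` and `serde_json::Error`, the `u16`/`String` payload types are assumptions, and `parse_count` and `is_retryable` are hypothetical helpers (the retry policy is purely illustrative).

```rust
use std::fmt;

/// Standalone mirror of the variant table; payload types are assumptions.
#[derive(Debug)]
pub enum ErrorSketch {
    Http(std::io::Error),           // stand-in for reqwest::Error
    ApiError(u16, String),          // (status, body)
    Serde(std::num::ParseIntError), // stand-in for serde_json::Error
    Other(String),
}

impl fmt::Display for ErrorSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Http(e) => write!(f, "transport error: {e}"),
            Self::ApiError(status, body) => write!(f, "API error {status}: {body}"),
            Self::Serde(e) => write!(f, "parse error: {e}"),
            Self::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for ErrorSketch {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Http(e) => Some(e),
            Self::Serde(e) => Some(e),
            _ => None,
        }
    }
}

impl From<std::num::ParseIntError> for ErrorSketch {
    fn from(e: std::num::ParseIntError) -> Self {
        Self::Serde(e)
    }
}

/// Non-2xx becomes `ApiError`; a malformed body becomes `Serde` via `?`.
pub fn parse_count(status: u16, body: &str) -> Result<u32, ErrorSketch> {
    if !(200..300).contains(&status) {
        return Err(ErrorSketch::ApiError(status, body.to_string()));
    }
    Ok(body.trim().parse::<u32>()?)
}

/// Illustrative policy: retry transport failures and 5xx responses only.
pub fn is_retryable(err: &ErrorSketch) -> bool {
    matches!(err, ErrorSketch::Http(_))
        || matches!(err, ErrorSketch::ApiError(status, _) if *status >= 500)
}
```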

---

## Architecture

```
oxide-agent/
├── src/
│   ├── client/     # OllamaClient trait + HttpOllamaClient (reqwest + NDJSON streaming)
│   ├── types/      # Shared request/response structs (Generate, Chat, Embed, Tags)
│   ├── session/    # Session: multi-turn history, token estimation, compression
│   ├── tools/      # ToolRegistry, ToolBuilder, async dispatch
│   ├── rag/        # VectorStore: cosine similarity, add_text, add_file, query
│   ├── wasm/       # WasmTool: wasmtime sandbox (feature: wasm-tools)
│   ├── mojo/       # MojoEmbedder: subprocess bridge (feature: mojo-interop)
│   └── error.rs    # OxideError unified error type
├── oxide-agent-macros/   # Procedural macro crate (proc-macro2, quote, syn)
└── examples/       # Runnable examples — see examples/README.md
```

The `OllamaClient` trait is the central seam of the codebase. Every subsystem (`Session`, `VectorStore`, `ToolRegistry`) accepts `Arc<dyn OllamaClient>`, so the entire stack can be tested with an in-process mock and no Ollama server.
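The seam pattern can be shown in miniature. In the sketch below, `ChatBackend` is a simplified, synchronous stand-in for `OllamaClient` (the real trait is async), and `EchoBackend`, `SessionSketch`, and the model name used with them are hypothetical.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for the `OllamaClient` trait.
pub trait ChatBackend: Send + Sync {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// In-process mock: no network, deterministic output.
pub struct EchoBackend;

impl ChatBackend for EchoBackend {
    fn chat(&self, model: &str, prompt: &str) -> Result<String, String> {
        Ok(format!("[{model}] {prompt}"))
    }
}

/// A subsystem that depends only on the trait object, never on HTTP.
pub struct SessionSketch {
    backend: Arc<dyn ChatBackend>,
    model: String,
    history: Vec<(String, String)>, // (role, content)
}

impl SessionSketch {
    pub fn new(backend: Arc<dyn ChatBackend>, model: &str) -> Self {
        Self { backend, model: model.to_string(), history: Vec::new() }
    }

    /// Record the user message, ask the backend, record the reply.
    pub fn send(&mut self, user: &str) -> Result<String, String> {
        self.history.push(("user".to_string(), user.to_string()));
        let reply = self.backend.chat(&self.model, user)?;
        self.history.push(("assistant".to_string(), reply.clone()));
        Ok(reply)
    }

    pub fn history_len(&self) -> usize {
        self.history.len()
    }
}
```

Swapping `EchoBackend` for a production implementation changes only the value passed to `SessionSketch::new`; the subsystem code stays the same.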

---

## Examples

Runnable, self-contained examples live in [`examples/`](examples/). Each file maps to one feature area:

| Example | Description | Feature flag |
|---|---|---|
| [basic_chat.rs](examples/basic_chat.rs) | Single-turn chat and model listing | none |
| [streaming.rs](examples/streaming.rs) | Zero-copy token-by-token streaming | none |
| [session.rs](examples/session.rs) | Multi-turn session with compression strategies | none |
| [tool_calling.rs](examples/tool_calling.rs) | Full agentic tool-call loop | none |
| [rag.rs](examples/rag.rs) | Vector store indexing and RAG-pattern chat | none |
| [wasm_tool.rs](examples/wasm_tool.rs) | Sandboxed WASM tool execution | `wasm-tools` |
| [mojo_embedder.rs](examples/mojo_embedder.rs) | Batch embedding via Mojo subprocess | `mojo-interop` |

See [examples/README.md](examples/README.md) for detailed walkthroughs, protocol documentation, and configuration references for each example.

---

## Testing

The test suite runs entirely without a live Ollama server by implementing `OllamaClient` on lightweight in-process structs:

```bash
cargo test
cargo test --features wasm-tools
cargo test --features mojo-interop
```

Notable test patterns used across the codebase:

- **`MockOllamaClient`** in `client` — verifies streaming chunk ordering and buffered response identity.
- **`EchoClient`** in `session` — echoes user messages back to validate history tracking, system prompt placement, and truncation under configurable token budgets.
- **`FakeEmbedClient`** in `rag` — returns deterministic embeddings based on input character values to verify cosine similarity ranking without a real embedding model.
- **Stub guards** in `wasm` and `mojo` — assert that feature-gated code returns helpful errors with feature flag names when called without the feature enabled.

---

## Contributing

All PRs should include unit tests. Because `OllamaClient` is a trait, mocking is straightforward — see the existing test modules for patterns. Run `cargo clippy` and `cargo fmt` before opening a PR.

Issues and feature requests: [github.com/TheJagpreet/oxide-agent/issues](https://github.com/TheJagpreet/oxide-agent/issues)

---

## License

MIT