# oxide-agent — Examples

This directory contains runnable examples covering every major feature of `oxide-agent`. Each file is a self-contained binary that can be launched with `cargo run --example <name>`.

**Prerequisites:** Ollama running on `http://localhost:11434`. Pull the models referenced in each example beforehand with `ollama pull <model>`.

---

## Index

| Example | File | Feature flag |
|---|---|---|
| [Basic Chat](#basic-chat) | [basic_chat.rs](basic_chat.rs) | none |
| [Streaming](#streaming) | [streaming.rs](streaming.rs) | none |
| [Session Management](#session-management) | [session.rs](session.rs) | none |
| [Tool Calling](#tool-calling) | [tool_calling.rs](tool_calling.rs) | none |
| [Retrieval-Augmented Generation](#retrieval-augmented-generation) | [rag.rs](rag.rs) | none |
| [WASM Tool Isolation](#wasm-tool-isolation) | [wasm_tool.rs](wasm_tool.rs) | `wasm-tools` |
| [Mojo Interoperability](#mojo-interoperability) | [mojo_embedder.rs](mojo_embedder.rs) | `mojo-interop` |

---

## Basic Chat

**File:** [basic_chat.rs](basic_chat.rs)  
**Run:** `cargo run --example basic_chat`

Demonstrates the lowest-level entry point: constructing `HttpOllamaClient` directly and sending a single `ChatRequest`.

```
cargo run --example basic_chat
```

### What it covers

- Constructing `HttpOllamaClient` with a custom base URL.
- Building a `ChatRequest` with a `User`-role `Message`.
- Calling `client.chat(req).await` and reading `response.message.content`.
- Calling `client.list_models().await` to enumerate models on the server.

### Key types

| Type | Module | Role |
|---|---|---|
| `HttpOllamaClient` | `oxide_agent::client` | Production HTTP client |
| `OllamaClient` | `oxide_agent::client` | Mockable async trait — swap for tests |
| `ChatRequest` | `oxide_agent::types` | Payload sent to `/api/chat` |
| `Message` | `oxide_agent::types` | A single turn with a `Role` and `content` |
| `Role` | `oxide_agent::types` | `System`, `User`, `Assistant`, `Tool` |

### When to use the raw client vs `Session`

Use `HttpOllamaClient` directly when you manage conversation history yourself or need one-shot completions. For multi-turn conversations with automatic history tracking and context compression, prefer [`Session`](#session-management).

---

## Streaming

**File:** [streaming.rs](streaming.rs)  
**Run:** `cargo run --example streaming`

Prints tokens to stdout as they arrive without buffering the full response, using `stream_chat`.

### What it covers

- Calling `client.stream_chat(req)` which returns `BoxStream<ChatResponse>`.
- Consuming the stream with `futures_util::StreamExt::next()` in an async loop.
- Detecting the terminal chunk via `chunk.done == true`.
- Why this is zero-copy: `reqwest` byte chunks → `StreamReader` → `FramedRead + LinesCodec` → deserialized `ChatResponse`, with no intermediate `String` allocation of the full body.
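
The line framing step in that pipeline can be illustrated with a standard-library-only sketch. `LineFramer` and `is_done` are hypothetical names, not part of `oxide-agent`; the real example relies on `FramedRead + LinesCodec`, and a real client would deserialize each line rather than use the crude substring check shown here.

```rust
/// Hypothetical helper (not an oxide-agent type): buffers raw byte chunks
/// and hands back every complete line as soon as its '\n' arrives.
#[derive(Default)]
pub struct LineFramer {
    buf: Vec<u8>,
}

impl LineFramer {
    /// Feed one network chunk; returns the lines completed by this chunk.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Take the line and its trailing '\n' off the front of the buffer.
            let raw: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&raw[..raw.len() - 1]);
            let text = text.trim_end_matches('\r');
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }
}

/// Crude stand-in for checking `chunk.done == true` on a raw JSON line.
/// Assumption: the server writes the field without whitespace.
pub fn is_done(line: &str) -> bool {
    line.contains("\"done\":true")
}
```

Only the bytes of the current, still incomplete line stay in the buffer, which is why memory use does not grow with the length of the full response.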

### When streaming matters

Streaming makes the UX feel responsive on long completions. For server-side use (batch jobs, pipelines where you need the full reply before acting), use the buffered `client.chat()` instead.

---

## Session Management

**File:** [session.rs](session.rs)  
**Run:** `cargo run --example session`

Shows how to hold a stateful multi-turn conversation without manually tracking the message array or worrying about context window limits.

### What it covers

- Creating a `Session` with a custom `SessionConfig`.
- Setting and replacing the system prompt via `session.set_system_prompt()`.
- Sending turns with `session.ask()` and observing the reply.
- Reading history length and estimated token count.
- Switching between the two built-in compression strategies.

### Compression strategies in depth

`Session` estimates token usage as `(char_count + 3) / 4` — a fast approximation that avoids a tokenizer dependency. When `estimated_tokens() > max_tokens × compression_threshold`, one of two strategies fires:

**`TruncateOldest`** (default)

Removes the oldest non-system messages one at a time until the estimated count drops below the threshold. System messages are preserved. Fastest option; loses early context entirely.
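
A minimal sketch of the estimate and the truncation rule, using local stand-in types rather than the crate's own `Message` and `Role`. It assumes the estimate is summed per message, which the crate may or may not do; `limit` stands for `max_tokens × compression_threshold`.

```rust
// Local stand-ins for illustration; these are not the oxide-agent types.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// `(char_count + 3) / 4`, applied to each message and summed (assumption).
pub fn estimated_tokens(history: &[Message]) -> usize {
    history
        .iter()
        .map(|m| (m.content.chars().count() + 3) / 4)
        .sum()
}

/// Drop the oldest non-system message until the estimate is within `limit`.
pub fn truncate_oldest(history: &mut Vec<Message>, limit: usize) {
    while estimated_tokens(history) > limit {
        match history.iter().position(|m| m.role != Role::System) {
            Some(idx) => {
                history.remove(idx);
            }
            // Only system messages remain; nothing more can be removed.
            None => break,
        }
    }
}
```

For example, a 14-character system prompt plus messages of 40, 40, and 8 characters is estimated at 4 + 10 + 10 + 2 = 26 tokens. With a limit of 15, the two 40-character messages are dropped and the estimate falls to 6.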

**`Summarize { model }`**

1. Collects the oldest half of non-system messages.
2. Sends them as a `User`-role prompt: _"Summarise the following conversation history concisely…"_ to the specified model.
3. Inserts the summary as a `System`-role message immediately after any existing system prompt.
4. Removes the original messages.

Use `Summarize` when retaining a compressed form of early context matters more than latency. You can point it at a smaller/faster model than the one driving the conversation.
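
The four steps can be sketched with the model call replaced by a closure. Everything here is illustrative: the types are local stand-ins, `summarize_oldest_half` is a hypothetical name, and the transcript format passed to the closure is an assumption.

```rust
// Local stand-ins for illustration; these are not the oxide-agent types.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// `summarize` stands in for the request to the summarization model.
pub fn summarize_oldest_half<F>(history: &mut Vec<Message>, summarize: F)
where
    F: FnOnce(&str) -> String,
{
    // Step 1: indices of the oldest half of the non-system messages.
    let non_system: Vec<usize> = history
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role != Role::System)
        .map(|(i, _)| i)
        .collect();
    let take = non_system.len() / 2;
    if take == 0 {
        return;
    }
    let selected = &non_system[..take];

    // Step 2: render them as one transcript and ask for a summary.
    let transcript: Vec<String> = selected
        .iter()
        .map(|&i| format!("{:?}: {}", history[i].role, history[i].content))
        .collect();
    let summary = summarize(&transcript.join("\n"));

    // Step 4: remove the originals, back to front so indices stay valid.
    for &i in selected.iter().rev() {
        history.remove(i);
    }

    // Step 3: insert the summary right after the leading system messages.
    let insert_at = history
        .iter()
        .take_while(|m| m.role == Role::System)
        .count();
    history.insert(
        insert_at,
        Message { role: Role::System, content: summary },
    );
}
```

Passing the summarizer as a closure keeps the sketch testable offline; in the crate the equivalent step is a chat request to the configured `model`.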

### `SessionConfig` reference

| Field | Default | Description |
|---|---|---|
| `max_tokens` | `8_000` | Soft token budget for the message history |
| `compression_threshold` | `0.80` | Fraction of budget at which compression triggers |
| `compression_strategy` | `TruncateOldest` | Strategy applied when threshold is crossed |
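
To make the defaults concrete, here is a hypothetical mirror of the table. The field names and default values come from the table; the field types, `trigger_point`, and `should_compress` are assumptions for illustration and not the crate's definitions.

```rust
// Hypothetical mirror of the table above; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum CompressionStrategy {
    TruncateOldest,
    Summarize { model: String },
}

#[derive(Debug, Clone)]
pub struct SessionConfig {
    pub max_tokens: usize,
    pub compression_threshold: f64,
    pub compression_strategy: CompressionStrategy,
}

impl Default for SessionConfig {
    fn default() -> Self {
        Self {
            max_tokens: 8_000,
            compression_threshold: 0.80,
            compression_strategy: CompressionStrategy::TruncateOldest,
        }
    }
}

impl SessionConfig {
    /// Estimated token count above which compression fires.
    pub fn trigger_point(&self) -> f64 {
        self.max_tokens as f64 * self.compression_threshold
    }

    /// Mirrors `estimated_tokens() > max_tokens × compression_threshold`.
    pub fn should_compress(&self, estimated_tokens: usize) -> bool {
        estimated_tokens as f64 > self.trigger_point()
    }
}
```

With the defaults, compression fires once the estimate passes roughly 6,400 tokens (8,000 × 0.80).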

---

## Tool Calling

**File:** [tool_calling.rs](tool_calling.rs)  
**Run:** `cargo run --example tool_calling`

Registers two async tools — a mock weather lookup and an arithmetic calculator — and drives a full agentic loop: send question → receive tool calls → dispatch handlers → feed results back → receive final answer.

### What it covers

- Building a `ToolDefinition` with `ToolBuilder` (no JSON Schema boilerplate).
- Registering async handler closures via `ToolRegistry::register`.
- Attaching `registry.definitions()` to a `ChatRequest`.
- Detecting `tool_calls` in the model's response (`Option<Vec<ToolCall>>`).
- Calling `registry.dispatch(name, args)` to execute the handler.
- Appending `Role::Tool` messages and continuing the conversation.

### The agentic loop

```
User message
    │
    ▼
ChatRequest (tools attached)
    │
    ▼
Model response
    ├── tool_calls present? ──► dispatch each ──► Tool messages
    │                                                  │
    │                                                  ▼
    │                                         ChatRequest (with results)
    │                                                  │
    │                                                  ▼
    └── no tool calls ──────────────────────► Final answer
```
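
The control flow in the diagram can be sketched without the crate by passing the model in as a closure. All names below (`run_loop`, `ModelReply`, `Handler`, the string-typed `args`) are hypothetical stand-ins; the real example uses `ChatRequest`, `ToolRegistry::dispatch`, and JSON arguments. The `max_rounds` cap is an addition of this sketch, not something the example is known to do.

```rust
use std::collections::HashMap;

// Local stand-ins for illustration; these are not the oxide-agent types.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

#[derive(Debug, Clone)]
pub struct ToolCall {
    pub name: String,
    pub args: String,
}

pub struct ModelReply {
    pub content: String,
    pub tool_calls: Option<Vec<ToolCall>>,
}

pub type Handler = Box<dyn Fn(&str) -> String>;

/// Runs until the model answers without tool calls or `max_rounds` is hit.
pub fn run_loop<M>(
    question: &str,
    tools: &HashMap<String, Handler>,
    mut model: M,
    max_rounds: usize,
) -> Option<String>
where
    M: FnMut(&[Message]) -> ModelReply,
{
    let mut history = vec![Message {
        role: Role::User,
        content: question.to_string(),
    }];
    for _ in 0..max_rounds {
        let ModelReply { content, tool_calls } = model(&history);
        history.push(Message { role: Role::Assistant, content: content.clone() });

        let calls = match tool_calls {
            Some(calls) if !calls.is_empty() => calls,
            // No tool calls: this is the final answer.
            _ => return Some(content),
        };
        // Dispatch each call and feed the result back as a Tool message.
        for call in calls {
            let result = match tools.get(&call.name) {
                Some(handler) => handler(&call.args),
                None => format!("error: unknown tool {}", call.name),
            };
            history.push(Message { role: Role::Tool, content: result });
        }
    }
    None // The model kept requesting tools; give up.
}
```

Capping the number of rounds guards against a model that never stops requesting tools.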

### `ToolBuilder` parameter types

| Method | JSON Schema `type` |
|---|---|
| `string_param(name, desc, required)` | `"string"` |
| `number_param(name, desc, required)` | `"number"` |
| `bool_param(name, desc, required)` | `"boolean"` |

For more complex schemas (arrays, nested objects) you can construct `ToolDefinition` directly using the `types` module.

---

## Retrieval-Augmented Generation

**File:** [rag.rs](rag.rs)  
**Run:** `cargo run --example rag`

Builds an in-memory vector store, indexes text snippets and a file, queries for the top-k most relevant documents, then injects the retrieved context into a chat prompt.

### What it covers

- Constructing `VectorStore` with an embedding model name.
- Adding individual strings via `store.add_text(text, metadata)`.
- Indexing a UTF-8 file via `store.add_file(path)` — each `\n\n`-delimited paragraph becomes a separate document; `source` and `chunk` index are stored as metadata.
- Querying with `store.query(question, top_k)` — returns `Vec<SearchResult>` sorted by cosine similarity (highest first).
- Constructing a `System`-role prompt from retrieved context (the standard RAG pattern).
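
The paragraph splitting and prompt assembly described above might look roughly like this. `Chunk`, `chunk_paragraphs`, and `build_system_prompt` are hypothetical names; skipping empty paragraphs and the exact prompt wording are assumptions, not documented behavior of `VectorStore`.

```rust
/// Hypothetical record for one indexed paragraph (not an oxide-agent type).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub source: String,
    pub chunk: usize,
    pub text: String,
}

/// Split file contents on blank lines; each paragraph becomes one chunk.
/// Assumption: empty paragraphs are skipped and do not consume an index.
pub fn chunk_paragraphs(source: &str, contents: &str) -> Vec<Chunk> {
    contents
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .enumerate()
        .map(|(i, p)| Chunk {
            source: source.to_string(),
            chunk: i,
            text: p.to_string(),
        })
        .collect()
}

/// Assemble retrieved snippets into a system prompt (wording is illustrative).
pub fn build_system_prompt(snippets: &[&str]) -> String {
    let mut prompt = String::from("Answer using only the context below.\n");
    for (i, snippet) in snippets.iter().enumerate() {
        prompt.push_str(&format!("\n[{}] {}", i + 1, snippet));
    }
    prompt
}
```

Numbering the snippets in the prompt makes it easy to ask the model to cite which passage it used.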

### Cosine similarity

Similarity is computed as:

```
sim(a, b) = (a · b) / (‖a‖ × ‖b‖)
```

Values range from `−1` (opposite) to `1` (identical direction). The store returns `0.0` for zero-length vectors.
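
A minimal sketch of the formula and the top-k ranking, written against plain slices. The function names are illustrative, and returning `0.0` for vectors of different dimensions is an assumption of this sketch rather than documented store behavior.

```rust
/// Cosine similarity; returns 0.0 when either vector is empty or all zeros.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Assumption: mismatched dimensions are treated as "no similarity".
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every document against the query and keep the `k` best,
/// highest similarity first. Returns `(document index, score)` pairs.
pub fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored.truncate(k);
    scored
}
```

The zero-norm guard covers both an empty vector and an all-zero vector, since either one makes the denominator zero.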

### Choosing an embedding model

Any model exposed by your Ollama server's `/api/embed` endpoint works. Recommended options:

| Model | Dimensions | Notes |
|---|---|---|
| `nomic-embed-text` | 768 | Good general-purpose baseline |
| `mxbai-embed-large` | 1024 | Higher quality, slower |
| `all-minilm` | 384 | Fast and lightweight |

For high-throughput or large corpora, consider the [Mojo bridge](#mojo-interoperability) to offload batch embedding outside the Ollama HTTP layer.

---

## WASM Tool Isolation

**File:** [wasm_tool.rs](wasm_tool.rs)  
**Run:** `cargo run --example wasm_tool --features wasm-tools`

Loads a `.wasm` binary and calls it with a JSON argument object. The module runs in a `wasmtime` sandbox — no host filesystem, network, or process access.

### What it covers

- Building a `ToolDefinition` with `wasm_tool_definition()`.
- Loading a `.wasm` file via `WasmTool::from_file(path, definition)`.
- Invoking the tool with `tool.call(args: serde_json::Value)`.
- Graceful stub behavior when the `wasm-tools` feature is disabled.

### ABI contract

Your `.wasm` module must export:

| Export | Signature | Responsibility |
|---|---|---|
| `memory` | linear memory | Default memory export |
| `alloc` | `(size: i32) -> i32` | Allocate `size` bytes, return pointer |
| `tool_call` | `(ptr: i32, len: i32) -> i32` | Receive JSON args, return pointer to null-terminated JSON result |

The host writes the JSON-encoded argument object into the WASM linear memory at the pointer returned by `alloc`, then calls `tool_call`. The result pointer must point to a null-terminated UTF-8 string containing valid JSON.
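
The host-side sequence can be simulated without `wasmtime` by treating linear memory as a byte vector. `LinearMemory` and `call_tool` are hypothetical, and the bump allocator and the guest closure stand in for the module's `alloc` and `tool_call` exports; this is a model of the contract, not the crate's implementation.

```rust
/// Toy stand-in for a module's linear memory with a bump allocator.
pub struct LinearMemory {
    bytes: Vec<u8>,
    next: usize,
}

impl LinearMemory {
    pub fn new(size: usize) -> Self {
        // Start at 8 so that pointer 0 is never handed out.
        Self { bytes: vec![0; size], next: 8 }
    }

    /// Plays the role of the `alloc` export: reserve `size` bytes.
    pub fn alloc(&mut self, size: usize) -> Option<usize> {
        let end = self.next.checked_add(size)?;
        if end > self.bytes.len() {
            return None;
        }
        let ptr = self.next;
        self.next = end;
        Some(ptr)
    }

    pub fn write(&mut self, ptr: usize, data: &[u8]) -> Option<()> {
        let end = ptr.checked_add(data.len())?;
        self.bytes.get_mut(ptr..end)?.copy_from_slice(data);
        Some(())
    }

    pub fn read(&self, ptr: usize, len: usize) -> Option<&[u8]> {
        self.bytes.get(ptr..ptr.checked_add(len)?)
    }

    /// Read a null-terminated UTF-8 string starting at `ptr`.
    pub fn read_c_string(&self, ptr: usize) -> Option<String> {
        let tail = self.bytes.get(ptr..)?;
        let len = tail.iter().position(|&b| b == 0)?;
        String::from_utf8(tail[..len].to_vec()).ok()
    }
}

/// Host side of the contract: alloc, write args, call, read the result.
/// `guest_tool_call` stands in for the module's `tool_call` export.
pub fn call_tool<G>(
    mem: &mut LinearMemory,
    args_json: &str,
    guest_tool_call: G,
) -> Option<String>
where
    G: FnOnce(&mut LinearMemory, usize, usize) -> usize,
{
    let ptr = mem.alloc(args_json.len())?;
    mem.write(ptr, args_json.as_bytes())?;
    let result_ptr = guest_tool_call(mem, ptr, args_json.len());
    mem.read_c_string(result_ptr)
}
```

Every read is bounds-checked and a missing terminator yields `None`, which reflects the care a host needs when following a pointer returned by guest code.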

### Security model

`wasmtime` enforces capability-based isolation by default. A sandboxed tool cannot:

- Open files on the host.
- Make network connections.
- Spawn processes.
- Access host memory outside its linear memory.

This makes it safe to load WASM tools that come from untrusted sources or were auto-generated.

---

## Mojo Interoperability

**File:** [mojo_embedder.rs](mojo_embedder.rs)  
**Run:** `cargo run --example mojo_embedder --features mojo-interop`

Spawns a [Mojo](https://www.modular.com/mojo)-compiled binary as a subprocess and sends it a batch of strings to embed, receiving a `Vec<Vec<f32>>` back.

### What it covers

- Constructing `MojoEmbedder::new(binary_path)`.
- Calling `embedder.embed_batch(texts).await` — each call is one subprocess invocation.
- The stdin/stdout JSON line protocol.
- Graceful stub behavior when the `mojo-interop` feature is disabled.

### JSON line protocol

```
# Request written to stdin (single line):
{"texts": ["first document", "second document"]}

# Response read from stdout (single line):
{"embeddings": [[0.12, 0.34, ...], [0.56, 0.78, ...]]}
```

The embedder:
1. Spawns the binary with `stdin`, `stdout`, and `stderr` piped.
2. Writes the request line and closes stdin.
3. Reads all of stdout.
4. Waits for process exit — non-zero status → `OxideError::Other`.
5. Parses the first non-empty line as `{"embeddings": ...}`.
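
The five steps can be sketched with `std::process` alone. This is a synchronous stand-in for the crate's async `embed_batch`; `exchange_line` is a hypothetical name, errors are reported as `io::Error` rather than `OxideError`, and the final JSON parsing is left out because it needs a JSON library.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Send one request line to `binary` and return its first non-empty
/// stdout line. Hypothetical helper, not the crate's implementation.
pub fn exchange_line(binary: &str, request_line: &str) -> io::Result<String> {
    // Step 1: spawn with stdin, stdout, and stderr piped.
    let mut child = Command::new(binary)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Step 2: write the request line; dropping the handle closes stdin.
    {
        let mut stdin = child
            .stdin
            .take()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "stdin was not piped"))?;
        stdin.write_all(request_line.as_bytes())?;
        stdin.write_all(b"\n")?;
    }

    // Steps 3 and 4: read all output, then check the exit status.
    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("embedder exited with {}", output.status),
        ));
    }

    // Step 5: pick the first non-empty line; JSON parsing would follow.
    let stdout = String::from_utf8_lossy(&output.stdout);
    let first = stdout
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(str::to_owned);
    first.ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no response line"))
}
```

On a Unix-like system, calling `exchange_line("cat", request)` returns the request unchanged, which is a quick way to check the plumbing before pointing it at a real embedder binary.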

### When to use Mojo over Ollama's `/api/embed`

- You have a Mojo kernel that runs faster than the Ollama HTTP round-trip for your embedding model.
- You need to embed very large corpora in batch and want to amortize subprocess overhead across many texts per call.
- You want to keep embedding logic compiled separately from the Rust binary (e.g., for model hot-swapping).