# Ambi


---

<p align="center">
  <a href="https://spdx.org/licenses/Apache-2.0.html"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License: GPL v3"></a>
  <a href="https://github.com/maskviva/ambi"><img alt="github" src="https://img.shields.io/badge/github-maskviva/Ambi-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20"></a>
  <a href="https://crates.io/crates/ambi"><img alt="crates.io" src="https://img.shields.io/crates/v/ambi.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20"></a>    
  <a href="https://docs.rs/ambi"><img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-ambi-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20"></a><br/>
 [<a href="./README_zh.md">中文(简体)</a>] | [<a href="./README.md">English</a>] 
</p>

Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create
production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.

- **Dual‑engine architecture** – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and
  cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- **Advanced tool system** – Parallel multi‑tool execution, per‑tool timeouts and retries, automatic JSON Schema
  generation from Rust structs.
- **Intelligent context management** – Safe eviction algorithm that preserves conversation logic, preventing token
  overflow while keeping your agent focused.
- **Rust native** – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.

<br>

## Resources


The best way to learn Ambi is to write an agent. The [`examples/`](https://github.com/maskviva/ambi/tree/main/examples)
directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and
multi‑tool parallel execution.

<br>

## Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.3"
```

For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.3", default-features = false, features = ["openai-api"] }
```

## Runtime Requirements


Ambi is built on the Tokio async runtime. Make sure your project depends on Tokio with the `rt-multi-thread` feature
enabled; without it, `Agent::make` and the rest of the async API will not run.
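
A minimal dependency entry might look like this (the version is illustrative; use whichever Tokio 1.x release your
project already pins — `macros` is needed for the `#[tokio::main]` attribute used in the quick start):

```toml
[dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```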

### Bindings


Ambi also provides native bindings for other languages:

**Python** – Install the pre-built wheel from PyPI:

```bash
pip install ambi-python
```

```python
from ambi import Agent, AgentState, Pipeline, LLMEngineConfig
```

**Node.js** – Install the npm package with prebuilt binaries:

```bash
npm install ambi-node
```

```js
const { Engine, Agent, AgentState, ChatRunner } = require('ambi-node');
```

> Prebuilt binaries are available for Windows, Linux (glibc & musl), and macOS on x64 & arm64 architectures.
> No Rust toolchain required on the consuming machine.

<br>

## Quick start


```rust
use ambi::{Agent, AgentState, ChatRunner, LLMEngineConfig};
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Pick an engine configuration
    let config = LLMEngineConfig::OpenAI(ambi::OpenAIEngineConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        base_url: "https://api.openai.com/v1".into(),
        model_name: "gpt-4o".into(),
        temp: 0.7,
        top_p: 0.95,
    });

    // 2. Build an agent
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.")
        .template(ambi::ChatTemplateType::Chatml);

    // 3. Create a shared state with a unique session ID
    let state = Arc::new(RwLock::new(AgentState::new("session-001")));

    // 4. Run the chat pipeline
    let runner = ChatRunner::default();
    let response = runner.chat(&agent, &state, "Hello, world!").await?;
    println!("{}", response);

    Ok(())
}
```

<br>

## Using local inference


Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.3", features = ["llama-cpp", "cuda"] }
```

Then swap the engine configuration:

```rust
let config = LLMEngineConfig::Llama(ambi::LlamaEngineConfig {
    model_path: "./models/llama-3-8b.gguf".into(),
    max_tokens: 4096,
    buffer_size: 32,
    use_gpu: true,
    n_gpu_layers: 100,
    n_ctx: 8192,
    n_tokens: 512,
    n_seq_max: 1,
    penalty_last_n: 64,
    penalty_repeat: 1.1,
    penalty_freq: 0.0,
    penalty_present: 0.0,
    temp: 0.7,
    top_p: 0.9,
    seed: 42,
    min_keep: 1,
});
```

<br>

## Adding custom tools


Define a tool by implementing the `Tool` trait. The example below spells out the JSON Schema by hand; the `macro`
feature (see [Feature flags](#feature-flags)) can generate it for you.

```rust
use ambi::{Tool, ToolDefinition, ToolErr};
use serde::{Deserialize, Serialize};
use async_trait::async_trait;

#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

#[derive(Serialize)]
struct WeatherResult {
    temperature: f64,
    condition: String,
}

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    const NAME: &'static str = "get_weather";

    type Args = WeatherArgs;
    type Output = WeatherResult;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get current weather for a city".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }),
            timeout_secs: Some(10),
            max_retries: Some(2),
            is_idempotent: true,
        }
    }

    async fn call(&self, args: WeatherArgs) -> Result<WeatherResult, ToolErr> {
        // Your implementation here
        Ok(WeatherResult {
            temperature: 22.5,
            condition: "Sunny".into(),
        })
    }
}
```

Attach the tool to your agent:

```rust
let agent = Agent::make(config).await?
    .preamble("You are a weather assistant.")
    .tool(WeatherTool)?;
```

Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts,
and parallel execution automatically.
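
For instance, reusing `state` and `runner` from the quick start, one call is enough for the model to route through the
tool (a sketch; the prompt text is arbitrary):

```rust
// `agent`, `state`, and `runner` come from the quick-start example above.
let answer = runner.chat(&agent, &state, "What's the weather in Tokyo?").await?;
println!("{}", answer); // `get_weather` runs behind the scenes if the model requests it.
```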

<br>

## Streaming responses


```rust
use futures::StreamExt;

let mut stream = runner.chat_stream(&agent, &state, "Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(text) => print!("{}", text),
        Err(e) => eprintln!("Stream error: {}", e),
    }
}
```

WASM targets (browser) support the same streaming API natively via `fetch` and `ReadableStream` – see
[`examples/webAssembly`](https://github.com/maskviva/ambi/tree/main/examples/webAssembly) for a live demo.

<br>

## Context eviction & dynamic context


Ambi's context management automatically evicts old messages once the token budget is exceeded, and keeps system
instructions completely decoupled from the eviction FIFO queue to maximize KV cache hit rates.

### Dynamic context (RAG / session data)


Volatile background knowledge like RAG results or environment variables can be injected safely into `AgentState`
without touching the static `system_prompt`:

```rust
// Inject RAG results for the current turn
state.write().await.set_dynamic_context("Relevant docs: ...");
// Or stack multiple sources
state.write().await.append_dynamic_context("Current time: 2025-01-01");
```

Use `clear_dynamic_context()` to reset between turns.
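
Put together, a per-turn retrieval loop might look like this sketch, where `retrieved_docs` and `user_input` stand in
for your own values and `runner`/`state` come from the quick start:

```rust
// Inject this turn's retrieval results, run the turn, then reset.
state.write().await.set_dynamic_context(&retrieved_docs);
let reply = runner.chat(&agent, &state, &user_input).await?;
state.write().await.clear_dynamic_context();
```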

### Eviction strategy


```rust
use ambi::config::EvictionStrategy;

let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 4096 });
```

### Eviction callback with state access


The callback now receives `&AgentState`, giving you safe access to identifiers and connection pools from state
extensions for async database archiving:

```rust
let agent = Agent::make(config).await?
    .on_evict(|state: &AgentState, evicted: Vec<Arc<Message>>| {
        // `state` is only borrowed here, so clone what the spawned task needs
        let session_id = state.session_id.clone();
        // Spawn an async task to archive the evicted messages
        tokio::spawn(async move {
            // persist `evicted` messages for `session_id` to the DB
        });
    });
```

### ChatHistory helpers


```rust
// Find messages containing a keyword
let results = state.read().await.chat_history.search_by_keyword("weather");

// Get the last user message
if let Some(msg) = state.read().await.chat_history.last_user_message() {
    // inspect the user's latest intent
}

// Get the last assistant message
if let Some(msg) = state.read().await.chat_history.last_assistant_message() {
    // inspect the latest response
}
```

<br>

## Custom tool‑call parser


By default Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags. You can bring your own parser:

```rust
use ambi::tool::ToolCallParser;
use ambi::types::StreamFormatter;

struct MyParser;

impl ToolCallParser for MyParser {
    fn format_instruction(&self, tools_json: &str) -> String {
        // instruct the model how to call tools
        format!("Use tools: {}", tools_json)
    }

    fn parse(&self, text: &str) -> Vec<(String, serde_json::Value)> {
        // extract tool calls from the model's output
        vec![]
    }

    fn create_stream_formatter(&self) -> Box<dyn StreamFormatter> {
        Box::new(ambi::agent::processor::PassThroughFormatter)
    }
}

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);
```

<br>

## Error handling


Ambi uses `thiserror` to provide clear, actionable error types:

```rust
pub enum AmbiError {
    EngineError(String),
    AgentError(String),
    ToolError(String),
    ContextError(String),
    PipelineError(String),
    MaxIterationsReached(usize),
    Other(anyhow::Error),
}
```

All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
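
For example, assuming `AmbiError` is re-exported at the crate root, specific failure modes are easy to single out (a
sketch reusing `runner` and `state` from the quick start):

```rust
use ambi::AmbiError;

match runner.chat(&agent, &state, "Hello").await {
    Ok(reply) => println!("{}", reply),
    Err(AmbiError::MaxIterationsReached(n)) => {
        eprintln!("agent stopped after {} iterations without a final answer", n);
    }
    Err(e) => return Err(e.into()),
}
```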

<br>

## Testing


Ambi comes with comprehensive unit and integration tests. We recommend using `cargo test` during development. When
testing agents, consider using a mock engine to avoid real API calls:

```rust
struct MockEngine;

#[async_trait]
impl LLMEngineTrait for MockEngine {
    async fn chat(&self, _: LLMRequest) -> Result<String> {
        Ok("Hello, I am a mock.".into())
    }
    // ...
}

let agent = Agent::make(LLMEngineConfig::Custom(Box::new(MockEngine))).await?;
```

<br>

## Feature flags


Ambi uses Cargo features to keep compile times low:

- **`openai-api`** *(enabled by default)* – OpenAI‑compatible cloud backend powered by `async-openai`.
- **`llama-cpp`** – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- **`cuda`**, **`vulkan`**, **`metal`**, **`rocm`** – GPU acceleration for the local engine (choose exactly one).
- **`macro`** – Enables the `#[tool]` attribute macro for zero-boilerplate tool definitions with `params(...)` support (see the sketch after this list).
- **`mtmd`** – Multimodal (vision) support for local VLM models (implies `llama-cpp`).
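
With the `macro` feature enabled, a tool definition can shrink considerably. The snippet below is a hypothetical sketch
only — the attribute's exact syntax is an assumption here, so check the crate docs for the real shape:

```rust
// Hypothetical sketch: the attribute arguments shown are assumptions,
// not the crate's confirmed API.
#[ambi::tool(name = "get_weather", params(city = "City name"))]
async fn get_weather(city: String) -> Result<String, ambi::ToolErr> {
    Ok(format!("Weather in {}: sunny, 22.5 °C", city))
}
```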

<br>

#### License


<sup>
Licensed under the <a href="LICENSE-APACHE">Apache License, Version 2.0</a>.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be licensed as above, without any additional terms or conditions.
</sub>