# Ambi


---

<p align="center">
  <a href="https://spdx.org/licenses/Apache-2.0.html"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License: Apache 2.0"></a>
  <a href="https://github.com/maskviva/ambi"><img alt="github" src="https://img.shields.io/badge/github-maskviva/Ambi-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20"></a>
  <a href="https://crates.io/crates/ambi"><img alt="crates.io" src="https://img.shields.io/crates/v/ambi.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20"></a>    
  <a href="https://docs.rs/ambi"><img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-ambi-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20"></a><br/>
 [<a href="./README_zh.md">中文(简体)</a>] | [<a href="./README.md">English</a>] 
</p>

Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create
production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.

- **Dual‑engine architecture** – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and
  cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- **Advanced tool system** – Parallel multi‑tool execution, per‑tool timeouts and retries, automatic JSON Schema
  generation from Rust structs.
- **Intelligent context management** – Safe eviction algorithm that preserves conversation logic, preventing token
  overflow while keeping your agent focused.
- **Rust native** – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.

<br>

## Resources


The best way to learn Ambi is to write an agent. The [`examples/`](https://github.com/maskviva/ambi/tree/main/examples)
directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and
multi‑tool parallel execution.

<br>

## Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.2"
```

For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.2", default-features = false, features = ["openai-api"] }
```

<br>

## Quick start


```rust
use ambi::{Agent, AgentState, ChatRunner, LLMEngineConfig, Message};
use std::sync::{Arc, Mutex};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Pick an engine configuration
    let config = LLMEngineConfig::OpenAI(ambi::OpenAIEngineConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        base_url: "https://api.openai.com/v1".into(),
        model_name: "gpt-4o".into(),
        temp: 0.7,
        top_p: 0.95,
    });

    // 2. Build an agent (5 lines of code!)
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.")
        .template(ambi::ChatTemplateType::Chatml);

    // 3. Create a shared state
    let state = Arc::new(Mutex::new(AgentState::new()));

    // 4. Run the chat pipeline
    let runner = ChatRunner;
    let response = runner.chat(&agent, &state, "Hello, world!").await?;
    println!("{}", response);

    Ok(())
}
```

<br>

## Using local inference


Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.2", features = ["llama-cpp", "cuda"] }
```

Then swap the engine configuration:

```rust
let config = LLMEngineConfig::Llama(ambi::LlamaEngineConfig {
    model_path: "./models/llama-3-8b.gguf".into(),
    max_tokens: 4096,
    buffer_size: 32,
    use_gpu: true,
    n_gpu_layers: 100,
    n_ctx: 8192,
    n_tokens: 512,
    n_seq_max: 1,
    penalty_last_n: 64,
    penalty_repeat: 1.1,
    penalty_freq: 0.0,
    penalty_present: 0.0,
    temp: 0.7,
    top_p: 0.9,
    seed: 42,
    min_keep: 1,
});
```

<br>

## Adding custom tools


Define a tool by implementing the `Tool` trait. Ambi automatically generates the JSON Schema for you.

```rust
use ambi::{Tool, ToolDefinition, ToolErr};
use serde::{Deserialize, Serialize};
use async_trait::async_trait;

#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

#[derive(Serialize)]
struct WeatherResult {
    temperature: f64,
    condition: String,
}

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    const NAME: &'static str = "get_weather";

    type Args = WeatherArgs;
    type Output = WeatherResult;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get current weather for a city".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }),
            timeout_secs: Some(10),
            max_retries: Some(2),
            is_idempotent: true,
        }
    }

    async fn call(&self, args: WeatherArgs) -> Result<WeatherResult, ToolErr> {
        // Your implementation here
        Ok(WeatherResult {
            temperature: 22.5,
            condition: "Sunny".into(),
        })
    }
}
```

Attach the tool to your agent:

```rust
let agent = Agent::make(config).await?
    .preamble("You are a weather assistant.")
    .tool(WeatherTool)?;
```

Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts,
and parallel execution automatically.

<br>

## Streaming responses


```rust
use futures::StreamExt;

let mut stream = runner.chat_stream(&agent, &state, "Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(text) => print!("{}", text),
        Err(e) => eprintln!("Stream error: {}", e),
    }
}
```

<br>

## Context eviction


Ambi’s context management automatically evicts old messages when the token budget is exceeded, but you can fine‑tune the
strategy:

```rust
let agent = Agent::make(config).await?
    .with_eviction_strategy(
        2,    // keep at least the first 2 messages
        6,    // keep at least the most recent 6 messages
        3000, // max safe tokens before eviction
    );
```
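Conceptually, the pass protects a head (e.g., the system preamble) and a tail (the recent turn window) and drops the oldest unprotected messages until the budget fits. Here is a simplified, std-only sketch of that idea; it is illustrative only, not Ambi's internal implementation, and assumes token counts are precomputed per message:

```rust
/// Illustrative eviction sketch: `msgs` pairs each message with its token count.
/// Keeps the first `keep_head` and last `keep_tail` messages, then removes the
/// oldest unprotected messages until the total fits within `max_tokens`.
fn evict(
    msgs: &mut Vec<(String, usize)>,
    keep_head: usize,
    keep_tail: usize,
    max_tokens: usize,
) -> Vec<(String, usize)> {
    let mut evicted = Vec::new();
    let total = |m: &Vec<(String, usize)>| m.iter().map(|(_, t)| *t).sum::<usize>();
    while total(msgs) > max_tokens && msgs.len() > keep_head + keep_tail {
        // Evict the oldest message that sits outside the protected head.
        evicted.push(msgs.remove(keep_head));
    }
    evicted
}
```

With `keep_head = 2` and `keep_tail = 6` as above, the preamble and the most recent exchanges always survive eviction.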

You can also register a callback to process evicted messages (e.g., to persist them):

```rust
let agent = Agent::make(config).await?
    .on_evict(|evicted| {
        for msg in evicted {
            // archive or log the message
        }
    });
```

<br>

## Custom tool‑call parser


By default Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags. You can bring your own parser:

```rust
use ambi::tool::ToolCallParser;

struct MyParser;

impl ToolCallParser for MyParser {
    fn format_instruction(&self, tools_json: &str) -> String {
        // instruct the model how to call tools
        format!("Use tools: {}", tools_json)
    }

    fn parse(&self, text: &str) -> Vec<(String, serde_json::Value)> {
        // extract tool calls from the model's output
        vec![]
    }

    fn create_stream_formatter(&self) -> Box<dyn ambi::tool::StreamFormatter> {
        Box::new(ambi::agent::core::formatter::PassThroughFormatter)
    }
}

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);
```
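For reference, the default tag-based extraction can be approximated with plain string scanning. This is a simplified, std-only sketch that returns the raw payloads between the tags; Ambi's actual parser goes further and deserializes each payload into a tool name plus `serde_json::Value` arguments:

```rust
/// Extracts the raw payloads between `[TOOL_CALL]` and `[/TOOL_CALL]` tags.
/// Simplified sketch: the real parser also deserializes each payload.
fn extract_tool_calls(text: &str) -> Vec<String> {
    let (open, close) = ("[TOOL_CALL]", "[/TOOL_CALL]");
    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(open) {
        let after = &rest[start + open.len()..];
        match after.find(close) {
            Some(end) => {
                calls.push(after[..end].trim().to_string());
                rest = &after[end + close.len()..];
            }
            None => break, // unterminated tag: ignore the tail
        }
    }
    calls
}
```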

<br>

## Error handling


Ambi uses `thiserror` to provide clear, actionable error types:

```rust
pub enum AmbiError {
    EngineError(String),
    AgentError(String),
    ToolError(String),
    ContextError(String),
    PipelineError(String),
    MaxIterationsReached(usize),
    Other(anyhow::Error),
}
```

All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
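Pattern matching then works as you would expect. The sketch below mirrors a subset of the variants above in self-contained form (with `Other` simplified to a `String` so the snippet compiles without `anyhow`):

```rust
// Local mirror of a few AmbiError variants, for illustration only;
// `Other` is simplified to a String to avoid the anyhow dependency.
enum AmbiError {
    EngineError(String),
    ToolError(String),
    MaxIterationsReached(usize),
    Other(String),
}

// Map each error variant to a human-readable description.
fn describe(err: &AmbiError) -> String {
    match err {
        AmbiError::EngineError(msg) => format!("engine failed: {}", msg),
        AmbiError::ToolError(msg) => format!("tool failed: {}", msg),
        AmbiError::MaxIterationsReached(n) => format!("gave up after {} iterations", n),
        AmbiError::Other(msg) => format!("unexpected: {}", msg),
    }
}
```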

<br>

## Testing


Ambi comes with comprehensive unit and integration tests. We recommend using `cargo test` during development. When
testing agents, consider using a mock engine to avoid real API calls:

```rust
struct MockEngine;

#[async_trait]
impl LLMEngineTrait for MockEngine {
    async fn chat(&self, _: LLMRequest) -> Result<String> {
        Ok("Hello, I am a mock.".into())
    }
    // ...
}

let agent = Agent::with_custom_engine(Box::new(MockEngine))?;
```

<br>

## Feature flags


Ambi uses Cargo features to keep compile times low:

- **`openai-api`** *(enabled by default)* – OpenAI‑compatible cloud backend powered by `async-openai`.
- **`llama-cpp`** – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- **`cuda`**, **`vulkan`**, **`metal`**, **`rocm`** – GPU acceleration for the local engine (choose exactly one).
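For example, a local-only build on macOS with Metal acceleration might look like this (feature names as listed above; `default-features = false` drops the cloud backend):

```toml
[dependencies]
ambi = { version = "0.2", default-features = false, features = ["llama-cpp", "metal"] }
```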

<br>

#### License


<sup>
Licensed under the <a href="LICENSE-APACHE">Apache License, Version 2.0</a>.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be licensed as above, without any additional terms or conditions.
</sub>