# Ambi


---

<p align="center">
  <a href="https://spdx.org/licenses/Apache-2.0.html"><img src="https://img.shields.io/badge/License-Apache%202.0-blue" alt="License: GPL v3"></a>
  <a href="https://github.com/maskviva/ambi"><img alt="github" src="https://img.shields.io/badge/github-maskviva/Ambi-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20"></a>
  <a href="https://crates.io/crates/ambi"><img alt="crates.io" src="https://img.shields.io/crates/v/ambi.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20"></a>    
  <a href="https://docs.rs/ambi"><img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-ambi-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20"></a><br/>
 [<a href="./README_zh.md">中文(简体)</a>] | [<a href="./README.md">English</a>] 
</p>

Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create
production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.

- **Dual‑engine architecture** – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and
  cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- **Advanced tool system** – Parallel multi‑tool execution, per‑tool timeouts and retries, automatic JSON Schema
  generation from Rust structs.
- **Intelligent context management** – Safe eviction algorithm that preserves conversation logic, preventing token
  overflow while keeping your agent focused.
- **Rust native** – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.

<br>

## Resources


The best way to learn Ambi is to write an agent. The [`examples/`](https://github.com/maskviva/ambi/tree/main/examples)
directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and
multi‑tool parallel execution.

<br>

## Installation


Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.3"
```

For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.3", default-features = false, features = ["openai-api"] }
```

## Runtime Requirements


Ambi is built on the Tokio async runtime. Make sure your project depends on Tokio with the `rt-multi-thread` feature
enabled; without it, `Agent::make` and the other async APIs will not work.
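
For example, a typical `Cargo.toml` setup (the `macros` feature is assumed here for `#[tokio::main]`):

```toml
[dependencies]
ambi = "0.3"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
```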

<br>

## Quick start


```rust
use ambi::{Agent, AgentState, ChatRunner, LLMEngineConfig};
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Pick an engine configuration
    let config = LLMEngineConfig::OpenAI(ambi::OpenAIEngineConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        base_url: "https://api.openai.com/v1".into(),
        model_name: "gpt-4o".into(),
        temp: 0.7,
        top_p: 0.95,
    });

    // 2. Build an agent
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.")
        .template(ambi::ChatTemplateType::Chatml);

    // 3. Create a shared state with a unique session ID
    let state = Arc::new(RwLock::new(AgentState::new("session-001")));

    // 4. Run the chat pipeline
    let runner = ChatRunner::default();
    let response = runner.chat(&agent, &state, "Hello, world!").await?;
    println!("{}", response);

    Ok(())
}
```
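
Run it with your API key in the environment, e.g. `OPENAI_API_KEY=... cargo run`.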

<br>

## Using local inference


Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.3", features = ["llama-cpp", "cuda"] }
```

Then swap the engine configuration:

```rust
let config = LLMEngineConfig::Llama(ambi::LlamaEngineConfig {
    model_path: "./models/llama-3-8b.gguf".into(),
    max_tokens: 4096,
    buffer_size: 32,
    use_gpu: true,
    n_gpu_layers: 100,
    n_ctx: 8192,
    n_tokens: 512,
    n_seq_max: 1,
    penalty_last_n: 64,
    penalty_repeat: 1.1,
    penalty_freq: 0.0,
    penalty_present: 0.0,
    temp: 0.7,
    top_p: 0.9,
    seed: 42,
    min_keep: 1,
});
```

<br>

## Adding custom tools


Define a tool by implementing the `Tool` trait. The example below writes the JSON Schema by hand; with the `macro`
feature enabled, the `#[tool]` attribute macro can generate it from your Rust structs instead.

```rust
use ambi::{Tool, ToolDefinition, ToolErr};
use serde::{Deserialize, Serialize};
use async_trait::async_trait;

#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

#[derive(Serialize)]
struct WeatherResult {
    temperature: f64,
    condition: String,
}

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    const NAME: &'static str = "get_weather";

    type Args = WeatherArgs;
    type Output = WeatherResult;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get current weather for a city".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }),
            timeout_secs: Some(10),
            max_retries: Some(2),
            is_idempotent: true,
        }
    }

    async fn call(&self, args: WeatherArgs) -> Result<WeatherResult, ToolErr> {
        // Your implementation here, e.g. look up the weather for `args.city`
        Ok(WeatherResult {
            temperature: 22.5,
            condition: "Sunny".into(),
        })
    }
}
```

Attach the tool to your agent:

```rust
let agent = Agent::make(config).await?
    .preamble("You are a weather assistant.")
    .tool(WeatherTool)?;
```

Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts,
and parallel execution automatically.
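
For instance, reusing the runner and shared state from the quick start:

```rust
// A weather question should trigger a `get_weather` tool call under the hood
let response = runner.chat(&agent, &state, "What's the weather in Tokyo?").await?;
println!("{}", response);
```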

<br>

## Streaming responses


```rust
use futures::StreamExt;

let mut stream = runner.chat_stream(&agent, &state, "Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(text) => print!("{}", text),
        Err(e) => eprintln!("Stream error: {}", e),
    }
}
```

WASM targets (browser) support the same streaming API natively via `fetch` and `ReadableStream` – see
[`examples/webAssembly`](https://github.com/maskviva/ambi/tree/main/examples/webAssembly) for a live demo.

<br>

## Context eviction & dynamic context


Ambi's context management automatically evicts old messages when the token budget is exceeded. System instructions are
kept entirely outside the eviction FIFO queue, which maximizes KV cache hit rates.

### Dynamic context (RAG / session data)


Volatile background knowledge like RAG results or environment variables can be injected safely into `AgentState`
without touching the static `system_prompt`:

```rust
// Inject RAG results for the current turn
state.write().await.set_dynamic_context("Relevant docs: ...");
// Or stack multiple sources
state.write().await.append_dynamic_context("Current time: 2025-01-01");
```

Use `clear_dynamic_context()` to reset between turns.
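
```rust
// Reset the injected context before the next turn
state.write().await.clear_dynamic_context();
```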

### Eviction strategy


```rust
use ambi::config::EvictionStrategy;

let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 4096 });
```

### Eviction callback with state access


The callback now receives `&AgentState`, giving you safe access to identifiers and connection pools from state
extensions for async database archiving:

```rust
let agent = Agent::make(config).await?
    .on_evict(|state: &AgentState, evicted: Vec<Arc<Message>>| {
        // `state` is only borrowed here, so clone what the async task needs
        let session_id = state.session_id.clone();
        // Spawn an async task to archive the evicted messages
        tokio::spawn(async move {
            // persist `evicted` to the DB under `session_id`
        });
    });
```

### ChatHistory helpers


```rust
// Find messages containing a keyword
let results = state.read().await.chat_history.search_by_keyword("weather");

// Get the last user message
if let Some(msg) = state.read().await.chat_history.last_user_message() {
    // inspect the user's latest intent
}

// Get the last assistant message
if let Some(msg) = state.read().await.chat_history.last_assistant_message() {
    // inspect the latest response
}
```

<br>

## Custom tool‑call parser


By default Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags. You can bring your own parser:

```rust
use ambi::tool::{ToolCallParser, DefaultToolParser};
use ambi::types::StreamFormatter;

struct MyParser;

impl ToolCallParser for MyParser {
    fn format_instruction(&self, tools_json: &str) -> String {
        // instruct the model how to call tools
        format!("Use tools: {}", tools_json)
    }

    fn parse(&self, text: &str) -> Vec<(String, serde_json::Value)> {
        // extract tool calls from the model's output
        vec![]
    }

    fn create_stream_formatter(&self) -> Box<dyn StreamFormatter> {
        Box::new(ambi::agent::processor::PassThroughFormatter)
    }
}

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);
```

<br>

## Error handling


Ambi uses `thiserror` to provide clear, actionable error types:

```rust
pub enum AmbiError {
    EngineError(String),
    AgentError(String),
    ToolError(String),
    ContextError(String),
    PipelineError(String),
    MaxIterationsReached(usize),
    Other(anyhow::Error),
}
```

All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
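
A minimal sketch of matching on a specific failure (assuming `AmbiError` is re-exported at the crate root):

```rust
match runner.chat(&agent, &state, "Hello").await {
    Ok(text) => println!("{}", text),
    Err(ambi::AmbiError::MaxIterationsReached(n)) => {
        eprintln!("Agent stopped after {} iterations", n);
    }
    Err(e) => return Err(e.into()),
}
```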

<br>

## Testing


Ambi ships with comprehensive unit and integration tests; run `cargo test` during development. When testing agents,
consider using a mock engine to avoid real API calls:

```rust
struct MockEngine;

#[async_trait]
impl LLMEngineTrait for MockEngine {
    async fn chat(&self, _: LLMRequest) -> Result<String> {
        Ok("Hello, I am a mock.".into())
    }
    // ...
}

let agent = Agent::make(LLMEngineConfig::Custom(Box::new(MockEngine))).await?;
```

<br>

## Feature flags


Ambi uses Cargo features to keep compile times low:

- **`openai-api`** *(enabled by default)* – OpenAI‑compatible cloud backend powered by `async-openai`.
- **`llama-cpp`** – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- **`cuda`**, **`vulkan`**, **`metal`**, **`rocm`** – GPU acceleration for the local engine (choose exactly one).
- **`macro`** – Enables `#[tool]` attribute macro for zero-boilerplate tool definitions with `params(...)` support.
- **`mtmd`** – Multimodal (vision) support for local VLM models (implies `llama-cpp`).
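
For example, a local-inference build on macOS with the attribute macro enabled:

```toml
ambi = { version = "0.3", features = ["llama-cpp", "metal", "macro"] }
```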

<br>

#### License


<sup>
Licensed under the <a href="LICENSE-APACHE">Apache License, Version 2.0</a>.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be licensed as above, without any additional terms or conditions.
</sub>