Ambi
Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.
- Dual‑engine architecture – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- Advanced tool system – Parallel multi‑tool execution, per‑tool timeouts and retries, and automatic JSON Schema generation from Rust structs.
- Intelligent context management – Safe eviction algorithm that preserves conversation logic, preventing token overflow while keeping your agent focused.
- Rust native – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.
Resources
The best way to learn Ambi is to write an agent. The `examples/` directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and multi‑tool parallel execution.
Installation
Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.2"
```
For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.2", default-features = false, features = ["openai-api"] }
```
Runtime requirements
Ambi is built on the Tokio async runtime. Ensure your project uses Tokio with `rt-multi-thread` enabled; without it, `Agent::make` and all other async methods will not function.
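A minimal entry point looks like this (only the Tokio setup matters here; the `#[tokio::main]` attribute also requires Tokio's `macros` feature):

```rust
// Requires tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
// in Cargo.toml.
#[tokio::main]
async fn main() {
    // Agent::make and every other Ambi async call runs on this runtime.
}
```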
Quick start
The snippet below sketches a minimal chat agent. The exact paths and method names (`Agent::make`, `preamble`, `chat`, the engine config type) are assumptions based on the rest of this README; see the `examples/` directory for the canonical version.
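```rust
use ambi::{Agent, AmbiError};   // hypothetical import paths
use ambi::engine::OpenAiConfig; // hypothetical OpenAI-compatible config type

#[tokio::main]
async fn main() -> Result<(), AmbiError> {
    // Build an agent against an OpenAI-compatible endpoint.
    let agent = Agent::make(OpenAiConfig::from_env()) // from_env() is an assumption
        .await?
        .preamble("You are a concise, helpful assistant.");

    // One chat turn; the reply is assumed to be a String.
    let reply = agent.chat("Hello, Ambi!").await?;
    println!("{reply}");
    Ok(())
}
```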
Using local inference
Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.2", features = ["llama-cpp", "cuda"] }
```
Then swap the engine configuration; your agent code stays the same. The config type and field names below are assumptions, not Ambi's confirmed API:
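```rust
// Hypothetical config type and fields; check the crate docs for the real names.
let config = LlamaConfig {
    model_path: "models/model.Q4_K_M.gguf".into(),
    gpu_layers: 32, // layers offloaded to the GPU backend enabled above
    ..Default::default()
};
```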
Adding custom tools
Define a tool by implementing the `Tool` trait; Ambi automatically generates the JSON Schema for the tool's arguments from your Rust structs. The sketch below is illustrative: the trait's exact items and the schema derive (`schemars`-style) are assumptions.
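```rust
use ambi::tool::{Tool, ToolError}; // hypothetical module path and error type
use async_trait::async_trait;
use schemars::JsonSchema;          // assumed mechanism for schema generation
use serde::Deserialize;

/// Arguments the model supplies; the JSON Schema is derived from this struct.
#[derive(Deserialize, JsonSchema)]
struct WeatherArgs {
    city: String,
}

struct GetWeather;

#[async_trait]
impl Tool for GetWeather {
    // The associated items and method names here are assumptions.
    type Args = WeatherArgs;
    type Output = String;

    fn name(&self) -> &str {
        "get_weather"
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, ToolError> {
        // A real tool would query a weather service here.
        Ok(format!("It is currently sunny in {}.", args.city))
    }
}
```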
Attach the tool to your agent:
```rust
let agent = Agent::make(config) // `config` as in the earlier examples
    .await?
    .preamble("You are a helpful weather assistant.")
    .tool(GetWeather)?;         // registration is fallible, hence the `?`
```
Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts, and parallel execution automatically.
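For example, a plain chat turn is enough to trigger the tool (assuming the same `chat` method as in the quick start sketch):

```rust
// The model emits a tool call; Ambi executes get_weather, feeds the result
// back to the model, and returns the final answer.
let reply = agent.chat("What's the weather in Tokyo right now?").await?;
println!("{reply}");
```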
Streaming responses
Token‑by‑token streaming uses a standard async `Stream` (a sketch, assuming `runner` is the agent built earlier and that `StreamExt` comes from the `futures` crate):

```rust
use futures::StreamExt; // brings `next()` into scope

let mut stream = runner.chat_stream("Tell me a story.").await?; // argument is illustrative
while let Some(chunk) = stream.next().await {
    // Each chunk is assumed to be a Result wrapping a text fragment.
    print!("{}", chunk?);
}
```
Context eviction
Ambi’s context management automatically evicts old messages when the token budget is exceeded, but you can fine‑tune the strategy:
```rust
let agent = Agent::make(config)
    .await?
    // The strategy type and variant are assumptions; see the crate docs.
    .with_eviction_strategy(EvictionStrategy::KeepSystemAndRecent { keep_last: 16 });
```
You can also register a callback to process evicted messages (e.g., to persist them):
```rust
let agent = Agent::make(config)
    .await?
    // The callback signature is an assumption; `msg` is an evicted message.
    .on_evict(|msg| {
        println!("evicted: {msg:?}");
    });
```
Custom tool‑call parser
By default, Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags to delimit tool calls in model output. You can bring your own parser. The sketch below assumes a hypothetical `ToolCallParser` trait with a single `parse` method that extracts raw tool‑call payloads:
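```rust
use ambi::parser::ToolCallParser; // hypothetical trait path

/// Recognizes <tool>...</tool> tags instead of the default markers.
struct XmlTagParser;

impl ToolCallParser for XmlTagParser {
    // Hypothetical method: return the raw payloads found in the model output.
    fn parse(&self, output: &str) -> Vec<String> {
        output
            .split("<tool>")
            .skip(1)
            .filter_map(|rest| rest.split("</tool>").next())
            .map(str::to_owned)
            .collect()
    }
}

let agent = Agent::make(config)
    .await?
    .with_tool_parser(XmlTagParser);
```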
Error handling
Ambi uses `thiserror` to provide clear, actionable error types. All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
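For example (the variant names below are illustrative, not Ambi's confirmed error surface):

```rust
use ambi::AmbiError; // hypothetical path

match agent.chat("hello").await {
    Ok(reply) => println!("{reply}"),
    // Hypothetical variant, shown only to illustrate pattern-matching.
    Err(AmbiError::ContextOverflow { .. }) => eprintln!("token budget exceeded"),
    Err(err) => eprintln!("agent error: {err}"),
}
```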
Testing
Ambi comes with comprehensive unit and integration tests; we recommend running `cargo test` during development. When testing agents, consider using a mock engine to avoid real API calls. The sketch below assumes a hypothetical `Engine` trait behind `with_custom_engine`:
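```rust
use ambi::engine::Engine; // hypothetical trait path
use async_trait::async_trait;

/// A test double returning a canned reply instead of calling a real backend.
struct MockEngine;

#[async_trait]
impl Engine for MockEngine {
    // Hypothetical method; the real trait surface may differ.
    async fn complete(&self, _prompt: &str) -> Result<String, ambi::AmbiError> {
        Ok("canned response".to_owned())
    }
}

let agent = Agent::with_custom_engine(MockEngine)?;
```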
Feature flags
Ambi uses Cargo features to keep compile times low:
- `openai-api` (enabled by default) – OpenAI‑compatible cloud backend powered by `async-openai`.
- `llama-cpp` – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- `cuda`, `vulkan`, `metal`, `rocm` – GPU acceleration for the local engine (choose exactly one).
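For example, a macOS setup running local inference on Metal would combine the flags like this in `Cargo.toml` (same form as the installation snippets above):

```toml
ambi = { version = "0.2", features = ["llama-cpp", "metal"] }
```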