
Ambi


Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.

  • Dual‑engine architecture – Seamlessly switch between local inference (via llama.cpp with GPU acceleration) and cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
  • Advanced tool system – Parallel multi‑tool execution, per‑tool timeouts and retries, automatic JSON Schema generation from Rust structs.
  • Intelligent context management – Safe eviction algorithm that preserves conversation logic, preventing token overflow while keeping your agent focused.
  • Rust native – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.

Resources

The best way to learn Ambi is to write an agent. The examples/ directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and multi‑tool parallel execution.

Installation

Add this to your Cargo.toml:

[dependencies]
ambi = "0.3"

For cloud‑only usage (faster compilation, no llama.cpp dependency):

ambi = { version = "0.3", default-features = false, features = ["openai-api"] }

Runtime Requirements

Ambi is built on the Tokio async runtime. Ensure your project uses Tokio with rt-multi-thread enabled. Without this, Agent::make and all async methods will not function.
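For reference, a minimal Cargo.toml entry that satisfies this (the Tokio version pin and feature list here are illustrative; adjust them to your project):

[dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }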

Bindings

Ambi also provides native bindings for other languages:

Python – Install the pre-built wheel from PyPI:

pip install ambi-python

from ambi import Agent, AgentState, Pipeline, LLMEngineConfig

Node.js – Install the npm package with prebuilt binaries:

npm install ambi-node

const { Engine, Agent, AgentState, ChatRunner } = require('ambi-node');

Prebuilt binaries are available for Windows, Linux (glibc & musl), and macOS on x64 & arm64 architectures. No Rust toolchain required on the consuming machine.

Quick start

use ambi::{Agent, AgentState, ChatRunner, LLMEngineConfig};
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Pick an engine configuration
    let config = LLMEngineConfig::OpenAI(ambi::OpenAIEngineConfig {
        api_key: std::env::var("OPENAI_API_KEY")?,
        base_url: "https://api.openai.com/v1".into(),
        model_name: "gpt-4o".into(),
        temp: 0.7,
        top_p: 0.95,
    });

    // 2. Build an agent
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.")
        .template(ambi::ChatTemplateType::Chatml);

    // 3. Create a shared state with a unique session ID
    let state = Arc::new(RwLock::new(AgentState::new("session-001")));

    // 4. Run the chat pipeline
    let runner = ChatRunner::default();
    let response = runner.chat(&agent, &state, "Hello, world!").await?;
    println!("{}", response);

    Ok(())
}

Using local inference

Enable the llama-cpp feature and optionally a GPU backend:

ambi = { version = "0.3", features = ["llama-cpp", "cuda"] }

Then swap the engine configuration:

let config = LLMEngineConfig::Llama(ambi::LlamaEngineConfig {
    model_path: "./models/llama-3-8b.gguf".into(),
    max_tokens: 4096,
    buffer_size: 32,
    use_gpu: true,
    n_gpu_layers: 100,
    n_ctx: 8192,
    n_tokens: 512,
    n_seq_max: 1,
    penalty_last_n: 64,
    penalty_repeat: 1.1,
    penalty_freq: 0.0,
    penalty_present: 0.0,
    temp: 0.7,
    top_p: 0.9,
    seed: 42,
    min_keep: 1,
});

Adding custom tools

Define a tool by implementing the Tool trait. The example below writes the JSON Schema out by hand; the optional macro feature provides a #[tool] attribute that can generate it from your argument struct.

use ambi::{Tool, ToolDefinition, ToolErr};
use serde::{Deserialize, Serialize};
use async_trait::async_trait;

#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

#[derive(Serialize)]
struct WeatherResult {
    temperature: f64,
    condition: String,
}

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    const NAME: &'static str = "get_weather";

    type Args = WeatherArgs;
    type Output = WeatherResult;

    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get current weather for a city".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }),
            timeout_secs: Some(10),
            max_retries: Some(2),
            is_idempotent: true,
        }
    }

    async fn call(&self, args: WeatherArgs) -> Result<WeatherResult, ToolErr> {
        // Your implementation here
        Ok(WeatherResult {
            temperature: 22.5,
            condition: "Sunny".into(),
        })
    }
}

Attach the tool to your agent:

let agent = Agent::make(config).await?
    .preamble("You are a weather assistant.")
    .tool(WeatherTool)?;

Now the agent can seamlessly invoke get_weather when the user asks about the weather. Ambi handles retries, timeouts, and parallel execution automatically.
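With the tool attached, an ordinary chat call can trigger it. A minimal sketch reusing the runner and state from the quick start (the question text is just an example):

let reply = runner.chat(&agent, &state, "What's the weather in Tokyo?").await?;
// The model may emit a get_weather call; Ambi executes it and feeds the result back
println!("{}", reply);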

Streaming responses

use futures::StreamExt;

let mut stream = runner.chat_stream(&agent, &state, "Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    match chunk {
        Ok(text) => print!("{}", text),
        Err(e) => eprintln!("Stream error: {}", e),
    }
}

WASM targets (browser) support the same streaming API natively via fetch and ReadableStream – see examples/webAssembly for a live demo.

Context eviction & dynamic context

Ambi's context management automatically evicts old messages when the token budget is exceeded, while completely decoupling system instructions from the eviction FIFO queue for maximum KV Cache hit rates.

Dynamic context (RAG / session data)

Volatile background knowledge like RAG results or environment variables can be injected safely into AgentState without touching the static system_prompt:

// Inject RAG results for the current turn
state.write().await.set_dynamic_context("Relevant docs: ...");
// Or stack multiple sources
state.write().await.append_dynamic_context("Current time: 2025-01-01");

Use clear_dynamic_context() to reset between turns.
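For example, before starting the next turn:

// Drop any previously injected background context
state.write().await.clear_dynamic_context();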

Eviction strategy

use ambi::config::EvictionStrategy;

let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 4096 });

Eviction callback with state access

The callback now receives &AgentState, giving you safe access to identifiers and connection pools from state extensions for async database archiving:

let agent = Agent::make(config).await?
    .on_evict(|state: &AgentState, evicted: Vec<Arc<Message>>| {
        // Clone what the spawned task needs; `state` is only borrowed here
        let session_id = state.session_id.clone();
        // Spawn an async task to archive the evicted messages
        tokio::spawn(async move {
            // persist `evicted` to the DB under `session_id`
        });
    });

ChatHistory helpers

// Find messages containing a keyword
let results = state.read().await.chat_history.search_by_keyword("weather");

// Get the last user message
if let Some(msg) = state.read().await.chat_history.last_user_message() {
    // inspect the user's latest intent
}

// Get the last assistant message
if let Some(msg) = state.read().await.chat_history.last_assistant_message() {
    // inspect the latest response
}

Custom tool‑call parser

By default Ambi uses [TOOL_CALL] ... [/TOOL_CALL] tags. You can bring your own parser:

use ambi::tool::{ToolCallParser, DefaultToolParser};
use ambi::types::StreamFormatter;

struct MyParser;

impl ToolCallParser for MyParser {
    fn format_instruction(&self, tools_json: &str) -> String {
        // instruct the model how to call tools
        format!("Use tools: {}", tools_json)
    }

    fn parse(&self, text: &str) -> Vec<(String, serde_json::Value)> {
        // extract tool calls from the model's output
        vec![]
    }

    fn create_stream_formatter(&self) -> Box<dyn StreamFormatter> {
        Box::new(ambi::agent::processor::PassThroughFormatter)
    }
}

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);

Error handling

Ambi uses thiserror to provide clear, actionable error types:

pub enum AmbiError {
    EngineError(String),
    AgentError(String),
    ToolError(String),
    ContextError(String),
    PipelineError(String),
    MaxIterationsReached(usize),
    Other(anyhow::Error),
}

All public APIs return Result<T, AmbiError>, making it easy to pattern‑match or propagate errors.
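For instance, a sketch of matching on a specific variant (the ambi::AmbiError path is assumed here; adjust it to wherever the enum is re-exported):

match runner.chat(&agent, &state, "Hello").await {
    Ok(reply) => println!("{}", reply),
    // Stop gracefully if the agent hit its tool-iteration limit
    Err(ambi::AmbiError::MaxIterationsReached(n)) => eprintln!("gave up after {} tool iterations", n),
    Err(e) => return Err(e.into()),
}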

Testing

Ambi comes with comprehensive unit and integration tests. We recommend using cargo test during development. When testing agents, consider using a mock engine to avoid real API calls:

struct MockEngine;
#[async_trait]
impl LLMEngineTrait for MockEngine {
    async fn chat(&self, _: LLMRequest) -> Result<String> {
        Ok("Hello, I am a mock.".into())
    }
    // ...
}

let agent = Agent::make(LLMEngineConfig::Custom(Box::new(MockEngine))).await?;

Feature flags

Ambi uses Cargo features to keep compile times low:

  • openai-api (enabled by default) – OpenAI‑compatible cloud backend powered by async-openai.
  • llama-cpp – Local inference via llama.cpp (supports cuda, vulkan, metal, rocm sub‑features).
  • cuda, vulkan, metal, rocm – GPU acceleration for the local engine (choose exactly one).
  • macro – Enables #[tool] attribute macro for zero-boilerplate tool definitions with params(...) support.
  • mtmd – Multimodal (vision) support for local VLM models (implies llama-cpp).

License