Ambi
Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.
- Dual‑engine architecture – Seamlessly switch between local inference (via `llama.cpp` with GPU acceleration) and cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- Advanced tool system – Parallel multi‑tool execution, per‑tool timeouts and retries, and automatic JSON Schema generation from Rust structs.
- Intelligent context management – Safe eviction algorithm that preserves conversation logic, preventing token overflow while keeping your agent focused.
- Rust native – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.
Resources
The best way to learn Ambi is to write an agent. The `examples/` directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and multi‑tool parallel execution.
Installation
Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.2"
```
For cloud‑only usage (faster compilation, no `llama.cpp` dependency):

```toml
ambi = { version = "0.2", default-features = false, features = ["openai-api"] }
```
Runtime requirements
Ambi is built on the Tokio async runtime. Ensure your project uses Tokio with `rt-multi-thread` enabled; without it, `Agent::make` and all other async methods will not function.
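A minimal entry point looks like this (only the Tokio setup matters here; the `#[tokio::main]` attribute also requires Tokio's `macros` feature):

```rust
// Requires tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
// in Cargo.toml.
#[tokio::main]
async fn main() {
    // Agent::make and every other Ambi async call runs on this runtime.
}
```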
Quick start
The snippet below sketches a minimal chat agent. The exact paths and method names (`Agent::make`, `preamble`, `chat`, the engine config type) are assumptions based on the rest of this README; see the `examples/` directory for the canonical version.
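```rust
use ambi::{Agent, AmbiError};   // hypothetical import paths
use ambi::engine::OpenAiConfig; // hypothetical OpenAI-compatible config type

#[tokio::main]
async fn main() -> Result<(), AmbiError> {
    // Build an agent against an OpenAI-compatible endpoint.
    let agent = Agent::make(OpenAiConfig::from_env()) // from_env() is an assumption
        .await?
        .preamble("You are a concise, helpful assistant.");

    // One chat turn; the reply is assumed to be a String.
    let reply = agent.chat("Hello, Ambi!").await?;
    println!("{reply}");
    Ok(())
}
```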
Using local inference
Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.2", features = ["llama-cpp", "cuda"] }
```
Then swap the engine configuration; your agent code stays the same. The config type and field names below are assumptions, not Ambi's confirmed API:
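```rust
// Hypothetical config type and fields; check the crate docs for the real names.
let config = LlamaConfig {
    model_path: "models/model.Q4_K_M.gguf".into(),
    gpu_layers: 32, // layers offloaded to the GPU backend enabled above
    ..Default::default()
};
```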
Adding custom tools
Define a tool by implementing the `Tool` trait; Ambi automatically generates the JSON Schema for the tool's arguments from your Rust structs. The sketch below is illustrative: the trait's exact items and the schema derive (`schemars`-style) are assumptions.
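```rust
use ambi::tool::{Tool, ToolError}; // hypothetical module path and error type
use async_trait::async_trait;
use schemars::JsonSchema;          // assumed mechanism for schema generation
use serde::Deserialize;

/// Arguments the model supplies; the JSON Schema is derived from this struct.
#[derive(Deserialize, JsonSchema)]
struct WeatherArgs {
    city: String,
}

struct GetWeather;

#[async_trait]
impl Tool for GetWeather {
    // The associated items and method names here are assumptions.
    type Args = WeatherArgs;
    type Output = String;

    fn name(&self) -> &str {
        "get_weather"
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, ToolError> {
        // A real tool would query a weather service here.
        Ok(format!("It is currently sunny in {}.", args.city))
    }
}
```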
Attach the tool to your agent:
```rust
let agent = Agent::make(config) // `config` as in the earlier examples
    .await?
    .preamble("You are a helpful weather assistant.")
    .tool(GetWeather)?;         // registration is fallible, hence the `?`
```
Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts, and parallel execution automatically.
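For example, a plain chat turn is enough to trigger the tool (assuming the same `chat` method as in the quick start sketch):

```rust
// The model emits a tool call; Ambi executes get_weather, feeds the result
// back to the model, and returns the final answer.
let reply = agent.chat("What's the weather in Tokyo right now?").await?;
println!("{reply}");
```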
Streaming responses
Token‑by‑token streaming uses a standard async `Stream` (a sketch, assuming `runner` is the agent built earlier and that `StreamExt` comes from the `futures` crate):

```rust
use futures::StreamExt; // brings `next()` into scope

let mut stream = runner.chat_stream("Tell me a story.").await?; // argument is illustrative
while let Some(chunk) = stream.next().await {
    // Each chunk is assumed to be a Result wrapping a text fragment.
    print!("{}", chunk?);
}
```
Context eviction
Ambi’s context management automatically evicts old messages when the token budget is exceeded, but you can fine‑tune the strategy:
```rust
let agent = Agent::make(config)
    .await?
    // The strategy type and variant are assumptions; see the crate docs.
    .with_eviction_strategy(EvictionStrategy::KeepSystemAndRecent { keep_last: 16 });
```
You can also register a callback to process evicted messages (e.g., to persist them):
```rust
let agent = Agent::make(config)
    .await?
    // The callback signature is an assumption; `msg` is an evicted message.
    .on_evict(|msg| {
        println!("evicted: {msg:?}");
    });
```
Custom tool‑call parser
By default, Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags to delimit tool calls in model output. You can bring your own parser. The sketch below assumes a hypothetical `ToolCallParser` trait with a single `parse` method that extracts raw tool‑call payloads:
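```rust
use ambi::parser::ToolCallParser; // hypothetical trait path

/// Recognizes <tool>...</tool> tags instead of the default markers.
struct XmlTagParser;

impl ToolCallParser for XmlTagParser {
    // Hypothetical method: return the raw payloads found in the model output.
    fn parse(&self, output: &str) -> Vec<String> {
        output
            .split("<tool>")
            .skip(1)
            .filter_map(|rest| rest.split("</tool>").next())
            .map(str::to_owned)
            .collect()
    }
}

let agent = Agent::make(config)
    .await?
    .with_tool_parser(XmlTagParser);
```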
Error handling
Ambi uses `thiserror` to provide clear, actionable error types. All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
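For example (the variant names below are illustrative, not Ambi's confirmed error surface):

```rust
use ambi::AmbiError; // hypothetical path

match agent.chat("hello").await {
    Ok(reply) => println!("{reply}"),
    // Hypothetical variant, shown only to illustrate pattern-matching.
    Err(AmbiError::ContextOverflow { .. }) => eprintln!("token budget exceeded"),
    Err(err) => eprintln!("agent error: {err}"),
}
```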
Testing
Ambi comes with comprehensive unit and integration tests; we recommend running `cargo test` during development. When testing agents, consider using a mock engine to avoid real API calls. The sketch below assumes a hypothetical `Engine` trait behind `with_custom_engine`:
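```rust
use ambi::engine::Engine; // hypothetical trait path
use async_trait::async_trait;

/// A test double returning a canned reply instead of calling a real backend.
struct MockEngine;

#[async_trait]
impl Engine for MockEngine {
    // Hypothetical method; the real trait surface may differ.
    async fn complete(&self, _prompt: &str) -> Result<String, ambi::AmbiError> {
        Ok("canned response".to_owned())
    }
}

let agent = Agent::with_custom_engine(MockEngine)?;
```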
Feature flags
Ambi uses Cargo features to keep compile times low:
- `openai-api` (enabled by default) – OpenAI‑compatible cloud backend powered by `async-openai`.
- `llama-cpp` – Local inference via `llama.cpp` (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- `cuda`, `vulkan`, `metal`, `rocm` – GPU acceleration for the local engine (choose exactly one).
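For example, a macOS setup running local inference on Metal would combine the flags like this in `Cargo.toml` (same form as the installation snippets above):

```toml
ambi = { version = "0.2", features = ["llama-cpp", "metal"] }
```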