# Ambi
Ambi is a flexible, highly customizable AI Agent framework built entirely in Rust. It empowers you to create production‑grade agents with minimal boilerplate, trait‑first design, and zero‑cost abstractions.
- Dual‑engine architecture – Seamlessly switch between local inference (via llama.cpp with GPU acceleration) and cloud APIs (OpenAI‑compatible endpoints) without changing your agent code.
- Advanced tool system – Parallel multi‑tool execution, per‑tool timeouts and retries, and automatic JSON Schema generation from Rust structs.
- Intelligent context management – Safe eviction algorithm that preserves conversation logic, preventing token overflow while keeping your agent focused.
- Rust native – Memory safety, async/await everywhere, minimal dependencies, and fast compilation times.
## Resources
The best way to learn Ambi is to write an agent. The `examples/` directory contains complete, runnable examples covering basic chat, custom tools, local GPU inference, streaming, and multi‑tool parallel execution.
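Each example can be run directly with Cargo (the example name below is a placeholder; check `examples/` for the actual names):

```bash
cargo run --example basic_chat   # placeholder name; list the real ones with `ls examples/`
```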
## Installation
Add this to your `Cargo.toml`:

```toml
[dependencies]
ambi = "0.3"
```
For cloud‑only usage (faster compilation, no llama.cpp dependency):

```toml
ambi = { version = "0.3", default-features = false, features = ["openai-api"] }
```
### Runtime Requirements
Ambi is built on the Tokio async runtime. Make sure your project depends on Tokio with the `rt-multi-thread` feature enabled; without it, `Agent::make` and the other async methods will not run.
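For example, with `tokio = { version = "1", features = ["rt-multi-thread", "macros"] }` in `Cargo.toml`, the usual entry point looks like this (a minimal sketch; `#[tokio::main]` selects the multi‑threaded runtime by default when that feature is enabled):

```rust
#[tokio::main] // multi-threaded scheduler by default with the features above
async fn main() {
    // construct your Agent and drive it from here
}
```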
## Bindings
Ambi also provides native bindings for other languages:
Python – install the pre-built wheel from PyPI (the package name below is assumed to match the crate):

```bash
pip install ambi
```

Node.js – install the npm package with prebuilt binaries:

```bash
npm install ambi
```

```js
// Import shape reconstructed; the original snippet was stripped.
const { Agent } = require('ambi');
```
Prebuilt binaries are available for Windows, Linux (glibc & musl), and macOS on x64 & arm64 architectures. No Rust toolchain required on the consuming machine.
## Quick start
```rust
use ambi::prelude::*;      // import path assumed; the original was stripped
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() -> Result<(), AmbiError> {
    // Reconstructed sketch – the original example body was lost in extraction.
    // `Agent::make` and `.preamble` are the builder calls used throughout this
    // README; the engine-config constructor is an assumption.
    let config = EngineConfig::default();
    let agent = Agent::make(config).await?
        .preamble("You are a helpful assistant.");

    // Shared agent state, wrapped in Arc<RwLock<..>> as in the sections below.
    let state = Arc::new(RwLock::new(AgentState::default()));

    // See examples/ for complete, runnable programs.
    Ok(())
}
```
## Using local inference
Enable the `llama-cpp` feature and optionally a GPU backend:

```toml
ambi = { version = "0.3", features = ["llama-cpp", "cuda"] }
```
Then swap the engine configuration:

```rust
// Reconstructed sketch – the original line was stripped; the constructor
// name and argument are assumptions.
let config = EngineConfig::llama("path/to/model.gguf");
```
## Adding custom tools
Define a tool by implementing the `Tool` trait. Ambi automatically generates the JSON Schema for you.
```rust
use ambi::prelude::*;          // import paths assumed; the originals were stripped
use serde::Deserialize;
use async_trait::async_trait;

// Reconstructed sketch – the original definition was lost in extraction.
// Only the `Tool` trait and the `get_weather` name are documented here;
// the method names and signatures below are assumptions.
#[derive(Deserialize)]
struct WeatherArgs {
    city: String,
}

struct GetWeather;

#[async_trait]
impl Tool for GetWeather {
    fn name(&self) -> &str {
        "get_weather"
    }

    async fn call(&self, args: WeatherArgs) -> ToolResult {
        Ok(format!("It is sunny in {}", args.city).into())
    }
}
```
Attach the tool to your agent:
```rust
let agent = Agent::make(config).await?
    .preamble("You answer weather questions.") // argument reconstructed
    .tool(GetWeather)?;
```
Now the agent can seamlessly invoke `get_weather` when the user asks about the weather. Ambi handles retries, timeouts, and parallel execution automatically.
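If you ever need explicit per‑tool control, a builder‑level policy is the natural shape; note that `tool_timeout` and `tool_retries` below are hypothetical illustrations, not confirmed Ambi API:

```rust
use std::time::Duration;

// Hypothetical methods, shown only to illustrate per-tool tuning.
let agent = Agent::make(config).await?
    .tool(GetWeather)?
    .tool_timeout("get_weather", Duration::from_secs(5))
    .tool_retries("get_weather", 2);
```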
## Streaming responses
```rust
use futures::StreamExt; // crate assumed; the original import was stripped

// Arguments and chunk handling reconstructed from context.
let mut stream = runner.chat_stream("Tell me a story").await?;
while let Some(chunk) = stream.next().await {
    print!("{}", chunk?);
}
```
WASM targets (browser) support the same streaming API natively via `fetch` and `ReadableStream` – see `examples/webAssembly` for a live demo.
## Context eviction & dynamic context
Ambi's context management automatically evicts old messages when the token budget is exceeded, while keeping system instructions entirely out of the eviction FIFO queue to maximize KV cache hit rates.
### Dynamic context (RAG / session data)
Volatile background knowledge such as RAG results or environment variables can be injected safely into `AgentState` without touching the static `system_prompt`:
```rust
// Inject RAG results for the current turn (argument shape assumed;
// the original call arguments were stripped)
state.write().await.set_dynamic_context(rag_snippets);

// Or stack multiple sources
state.write().await.append_dynamic_context(env_summary);
```
Use `clear_dynamic_context()` to reset between turns.
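Put together, a per‑turn loop might look like this (a sketch: `retrieve` is a hypothetical RAG helper, and the `chat` call shape is an assumption):

```rust
// Reset, inject fresh retrieval results, then answer the question.
state.write().await.clear_dynamic_context();
state.write().await.set_dynamic_context(retrieve(&question).await?);
let answer = agent.chat(&question).await?; // call shape assumed
```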
### Eviction strategy
```rust
use ambi::EvictionStrategy; // import path assumed

// Reconstructed sketch – the original strategy value was stripped.
let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy::default());
```
### Eviction callback with state access
The callback now receives `&AgentState`, giving you safe access to identifiers and connection pools from state extensions for async database archiving:
```rust
// Closure signature reconstructed – the original snippet was stripped.
let agent = Agent::make(config).await?
    .on_evict(|evicted, state| async move {
        // e.g. archive the `evicted` messages via a pool stored in state extensions
    });
```
### ChatHistory helpers
```rust
// Find messages containing a keyword (argument assumed)
let results = state.read().await.chat_history.search_by_keyword("weather");

// Get the last user message
if let Some(msg) = state.read().await.chat_history.last_user_message() {
    println!("user: {msg:?}");
}

// Get the last assistant message
if let Some(msg) = state.read().await.chat_history.last_assistant_message() {
    println!("assistant: {msg:?}");
}
```
## Custom tool‑call parser
By default Ambi uses `[TOOL_CALL] ... [/TOOL_CALL]` tags. You can bring your own parser:
```rust
use ambi::ToolCallParser;   // import paths assumed; the originals were stripped
use ambi::StreamFormatter;

// Reconstructed sketch – the original parser type (and its trait impl,
// not shown here) was lost in extraction.
struct MyParser;

let agent = Agent::make(config).await?
    .with_tool_parser(MyParser);
```
## Error handling
Ambi uses `thiserror` to provide clear, actionable error types. All public APIs return `Result<T, AmbiError>`, making it easy to pattern‑match or propagate errors.
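For example, errors propagate cleanly with `?` (a sketch; the `chat` signature is an assumption):

```rust
async fn run(agent: &Agent) -> Result<(), AmbiError> {
    let reply = agent.chat("ping").await?; // any AmbiError bubbles up to the caller
    println!("{reply}");
    Ok(())
}
```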
## Testing
Ambi comes with comprehensive unit and integration tests. We recommend running `cargo test` during development. When testing agents, consider using a mock engine to avoid real API calls:
```rust
// Reconstructed sketch – the original mock setup was stripped; the mock
// engine constructor below is an assumption.
let config = EngineConfig::mock();
let agent = Agent::make(config).await?;
```
## Feature flags
Ambi uses Cargo features to keep compile times low:
- `openai-api` (enabled by default) – OpenAI‑compatible cloud backend powered by `async-openai`.
- `llama-cpp` – Local inference via llama.cpp (supports `cuda`, `vulkan`, `metal`, `rocm` sub‑features).
- `cuda`, `vulkan`, `metal`, `rocm` – GPU acceleration for the local engine (choose exactly one).
- `macro` – Enables the `#[tool]` attribute macro for zero-boilerplate tool definitions with `params(...)` support.
- `mtmd` – Multimodal (vision) support for local VLM models (implies `llama-cpp`).
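For instance, a macOS setup with local inference and the attribute macro might combine flags like this (a sketch; pick the one GPU backend that matches your hardware):

```toml
ambi = { version = "0.3", features = ["llama-cpp", "metal", "macro"] }
```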