tkach
A provider-independent agent runtime for Rust. Stateless agent loop, pluggable LLM providers, built-in file/shell tools, real SSE streaming, cooperative cancellation, and per-call approval gating.
Status: pre-1.0 (0.3.0). Breaking changes are signalled via feat!: conventional commits and recorded in CHANGELOG.md. The core API just stabilised across three milestones (foundation, streaming, approval) and is settling, but expect motion.
Why this exists
LLM agent runtimes tend to either (a) bake in a single provider and hide the loop, or (b) give you primitives without a working loop. This crate sits in the middle:
- Stateless Agent::run: the caller owns the message history; the agent returns the delta of new messages it appended. Resume, multi-turn chat, fork & retry all become composable.
- Atomic event semantics under streaming: ToolUse events are emitted whole, never as partial JSON, regardless of how the upstream chunks them.
- Sub-agents inherit the parent's executor: one ApprovalHandler, one ToolPolicy, and one tool registry gate the whole agent tree without explicit re-plumbing (Model 3).
- Cooperative cancellation propagates: a single CancellationToken shuts down the loop, the SSE pull, the in-flight HTTP body, and any bash child process via kill_on_drop.
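The stateless contract in the first bullet can be sketched with plain std types. Everything below (Message, Role, run, extend) is a hypothetical stand-in, not the crate's actual API; the point is only that the run function borrows the history and hands back a delta, which makes forking from a shared base trivial.

```rust
// Hypothetical types for illustration only; the crate's real types differ.
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    text: String,
}

/// Stateless "run": borrows the caller's history and returns only the
/// messages it would append. A canned echo stands in for the model call.
fn run(history: &[Message]) -> Vec<Message> {
    let last_user = history.iter().rev().find(|m| m.role == Role::User);
    match last_user {
        Some(m) => vec![Message {
            role: Role::Assistant,
            text: format!("echo: {}", m.text),
        }],
        None => Vec::new(),
    }
}

/// Caller-side helper: the caller decides whether and where to apply a delta.
fn extend(mut history: Vec<Message>, delta: Vec<Message>) -> Vec<Message> {
    history.extend(delta);
    history
}
```

Because run never mutates its input, a caller can keep one base history, call run twice, and build two independent branches with extend.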
Quick start
[dependencies]
tkach = "0.3"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
use ;
async
Architecture at a glance
┌───────────┐ messages + cancel ┌─────────────────────────────┐
│ caller │──────────────────────▶│ Agent::run │
└───────────┘ new_messages, │ (or ::stream) │
text, usage, │ │
stop_reason └────┬───────────────────────┘
│
┌──────────────────┴────────────┐
▼ ▼
┌────────────┐ ┌───────────────────┐
│ Provider │ │ ToolExecutor │
│ │ │ ┌───────────────┐ │
│ Anthropic │ │ │ ToolPolicy │ │
│ OpenAI- │ │ ├───────────────┤ │
│ compatible │ │ │ApprovalHandler│ │
│ Mock │ │ ├───────────────┤ │
│ │ │ │ ToolRegistry │ │
└────────────┘ │ └───────────────┘ │
└─────────┬─────────┘
│
read-only batches in
parallel via join_all,
mutating sequentially
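The scheduling note at the bottom of the diagram (read-only batches in parallel, mutating sequentially) can be illustrated with a small planner. This is a sketch under an assumption: that consecutive read-only calls are grouped into one batch and each mutating call runs alone, in order. ToolClass and plan_batches are hypothetical names, and the crate's actual grouping rule may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToolClass {
    ReadOnly,
    Mutating,
}

/// Groups call indices into batches. Consecutive ReadOnly calls share a batch
/// (candidates for parallel execution); each Mutating call gets its own batch.
/// Batches are returned in the order they should run.
fn plan_batches(calls: &[ToolClass]) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    for (i, class) in calls.iter().enumerate() {
        match class {
            ToolClass::ReadOnly => current.push(i),
            ToolClass::Mutating => {
                // Flush pending read-only calls before the mutating one.
                if !current.is_empty() {
                    batches.push(std::mem::take(&mut current));
                }
                batches.push(vec![i]);
            }
        }
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Keeping mutating calls in singleton batches preserves their relative order, so a Write followed by an Edit on the same file cannot race.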
Built-in tools
| Tool | Class | What it does |
|---|---|---|
| Read | ReadOnly | Read file contents (numbered lines, offset/limit) |
| Glob | ReadOnly | Find files matching a glob (sorted by mtime) |
| Grep | ReadOnly | Regex search in files (with context, ignore patterns) |
| WebFetch | ReadOnly | HTTP GET a URL, returns body text |
| Write | Mutating | Write a file (creates parents) |
| Edit | Mutating | Replace exact string in a file |
| Bash | Mutating | Run shell command (cancel-aware via kill_on_drop) |
| SubAgent | Mutating | Spawn a nested agent that inherits the parent's tools |
tools::defaults() returns Read + Write + Edit + Glob + Grep + Bash. Add WebFetch and SubAgent::new(provider, model) explicitly when you want them.
Providers
use ;
// Anthropic
let p = from_env; // ANTHROPIC_API_KEY
// OpenAI itself
let p = from_env; // OPENAI_API_KEY
// Any OpenAI-compatible endpoint:
// OpenRouter
let p = new
.with_base_url;
// Local Ollama
let p = new
.with_base_url;
// Moonshot, DeepSeek, Together, Groq — same shape
To add your own provider, implement LlmProvider (one complete method and one stream method).
Anthropic prompt caching
SystemBlock::cached, Content::text_cached, and AgentBuilder::cache_tools mark cache breakpoints; Usage reports cache_creation_input_tokens / cache_read_input_tokens so callers can measure hit rate. The default TTL is 5 minutes; CacheControl::ephemeral_1h() selects 1 hour. Cache reads bill at 0.1x base input; writes bill at 1.25x (5-minute TTL) or 2x (1-hour TTL). See examples/anthropic_caching.rs and examples/anthropic_caching_streaming.rs.
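The multipliers above lend themselves to a quick cost estimate. The sketch below works in units of "one base input token"; CacheTtl, input_cost_units, and hit_rate are hypothetical helpers, not part of the crate, and the hit-rate definition (reads divided by reads plus writes) is one reasonable choice rather than an official metric.

```rust
#[derive(Debug, Clone, Copy)]
enum CacheTtl {
    FiveMinutes,
    OneHour,
}

/// Input cost expressed in base-input-token units, using the multipliers
/// quoted in the text: reads 0.1x, writes 1.25x (5 min) or 2x (1 h).
fn input_cost_units(uncached: u64, cache_read: u64, cache_write: u64, ttl: CacheTtl) -> f64 {
    let write_mult = match ttl {
        CacheTtl::FiveMinutes => 1.25,
        CacheTtl::OneHour => 2.0,
    };
    uncached as f64 + 0.1 * cache_read as f64 + write_mult * cache_write as f64
}

/// Share of cacheable tokens that were served from cache (assumed definition).
/// Returns None when no cacheable tokens were seen.
fn hit_rate(cache_read: u64, cache_write: u64) -> Option<f64> {
    let total = cache_read + cache_write;
    if total == 0 {
        None
    } else {
        Some(cache_read as f64 / total as f64)
    }
}
```

Feeding it the two cache counters from Usage shows the trade-off: a 10,000-token prefix costs 12,500 units to write once at the 5-minute TTL and 1,000 units on each later read.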
Anthropic Message Batches (50 % off, async)
Anthropic's Message Batches API takes the same Request body, runs it asynchronously over up to 24h, and bills 50 % off input + output tokens. Stack with SystemBlock::cached_1h(...) for ≈85 % off when prefixes are stable across batches. Right call for overnight backfills, scheduled recompute jobs, evals, or any workload that doesn't care about p99.
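To see where a figure like ≈85 % can come from, here is a back-of-the-envelope sketch. It assumes the batch discount and the cache-read multiplier compose multiplicatively, and it looks at input tokens only, ignoring cache-write and output costs. batch_input_multiplier is a hypothetical helper, not a crate function.

```rust
/// Input-side price multiplier for a batched request in which
/// `cached_fraction` of the input tokens are cache reads.
/// Assumes 0.5x for batching and 0.1x for cache reads, multiplied together.
fn batch_input_multiplier(cached_fraction: f64) -> f64 {
    let f = cached_fraction.clamp(0.0, 1.0);
    0.5 * (0.1 * f + (1.0 - f))
}
```

Under these assumptions a request with no cached prefix pays 0.5x, a fully cached one pays 0.05x, and a request whose input is about 78 % cached prefix pays roughly 0.15x, which is the ≈85 % discount range.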
use StreamExt;
use Anthropic;
use ;
use ;
let provider = from_env;
let requests = vec!;
let handle = provider.create_batch.await?; // status=InProgress
loop
let mut stream = provider.batch_results.await?; // JSONL line-by-line
while let Some = stream.next.await
custom_ids are validated client-side (regex + dedup) before the HTTP call. Caller owns the polling cadence — there's no await_batch helper because the right interval (every 5min vs every 1h vs exp-backoff) is workload-dependent. See examples/anthropic_batch.rs, examples/anthropic_batch_cancel.rs, examples/anthropic_batch_mixed.rs.
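The client-side check described above (pattern plus dedup) might look roughly like this. The pattern used here, 1 to 64 characters from [a-zA-Z0-9_-], is an assumption based on Anthropic's documented custom_id constraint; the crate's exact regex is not shown in this README, and CustomIdError and validate_custom_ids are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum CustomIdError {
    Invalid(String),
    Duplicate(String),
}

/// Assumed pattern: ^[a-zA-Z0-9_-]{1,64}$, checked without a regex dependency.
fn is_valid_custom_id(id: &str) -> bool {
    (1..=64).contains(&id.len())
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}

/// Rejects the first malformed or repeated id, before any HTTP call is made.
fn validate_custom_ids(ids: &[&str]) -> Result<(), CustomIdError> {
    let mut seen = HashSet::new();
    for id in ids {
        if !is_valid_custom_id(id) {
            return Err(CustomIdError::Invalid(id.to_string()));
        }
        if !seen.insert(*id) {
            return Err(CustomIdError::Duplicate(id.to_string()));
        }
    }
    Ok(())
}
```

Failing fast here is cheap insurance: a duplicate id discovered after a 24h batch would make results impossible to join back to their requests.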
Streaming
use ;
use StreamExt;
let mut stream = agent.stream;
while let Some = stream.next.await
let result = stream.into_result.await?; // final AgentResult
Backpressure is real: a slow consumer parks the producer task, which stops draining the SSE read side, which lets the OS shrink the TCP receive window, all the way back to the LLM server. Cancellation works mid-stream too: cancel.cancel() aborts the current SSE pull within milliseconds via tokio::select!.
See examples/streaming_cancel.rs for live cancel timing.
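The backpressure mechanism can be demonstrated in miniature with a bounded channel from the standard library. This is only an analogy: the crate presumably uses async streams rather than std::sync::mpsc, and accepted_before_full is a hypothetical helper. The idea is the same, though: once the buffer is full and nobody is draining it, the producer cannot make progress.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes events into a bounded channel that no consumer is draining and
/// returns how many were accepted before the channel reported "full".
fn accepted_before_full(capacity: usize, attempts: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected but unread.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut accepted = 0;
    for i in 0..attempts {
        match tx.try_send(i as u32) {
            Ok(()) => accepted += 1,
            // A blocking `send` would park the producer at this point.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With a capacity of 2, only two events are accepted no matter how many the producer attempts; a blocking send in place of try_send would simply wait until the consumer catches up.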
Approval flow
use ;
use async_trait;
use Value;
;
let agent = builder
.provider
.model
.tools
.approval
.build;
Deny(reason) flows back to the model as a tool_result with is_error: true, so the LLM can adapt; it is not an AgentError. The runtime races approve() against cancel.cancelled(), so an outer cancel always wins over a hung UI handler.
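The "denial is data, not a failure" rule can be sketched without any async machinery. Decision, ToolResult, and gate below are hypothetical stand-ins, not the crate's types; the sketch only shows that a denial produces an ordinary error-flagged tool result and that the tool body is never invoked.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
}

#[derive(Debug, Clone, PartialEq)]
struct ToolResult {
    content: String,
    is_error: bool,
}

/// Applies an approval decision to a tool invocation. A denial becomes a
/// tool result with `is_error: true`; the closure is not called in that case.
fn gate<F>(decision: Decision, run_tool: F) -> ToolResult
where
    F: FnOnce() -> Result<String, String>,
{
    match decision {
        Decision::Deny(reason) => ToolResult {
            content: format!("denied: {reason}"),
            is_error: true,
        },
        Decision::Allow => match run_tool() {
            Ok(out) => ToolResult { content: out, is_error: false },
            Err(e) => ToolResult { content: e, is_error: true },
        },
    }
}
```

Since the model sees the denial reason in the next turn, it can pick a different tool or ask the user, instead of the whole run aborting.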
Custom tools
use ;
use ;
;
let agent = builder
.provider
.tool
.build;
Long-running tools should tokio::select! on ctx.cancel.cancelled() and return ToolError::Cancelled promptly — the loop trusts the contract and does not race tools at the outer level.
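The cancellation contract for tools can be shown with a synchronous analogue. The real crate uses a CancellationToken with tokio::select!; here an AtomicBool plays that role, and ToolError is a local stand-in that merely mirrors the name used in the text. sum_with_cancel is a hypothetical long-running tool body.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Local stand-in mirroring the error name mentioned above; not the crate's type.
#[derive(Debug, PartialEq)]
enum ToolError {
    Cancelled,
}

/// A "long-running" tool body that checks the cancel flag before each unit
/// of work and bails out promptly once it is set.
fn sum_with_cancel(items: &[u64], cancel: &AtomicBool) -> Result<u64, ToolError> {
    let mut total = 0u64;
    for item in items {
        if cancel.load(Ordering::Relaxed) {
            return Err(ToolError::Cancelled);
        }
        total += item;
    }
    Ok(total)
}
```

The granularity of the check is the tool author's call: checking once per item keeps cancel latency bounded by the cost of a single unit of work.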
Examples
Each runnable demo also asserts its invariants — cargo run --example NAME either prints the demo and exits 0, or panics with a clear message.
| Example | What it shows |
|---|---|
| basic.rs | Minimal agent.run |
| streaming.rs | Live token streaming |
| streaming_multi_tool.rs | Multi-turn write→edit→read chain via Agent::stream |
| streaming_subagent.rs | Sonnet streams, delegates to a Haiku sub-agent |
| streaming_openai_tools.rs | OpenAI-compatible tool call (works through OpenRouter) |
| streaming_cancel.rs | Cancel mid-generation, partial text preserved |
| streaming_resilience.rs | Tool failure + cancel-during-tool + multi-block turns |
| approval_flow.rs | Live denial flow with custom ApprovalHandler |
| parallel_tools.rs | Read-only tools running in parallel |
| custom_tool.rs | Defining your own tool |
| anthropic_batch.rs | Batch API happy path: submit → poll → stream results (50% off, 24h async) |
| anthropic_batch_cancel.rs | Batch cancel-then-fetch-partial: mix of Succeeded and Canceled outcomes |
| anthropic_batch_mixed.rs | Per-row error isolation: bad request rides alongside successes as Errored |
Examples that talk to live APIs read ANTHROPIC_API_KEY (and optionally OPENAI_API_KEY + OPENAI_BASE_URL + OPENAI_SMOKE_MODEL) from .env — see .env.example.
Testing
CI runs fmt, clippy (with cognitive-complexity gates), MSRV (1.86), and cargo deny on every PR. Real-API smoke runs are gated behind Actions → Integration Tests → Run workflow → tier=smoke|full.
Versioning & releases
Conventional commits + release-please drive the version bump and changelog. See RELEASING.md. feat!: commits cut a breaking-change release; pre-1.0 those bump the minor version.
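The pre-1.0 rule for breaking changes amounts to a one-line function. This is a sketch of the rule as stated above, not of release-please's implementation; bump_breaking is a hypothetical helper and versions are plain (major, minor, patch) tuples.

```rust
/// Next version after a breaking (`feat!:`) commit: before 1.0 the minor
/// version is bumped; from 1.0 onward the major version is bumped.
fn bump_breaking(major: u64, minor: u64, _patch: u64) -> (u64, u64, u64) {
    if major == 0 {
        (0, minor + 1, 0)
    } else {
        (major + 1, 0, 0)
    }
}
```

So a breaking change on 0.3.0 yields 0.4.0, which is why a "0.3" dependency requirement will not pick it up automatically.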
License
MIT.