harness
Agent = Model + Harness. This crate is the Harness — the modular scaffolding around an LLM that turns it into an autonomous coding agent.
Rust framework for building production coding agents, based on the
harness engineering discipline as written up by Böckeler (Thoughtworks,
2026) and Lopopolo (OpenAI, 2026). See DESIGN.md for the full
architectural rationale.
What you get
| Layer | What it does | Crate |
|---|---|---|
| Model | OpenAI-compatible + Anthropic-native + scriptable mock | harness-models |
| Tools | fs (read/write/edit/list), shell (risk-classified allowlist), web (DDG → Bing search + URL fetch + HTML→text) | harness-tools-fs, harness-tools-shell, harness-tools-web |
| Sensors | `cargo check` + `cargo clippy` produce LLM-friendly Signals; auto-fix patches are applied automatically | harness-sensors-rust |
| Skills | Strict agentskills.io validator + `#[skill]` proc-macro + export to a spec-compliant directory | harness-skills, harness-macros |
| Guides | Feedforward Markdown context, scoped by task | `#[guide]` + harness-templates |
| Hooks | 27-event lifecycle bus with deny/inject/mutate | harness-hooks + `#[hook]` |
| Compactor | 5-stage progressive compaction (auto-triggered by budget) | harness-compactor |
| Loop | ReAct + tool-call dispatch + sensor feedback + auto-fix + forced final synthesis on budget exhaustion | harness-loop |
| Memory | Open `Memory` trait + JSONL `FileMemory` + `MemoryGuide` (recall) + `MemorySynthesizer` (cheap-model distillation into atomic facts) | harness-core, harness-context, harness-loop |
| Recall | Cross-session conversation search: `RecallStore` trait + JSONL `FileRecall` (default) + FTS5/CJK-trigram `SqliteRecall`; one call to `.with_recall(store)` adds capture plus an owner-scoped `session_search` tool | harness-core, harness-context, harness-recall-sqlite |
| Learning loop | Self-evolving skills + memory: `.with_learning_loop(cfg)` forks a review subagent at session end that writes/patches skills (`skill_manage`) and memory from the transcript | harness-loop, harness-tools-skills |
| Scheduler | In-process scheduled agent jobs with delivery: `JobStore` + `Channel` (stdout / email via Resend) + a `Scheduler` that runs jobs as agent turns + a `cronjob` tool for self-scheduling | harness-scheduler |
| Observability | `SessionRecorder` JSONL traces, `LiveProgressHook` live stderr stream, `harness trace --verbose` | harness-loop + harness-cli |
| Blueprint | Deterministic + agent state machine with retry/fallback | harness-blueprint |
| Sandbox | Git worktree isolation (container/VM in v0.2) | harness-sandbox |
| CLI | `harness skills validate` / `list` / `export`, `harness new`, `harness trace` | harness-cli |
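The Compactor row describes progressive compaction that kicks in when a token budget is exceeded. As a rough illustration only, the sketch below runs cheap stages first and stops as soon as the transcript fits. The three stages, the `Msg` type, and the 4-characters-per-token estimate are all assumptions for this example; they are not harness-compactor's actual five stages or its API.

```rust
/// One message in a transcript (hypothetical type, not harness-compactor's).
#[derive(Clone, Debug, PartialEq)]
pub struct Msg {
    pub role: &'static str, // "system", "user", "assistant" or "tool"
    pub text: String,
}

/// Rough token estimate: about 4 characters per token (an assumption).
pub fn estimate_tokens(msgs: &[Msg]) -> usize {
    msgs.iter().map(|m| (m.text.chars().count() + 3) / 4).sum()
}

type Stage = fn(&mut Vec<Msg>);

/// Hypothetical stage 1: cut tool outputs down to 200 characters.
fn truncate_tool_outputs(msgs: &mut Vec<Msg>) {
    for m in msgs.iter_mut().filter(|m| m.role == "tool") {
        if m.text.chars().count() > 200 {
            let head: String = m.text.chars().take(200).collect();
            m.text = format!("{head} [truncated]");
        }
    }
}

/// Hypothetical stage 2: keep system messages plus the 4 most recent messages.
fn drop_old_turns(msgs: &mut Vec<Msg>) {
    let keep_from = msgs.len().saturating_sub(4);
    let mut i = 0;
    msgs.retain(|m| {
        let keep = m.role == "system" || i >= keep_from;
        i += 1;
        keep
    });
}

/// Hypothetical stage 3: keep only system messages and the latest message.
fn keep_latest_only(msgs: &mut Vec<Msg>) {
    let latest = msgs.pop();
    msgs.retain(|m| m.role == "system");
    msgs.extend(latest);
}

/// Apply stages cheapest-first, stopping as soon as the transcript fits.
/// Returns how many stages ran (0 means no compaction was needed).
pub fn compact(msgs: &mut Vec<Msg>, budget: usize) -> usize {
    let stages: [Stage; 3] = [truncate_tool_outputs, drop_old_turns, keep_latest_only];
    let mut ran = 0;
    for stage in stages {
        if estimate_tokens(msgs) <= budget {
            break;
        }
        stage(msgs);
        ran += 1;
    }
    ran
}
```

Ordering stages from least to most lossy means a slightly-over-budget transcript only loses verbose tool output, while the aggressive stages run only when nothing gentler suffices.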
60-second tour
1. Build a minimal agent
```rust
use *;
use AgentLoop;
use ;
use ;
use default_world;
use Arc;
async
```
2. Register a skill, tool, guide, sensor, or hook with a proc-macro
```rust
/// Greet the user politely. Use when the user explicitly asks for a friendly hello.
async
async
async
async
```
All five register themselves automatically via the `inventory` crate; `AgentLoop::with_macro_hooks()` and
`SkillRegistry::with_macro_skills()` pick them up at runtime.
3. Hybrid deterministic + agent state machine
```rust
use ;
let bp = new
    .add
    .add
    .add
    .edge.edge
    .branch_on_failure;
```
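The builder above chains `add`, `edge`, and `branch_on_failure`, but harness-blueprint's real signatures are not shown here. The sketch below is a hypothetical, self-contained runner with the same shape: each node is a step returning `Result`, a failed step is retried up to its limit, success follows the normal edge, and failure follows the fallback branch. Only deterministic closures are modelled; an agent node would just be a closure that calls a model. `Blueprint`, `Node`, and `StepResult` here are illustrative names, not the crate's types.

```rust
use std::collections::HashMap;

/// Result of one step: Ok follows the success edge, Err triggers retry/fallback.
pub type StepResult = Result<(), String>;

/// A hypothetical node: a step plus how many extra attempts it gets on failure.
pub struct Node {
    step: Box<dyn FnMut() -> StepResult>,
    retries: u32,
}

/// Hypothetical builder mirroring the shape of the snippet above.
#[derive(Default)]
pub struct Blueprint {
    nodes: HashMap<&'static str, Node>,
    edges: HashMap<&'static str, &'static str>,      // success transitions
    on_failure: HashMap<&'static str, &'static str>, // fallback transitions
}

impl Blueprint {
    pub fn add(mut self, name: &'static str, retries: u32, step: impl FnMut() -> StepResult + 'static) -> Self {
        self.nodes.insert(name, Node { step: Box::new(step), retries });
        self
    }

    pub fn edge(mut self, from: &'static str, to: &'static str) -> Self {
        self.edges.insert(from, to);
        self
    }

    pub fn branch_on_failure(mut self, from: &'static str, to: &'static str) -> Self {
        self.on_failure.insert(from, to);
        self
    }

    /// Run from `start` and return the visited node names in order.
    /// Note: there is no cycle detection, so a cyclic graph never terminates.
    pub fn run(&mut self, start: &'static str) -> Result<Vec<&'static str>, String> {
        let mut trace = Vec::new();
        let mut current = Some(start);
        while let Some(name) = current {
            trace.push(name);
            let node = self
                .nodes
                .get_mut(name)
                .ok_or_else(|| format!("unknown node {name}"))?;
            let mut outcome = (node.step)();
            let mut attempts = 0;
            while outcome.is_err() && attempts < node.retries {
                attempts += 1;
                outcome = (node.step)();
            }
            current = match outcome {
                Ok(()) => self.edges.get(name).copied(),
                Err(e) => match self.on_failure.get(name) {
                    Some(next) => Some(*next),
                    None => return Err(format!("{name} failed: {e}")),
                },
            };
        }
        Ok(trace)
    }
}
```

A node with no outgoing success edge ends the run, and a failure with no fallback branch aborts it with an error.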
4. Validate / export skills for any spec-compliant agent
The exported directory is consumable by Claude Code, Cursor, Codex, or any agent that follows the agentskills.io spec.
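To make "strict validator" concrete, here is a hedged sketch of checking a skill's `name` and `description` frontmatter fields. The rules encoded (lowercase letters, digits and hyphens; 1 to 64 characters; no leading, trailing or consecutive hyphens; name matches its directory; description 1 to 1024 characters) are our reading of the agentskills.io specification and should be checked against the current spec. `validate_skill` is a hypothetical function, not harness-skills' implementation.

```rust
/// Check frontmatter `name` and `description` against naming rules modeled on
/// the agentskills.io spec (assumed; verify against the spec itself).
/// Collects every violation instead of stopping at the first one.
pub fn validate_skill(dir_name: &str, name: &str, description: &str) -> Vec<String> {
    let mut errors = Vec::new();

    let len = name.chars().count();
    if len == 0 || len > 64 {
        errors.push(format!("name must be 1-64 characters, got {len}"));
    }
    if !name.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-') {
        errors.push("name may only contain a-z, 0-9 and '-'".to_string());
    }
    if name.starts_with('-') || name.ends_with('-') {
        errors.push("name must not start or end with '-'".to_string());
    }
    if name.contains("--") {
        errors.push("name must not contain consecutive hyphens".to_string());
    }
    if name != dir_name {
        errors.push(format!("name '{name}' must match directory '{dir_name}'"));
    }

    let dlen = description.chars().count();
    if dlen == 0 || dlen > 1024 {
        errors.push(format!("description must be 1-1024 characters, got {dlen}"));
    }
    errors
}
```

Returning all violations at once suits a CLI validator: one run reports everything that has to be fixed.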
5. Scaffold a new agent project
6. Open long-term memory (your harness, your memory)
Persist durable facts across sessions, recall them automatically on the next run, all on the user's disk — no provider-side state.
```rust
use FileMemory;
use ;
use Arc;
let mem: =
    new;
// Cheap "synth" model distils each session into 1-3 atomic facts.
let synth_model: = new;
let synth = new;
let loop_ = new
    .with_guide
    .with_hook;
// ... run sessions ...
synth.flush_pending.await; // before main() exits
```
What you get:
- `MemoryGuide` calls `recall()` with the current task description at every session start and injects the top-K hits into the model's system prompt.
- `MemorySynthesizer` asks the cheap synth model to extract durable facts from each completed session, parses them as JSON, and persists each one as an independent `MemoryEntry`. Markdown fences are tolerated; unparseable output falls back to a `"synth-raw"` entry rather than being silently dropped.
- The file format is plain JSONL, so it works with `cat` and `grep`, can be version-controlled, and is easy to transfer. To swap in a vector store, implement the `Memory` trait; nothing else in the framework needs to change.
The `examples/personal-assistant` and `examples/investor-bot` binaries
expose this via the `--memory <path>` and `--synth-model <id>` flags.
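The first bullet above describes the recall-then-inject flow. A minimal sketch of it follows, assuming a simplified synchronous `Memory` trait and naive keyword-overlap scoring (a real backend might use full-text search or embeddings). `MemoryEntry` here is a stand-in with only a text field, and `VecMemory` and `inject_memories` are hypothetical names; none of this is harness-context's actual API.

```rust
use std::collections::HashSet;

/// Stand-in for a stored fact (the real MemoryEntry likely has more fields).
#[derive(Clone, Debug, PartialEq)]
pub struct MemoryEntry {
    pub text: String,
}

/// Simplified, synchronous recall interface (an assumption for this sketch).
pub trait Memory {
    fn recall(&self, query: &str, k: usize) -> Vec<MemoryEntry>;
}

/// In-memory store that ranks entries by how many query words they share.
pub struct VecMemory(pub Vec<MemoryEntry>);

fn words(s: &str) -> HashSet<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

impl Memory for VecMemory {
    fn recall(&self, query: &str, k: usize) -> Vec<MemoryEntry> {
        let q = words(query);
        let mut scored: Vec<(usize, &MemoryEntry)> = self
            .0
            .iter()
            .map(|e| (words(&e.text).intersection(&q).count(), e))
            .filter(|(score, _)| *score > 0)
            .collect();
        // Highest overlap first; the stable sort keeps insertion order on ties.
        scored.sort_by(|a, b| b.0.cmp(&a.0));
        scored.into_iter().take(k).map(|(_, e)| e.clone()).collect()
    }
}

/// Append the top-k recalled facts to a system prompt.
pub fn inject_memories(system: &str, mem: &dyn Memory, task: &str, k: usize) -> String {
    let hits = mem.recall(task, k);
    if hits.is_empty() {
        return system.to_string();
    }
    let mut out = format!("{system}\n\nRelevant memories:\n");
    for h in hits {
        out.push_str(&format!("- {}\n", h.text));
    }
    out
}
```

Because the guide depends only on the trait, replacing `VecMemory` with a smarter store leaves `inject_memories` unchanged, which mirrors the swap-the-backend point in the last bullet.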
7. Cross-session recall (search your own past conversations)
One builder call captures every turn and gives the agent a `session_search`
tool. Search is owner-scoped, so a multi-tenant app can keep each user's history separate.
```rust
use FileRecall; // JSONL default, zero new deps
// or: use harness_recall_sqlite::SqliteRecall; // FTS5 + CJK trigram, for scale
use RecallStore;
use json;
use Arc;
let store: = new;
let loop_ = new.with_recall; // capture + session_search tool
// .auto_inject() also surfaces top-k relevant past context at session start.
// Multi-tenant: set owner/session per request so users can't read each other.
world.profile.extra.insert;
world.profile.extra.insert;
```
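Two ideas from this section fit in a few lines: owner scoping (a search never even looks at another owner's turns) and character-trigram matching, which is why a trigram index helps with CJK text that has no spaces between words. The in-memory sketch below illustrates both under those assumptions. `Turn`, `trigrams`, and `session_search` are hypothetical names, not the API of `FileRecall` or `SqliteRecall`, and SQLite's FTS5 trigram tokenizer differs in detail.

```rust
use std::collections::HashSet;

/// One captured conversation turn (hypothetical shape).
pub struct Turn {
    pub owner: String,
    pub session: String,
    pub text: String,
}

/// Overlapping 3-character windows; 1-2 character strings yield themselves.
pub fn trigrams(s: &str) -> HashSet<String> {
    let chars: Vec<char> = s.to_lowercase().chars().collect();
    match chars.len() {
        0 => HashSet::new(),
        1 | 2 => HashSet::from([chars.iter().collect::<String>()]),
        _ => chars.windows(3).map(|w| w.iter().collect::<String>()).collect(),
    }
}

/// Return `owner`'s turns that contain every trigram of `query`.
pub fn session_search<'a>(turns: &'a [Turn], owner: &str, query: &str) -> Vec<&'a Turn> {
    let q = trigrams(query);
    if q.is_empty() {
        return Vec::new();
    }
    turns
        .iter()
        .filter(|t| t.owner == owner) // scope by owner before any matching
        .filter(|t| q.is_subset(&trigrams(&t.text)))
        .collect()
}
```

Word-based tokenization would treat a whole unspaced Chinese sentence as a single token, while trigrams let a query such as "数据库迁移" match it as a substring.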
8. Self-evolving learning loop (skills + memory that grow)
After a session does real work, a background review subagent reads the transcript and writes/patches skills and memory — so the next session starts smarter.
```rust
use LearningConfig;
use SkillManageTool; // skill_manage: create/edit/patch/delete SKILL.md
use RememberThisTool;
use Arc;
let review_model: = new;
let loop_ = new.with_learning_loop;
```
The review fires at session end, can use only the tools you inject, and is best-effort: a review failure never affects the finished run.
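The two guarantees stated above (the review may only call injected tools, and a review failure never changes the run's result) can be sketched as follows. `ReviewCtx`, `run_with_review`, and the tool name `remember` are illustrative assumptions; the real `.with_learning_loop` forks a subagent asynchronously, which this synchronous sketch does not model.

```rust
use std::collections::HashSet;

/// Tool access for the review subagent (hypothetical shape).
pub struct ReviewCtx {
    allowed: HashSet<&'static str>,
    pub calls: Vec<String>,
}

impl ReviewCtx {
    pub fn new(allowed: &[&'static str]) -> Self {
        ReviewCtx { allowed: allowed.iter().copied().collect(), calls: Vec::new() }
    }

    /// Reject any tool that was not injected for the review.
    pub fn call_tool(&mut self, name: &str, arg: &str) -> Result<(), String> {
        if !self.allowed.contains(name) {
            return Err(format!("tool '{name}' is not available to the review"));
        }
        self.calls.push(format!("{name}({arg})"));
        Ok(())
    }
}

/// Run the session, then the review. The review's error is captured and
/// returned for logging, but it never replaces the session's result.
pub fn run_with_review<T>(
    session: impl FnOnce() -> T,
    review: impl FnOnce(&mut ReviewCtx) -> Result<(), String>,
    allowed: &[&'static str],
) -> (T, ReviewCtx, Option<String>) {
    let result = session();
    let mut ctx = ReviewCtx::new(allowed);
    let review_error = review(&mut ctx).err(); // best-effort: not propagated
    (result, ctx, review_error)
}
```

Running the session to completion before the review starts is what makes the guarantee structural: by the time the review can fail, the result already exists.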
9. Scheduled agent jobs with delivery
harness-scheduler runs jobs as agent turns on a schedule and delivers the
output to a channel; the agent can also schedule jobs for itself.
```rust
use ;
use Arc;
let jobs: = new;
let model: = new;
new
    .with_channel
    .with_channel // RESEND_API_KEY + DIGEST_FROM
    .with_tool // let the agent self-schedule
    .spawn; // ticks every minute; runs due jobs, delivers their output
```
A `Job` is `{ schedule, prompt, channel }`. Schedules are written as `"daily 08:00"`,
`"weekly mon 09:30"`, or `"every 15m"`. A job whose output is `[SILENT]`
suppresses delivery.
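The three schedule forms quoted above are simple enough to parse by hand. The sketch below turns them into an enum and applies the `[SILENT]` rule. The `Schedule` enum, `parse_schedule`, `should_deliver`, and the exact grammar accepted (24-hour `HH:MM`, three-letter lowercase weekdays, minute intervals with an `m` suffix, whitespace around `[SILENT]` ignored) are assumptions inferred from the examples, not harness-scheduler's parser.

```rust
/// Parsed schedule (hypothetical representation).
#[derive(Debug, PartialEq)]
pub enum Schedule {
    Daily { hour: u8, minute: u8 },
    Weekly { weekday: u8, hour: u8, minute: u8 }, // 0 = mon .. 6 = sun
    Every { minutes: u32 },
}

const DAYS: [&str; 7] = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"];

/// Parse "HH:MM" in 24-hour time.
fn parse_hhmm(s: &str) -> Option<(u8, u8)> {
    let (h, m) = s.split_once(':')?;
    let (h, m) = (h.parse::<u8>().ok()?, m.parse::<u8>().ok()?);
    (h < 24 && m < 60).then_some((h, m))
}

pub fn parse_schedule(s: &str) -> Option<Schedule> {
    let parts: Vec<&str> = s.split_whitespace().collect();
    match parts.as_slice() {
        ["daily", t] => {
            let (hour, minute) = parse_hhmm(t)?;
            Some(Schedule::Daily { hour, minute })
        }
        ["weekly", day, t] => {
            let weekday = DAYS.iter().position(|d| d == day)? as u8;
            let (hour, minute) = parse_hhmm(t)?;
            Some(Schedule::Weekly { weekday, hour, minute })
        }
        ["every", n] => {
            let minutes = n.strip_suffix('m')?.parse::<u32>().ok()?;
            (minutes > 0).then_some(Schedule::Every { minutes })
        }
        _ => None,
    }
}

/// Delivery is skipped when the job's output is the [SILENT] marker.
pub fn should_deliver(output: &str) -> bool {
    output.trim() != "[SILENT]"
}
```

Returning `Option` keeps malformed schedules from ever reaching the tick loop; a real implementation would probably report why parsing failed.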
10. Observe what the agent is doing
```sh
# Live stderr stream of every model call, tool call, and tool result:
HARNESS_PROGRESS=1
# Or with a recorded session log + post-mortem inspection:
```
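`HARNESS_PROGRESS=1` turns on a live stream of model calls, tool calls, and tool results. A progress hook of that kind could look roughly like the sketch below. The `Event` enum, the line format, and `progress_enabled` are assumptions for illustration, not `LiveProgressHook`'s real API or output. Writing to a generic `io::Write` keeps it testable; in practice the caller would pass `std::io::stderr()` and the value of `std::env::var("HARNESS_PROGRESS")`.

```rust
use std::io::{self, Write};

/// The three event kinds mentioned above (hypothetical shapes).
pub enum Event<'a> {
    ModelCall { model: &'a str },
    ToolCall { tool: &'a str, args: &'a str },
    ToolResult { tool: &'a str, ok: bool, bytes: usize },
}

/// True when the environment value is exactly "1".
pub fn progress_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Write one human-readable line per event.
pub fn report(out: &mut impl Write, ev: &Event<'_>) -> io::Result<()> {
    match ev {
        Event::ModelCall { model } => writeln!(out, "[model] calling {model}"),
        Event::ToolCall { tool, args } => writeln!(out, "[tool]  {tool} {args}"),
        Event::ToolResult { tool, ok, bytes } => {
            let status = if *ok { "ok" } else { "error" };
            writeln!(out, "[done]  {tool} {status} ({bytes} bytes)")
        }
    }
}
```

Streaming to stderr rather than stdout keeps the agent's actual answer on stdout clean for piping.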
Testing & verification
```
$ cargo test --workspace
... 238 tests passing
```
Four layers of verification:
- Unit tests (per crate): pure logic, no I/O.
- AgentLoop integration tests (`harness-loop/tests/agent_loop.rs`): `MockModel` drives the full pipeline with scripted responses; no network, fully deterministic.
- Golden-path test (`harness-loop/tests/golden_path.rs`): every component (guide, tool, sensor, auto-fix, hook, compactor) is exercised at once against a temporary workspace, and the final on-disk state is asserted.
- Live demo (`examples/crate-keeper`): runs against DeepSeek (flash or pro tier) for wire-format validation that mocks can't catch.
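The integration tests rely on a `MockModel` that replays scripted responses, which keeps the loop deterministic. A minimal sketch of that idea, combined with the forced final synthesis from the Loop row, follows: the loop dispatches tool calls until the model answers, and if the step budget runs out it makes one last call asking for an answer. `Reply`, `ScriptedModel`, `run_loop`, and the synthesis prompt text are hypothetical, not harness-loop's API.

```rust
use std::collections::VecDeque;

/// What the model returns on each turn (simplified).
#[derive(Clone, Debug)]
pub enum Reply {
    ToolCall { name: String, arg: String },
    Final(String),
}

/// Replays a fixed script, like a mock model in a deterministic test.
pub struct ScriptedModel {
    script: VecDeque<Reply>,
    pub prompts_seen: Vec<String>,
}

impl ScriptedModel {
    pub fn new(script: Vec<Reply>) -> Self {
        ScriptedModel { script: script.into(), prompts_seen: Vec::new() }
    }

    fn complete(&mut self, prompt: &str) -> Reply {
        self.prompts_seen.push(prompt.to_string());
        self.script
            .pop_front()
            .unwrap_or_else(|| Reply::Final("(script exhausted)".into()))
    }
}

/// ReAct-style loop: call the model, run the requested tool, feed the result back.
/// When `max_steps` is used up, force one final synthesis call.
pub fn run_loop(
    model: &mut ScriptedModel,
    tool: impl Fn(&str, &str) -> String,
    task: &str,
    max_steps: usize,
) -> String {
    let mut transcript = format!("task: {task}");
    for _ in 0..max_steps {
        match model.complete(&transcript) {
            Reply::Final(answer) => return answer,
            Reply::ToolCall { name, arg } => {
                let observation = tool(&name, &arg);
                transcript.push_str(&format!("\n{name}({arg}) -> {observation}"));
            }
        }
    }
    transcript.push_str("\nBudget exhausted: answer now with what you have.");
    match model.complete(&transcript) {
        Reply::Final(answer) => answer,
        // A model that still asks for a tool gets no more turns.
        Reply::ToolCall { .. } => "no final answer produced".to_string(),
    }
}
```

Recording every prompt in `prompts_seen` is what makes such a test useful: it can assert not only the final answer but also exactly what the loop sent at each step.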
Examples
See examples/README.md for full descriptions. In order
of increasing surface area:
- `examples/deepseek-hello`: the smallest possible hello-world against DeepSeek.
- `examples/crate-keeper`: `MockModel` smoke test; no network.
- `examples/personal-assistant`: scheduling agent with `UserProfile`, a REPL, and a brief mode.
- `examples/investor-bot`: autonomous web research with multi-engine search fallback and retry.
- `examples/deepseek-caps-e2e`: real-DeepSeek end-to-end run of the v0.0.5 capabilities (recall + learning loop + scheduler). Reads `DEEPSEEK_API_KEY`.
Status
- v0.0.1: initial publish (15 of 18 crates).
- v0.0.2: `UserProfile` + `ProfileGuide`, optional `harness-rs-daemon` scheduler, retry/backoff in model adapters, MCP server with resources + prompts, session record/replay, multi-engine search, a `#[non_exhaustive]` sweep, and security gates on `FixPatch::RunCommand` + `shell_read`. (Shipped in stages; superseded by 0.0.3.)
- v0.0.3: re-publish of the 0.0.2 feature set as a single consistent snapshot. No new features.
- v0.0.4: observability (`LiveProgressHook`, `harness trace --verbose`), forced final synthesis on budget exhaustion, and open long-term memory (`Memory` trait, `FileMemory` JSONL, `MemoryGuide`, `MemorySynthesizer` cheap-model distillation). The examples ship `--memory` / `--synth-model` / `--progress` / `--record` flags and `HARNESS_*` env vars.
- v0.0.5 (✅ current): three new opt-in capabilities, each enabled with one builder call: cross-session recall (`RecallStore` + `FileRecall` + `session_search`; optional FTS5 backend in `harness-rs-recall-sqlite`), a self-evolving learning loop (`.with_learning_loop()` forks a review subagent to write/patch skills and memory; `skill_manage` tool in the new `harness-rs-tools-skills`), and in-process scheduling with delivery (new `harness-rs-scheduler`: `JobStore`, `Channel`, `cronjob`). Also adds `harness_core::DynModel`, which lets a boxed `Arc<dyn Model>` be used as a concrete `M`. Verified with a real-DeepSeek end-to-end run (`examples/deepseek-caps-e2e`). See CHANGELOG.
- v0.1+: `ContainerSandbox` / `VmSandbox` / a first-class blueprint `Node::Agent` / semantic memory backends are on the roadmap.
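The v0.0.5 note mentions `harness_core::DynModel` for using a boxed `Arc<dyn Model>` where a concrete `M` is expected. A common Rust pattern for this kind of adapter is a newtype that holds the trait object and implements the trait by delegation. The sketch below shows that pattern with a simplified synchronous `Model` trait, a generic `Agent<M>`, and two toy models; all of these are stand-ins, and the real `Model` trait and `DynModel` may differ.

```rust
use std::sync::Arc;

/// Simplified stand-in for the framework's model trait.
pub trait Model: Send + Sync {
    fn complete(&self, prompt: &str) -> String;
}

/// Newtype over a trait object so it can fill a generic `M: Model` slot.
#[derive(Clone)]
pub struct DynModel(pub Arc<dyn Model>);

impl Model for DynModel {
    fn complete(&self, prompt: &str) -> String {
        self.0.complete(prompt) // delegate to whichever model is inside
    }
}

/// Generic consumer, standing in for something like an agent loop over `M`.
pub struct Agent<M: Model> {
    model: M,
}

impl<M: Model> Agent<M> {
    pub fn new(model: M) -> Self {
        Agent { model }
    }

    pub fn ask(&self, q: &str) -> String {
        self.model.complete(q)
    }
}

pub struct Echo;
impl Model for Echo {
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

pub struct Shout;
impl Model for Shout {
    fn complete(&self, prompt: &str) -> String {
        prompt.to_uppercase()
    }
}

/// Choose a model at runtime; the agent's type stays `Agent<DynModel>` either way.
pub fn pick(loud: bool) -> Agent<DynModel> {
    let m: Arc<dyn Model> = if loud { Arc::new(Shout) } else { Arc::new(Echo) };
    Agent::new(DynModel(m))
}
```

The payoff is that code generic over `M` compiles once for `DynModel`, while the concrete model can still be chosen from configuration at runtime.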
License
Dual-licensed under MIT OR Apache-2.0.