harness-rs-tools-shell 0.0.9

Allowlisted read-only shell tool for harness-rs. Per-program safe-args predicate.
Documentation
# harness

> **Agent = Model + Harness.** This crate is the *Harness* — the modular
> scaffolding around an LLM that turns it into an autonomous coding agent.

Rust framework for building production coding agents, based on the
*harness engineering* discipline as written up by Böckeler (Thoughtworks,
2026) and Lopopolo (OpenAI, 2026). See **`DESIGN.md`** for the full
architectural rationale.

## What you get

| Layer | What it does | Crate |
|------|-----|-----|
| **Model** | OpenAI-compatible + Anthropic-native + scriptable mock | `harness-models` |
| **Tools** | fs (read/write/edit/list), shell (risk-classified allowlist), web (DDG → Bing search + URL fetch + HTML→text) | `harness-tools-fs`, `harness-tools-shell`, `harness-tools-web` |
| **Sensors** | `cargo check` + `cargo clippy` produce LLM-friendly `Signal`s; auto-fix patches apply automatically | `harness-sensors-rust` |
| **Skills** | strict [agentskills.io]https://agentskills.io/specification validator + `#[skill]` proc-macro + export to spec-compliant directory | `harness-skills`, `harness-macros` |
| **Guides** | feedforward Markdown context, scoped by task | `#[guide]` + `harness-templates` |
| **Hooks** | 27-event lifecycle bus with deny/inject/mutate | `harness-hooks` + `#[hook]` |
| **Compactor** | 5-stage progressive compaction (auto-triggered by budget) | `harness-compactor` |
| **Loop** | ReAct + tool-call dispatch + sensor feedback + auto-fix + forced final-synthesis on budget exhaustion | `harness-loop` |
| **Memory** | Open `Memory` trait + JSONL `FileMemory` + `MemoryGuide` (recall) + `MemorySynthesizer` (cheap-model distillation into atomic facts) | `harness-core`, `harness-context`, `harness-loop` |
| **Recall** | Cross-session conversation search — `RecallStore` trait + JSONL `FileRecall` (default) + FTS5/CJK-trigram `SqliteRecall`; one-call `.with_recall(store)` adds capture + a `session_search` tool, owner-scoped | `harness-core`, `harness-context`, `harness-recall-sqlite` |
| **Learning loop** | Self-evolving skills + memory — `.with_learning_loop(cfg)` forks a review subagent at session end that writes/patches skills (`skill_manage`) and memory from the transcript | `harness-loop`, `harness-tools-skills` |
| **Scheduler** | In-process scheduled agent jobs with delivery — `JobStore` + `Channel` (stdout / email-Resend) + a `Scheduler` that runs jobs as agent turns + a `cronjob` tool for self-scheduling | `harness-scheduler` |
| **Observability** | `SessionRecorder` JSONL traces, `LiveProgressHook` live stderr stream, `harness trace --verbose` | `harness-loop` + `harness-cli` |
| **Blueprint** | deterministic + agent state machine with retry/fallback | `harness-blueprint` |
| **Sandbox** | git worktree isolation (container/VM in v0.2) | `harness-sandbox` |
| **CLI** | `harness skills validate / list / export`, `harness new`, `harness trace` | `harness-cli` |

## 60-second tour

### 1. Build a minimal agent

```rust
use harness::prelude::*;
use harness_loop::AgentLoop;
use harness_models::{OpenAiCompat, providers::DEEPSEEK};
use harness_tools_fs::{ListDir, ReadFile};
use harness_context::default_world;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), harness::HarnessError> {
    let model = OpenAiCompat::with_key(
        DEEPSEEK,
        "deepseek-chat",
        std::env::var("DEEPSEEK_API_KEY").unwrap(),
    );
    let mut world = default_world(".");
    let outcome = AgentLoop::new(model)
        .with_tool(Arc::new(ReadFile))
        .with_tool(Arc::new(ListDir))
        .run(
            Task { description: "Find Cargo.toml and tell me the workspace name".into(),
                   source: None, deadline: None },
            &mut world,
        )
        .await?;
    println!("{outcome:?}");
    Ok(())
}
```

### 2. Register a skill, tool, guide, sensor, or hook with a proc-macro

```rust
/// Greet the user politely. Use when the user explicitly asks for a friendly hello.
#[harness::skill(name = "polite-hello", harness(kind = "inferential", risk = "read-only"))]
async fn polite_hello(_ctx: &mut Context, _w: &mut World) -> Result<(), harness::SkillError> {
    Ok(())
}

#[harness::tool(name = "reverse", risk = "read-only",
    schema = r#"{"type":"object","properties":{"text":{"type":"string"}}}"#)]
async fn reverse(args: serde_json::Value, _w: &mut World)
    -> Result<ToolResult, ToolError> { /* ... */ }

#[harness::guide(scope = "always", kind = "inferential")]
async fn project_intro(ctx: &mut Context, _w: &World) -> Result<(), GuideError> {
    ctx.guides.push(Block::Text("Always reply in two sentences.".into()));
    Ok(())
}

#[harness::sensor(stage = "self-correct", kind = "computational")]
async fn no_unwrap(action: &Action, w: &World) -> Result<Vec<Signal>, SensorError> { /* ... */ }

#[harness::hook(event = "PreToolUse", name = "audit")]
fn audit(ev: &Event<'_>, _w: &mut World) -> HookOutcome {
    tracing::info!(?ev); HookOutcome::Allow
}
```

All five auto-register via `inventory`; `AgentLoop::with_macro_hooks()` and
`SkillRegistry::with_macro_skills()` pick them up at runtime.

### 3. Hybrid deterministic + agent state machine

```rust
use harness_blueprint::{Blueprint, Node, NodeOutput, Transition};

let bp = Blueprint::new()
    .add("fmt",    Node::deterministic(|w| Box::pin(async move {
        w.runner.exec("cargo", &["fmt", "--all"], Some(w.repo.root.as_path())).await?;
        Ok(NodeOutput { transition: Transition::Next, data: Default::default() })
    })))
    .add("work",   Node::agent(|w| Box::pin(async move { /* run AgentLoop */ })))
    .add("test",   Node::deterministic(|w| Box::pin(async move { /* cargo test */ })))
    .edge("fmt", "work").edge("work", "test")
    .branch_on_failure("test", "work", 2);
```

### 4. Validate / export skills for any spec-compliant agent

```bash
$ harness skills validate ./skills/format-rust
✓ valid: format-rust — Run cargo fmt across the workspace.

$ harness skills export ./out --from ./skills
✓ ./out/format-rust/SKILL.md
✓ ./out/review-axum/SKILL.md
exported 2 skill(s) to ./out
```

The exported directory is consumable by Claude Code, Cursor, Codex, or any
agent that follows the agentskills.io spec.

### 5. Scaffold a new agent project

```bash
$ harness new my-agent
✓ created my-agent/
  └─ Cargo.toml
  └─ src/main.rs   # minimal agent with one tool and one skill
```

### 6. Open long-term memory (your harness, your memory)

Persist durable facts across sessions, recall them automatically on the
next run, all on the user's disk — no provider-side state.

```rust
use harness_context::FileMemory;
use harness_loop::{MemoryGuide, MemorySynthesizer};
use std::sync::Arc;

let mem: Arc<dyn harness::Memory> =
    Arc::new(FileMemory::open("~/.my-agent/memory.jsonl")?);

// Cheap "synth" model distils each session into 1-3 atomic facts.
let synth_model: Arc<dyn harness::Model> = Arc::new(OpenAiCompat::with_key(
    DEEPSEEK, "deepseek-v4-flash", key.clone(),
));
let synth = Arc::new(
    MemorySynthesizer::new(mem.clone(), synth_model)
        .with_source("my-agent")
        .with_max_facts(3),
);

let loop_ = AgentLoop::new(model)
    .with_guide(Arc::new(MemoryGuide::new(mem.clone()).with_top_k(5)))
    .with_hook(synth.clone() as Arc<dyn harness::Hook>);

// ... run sessions ...

synth.flush_pending().await; // before main() exits
```

What you get:
- `MemoryGuide` calls `recall()` with the current task description on every
  session start and injects the top-K hits into the model's system prompt.
- `MemorySynthesizer` asks the cheap synth model to extract durable facts
  from each completed session, parses them as JSON, persists each as an
  independent `MemoryEntry`. Markdown fences tolerated; unparseable output
  falls back to a `"synth-raw"` entry rather than silent drop.
- File format is plain JSONL — `cat`, `grep`, version-controllable,
  transferable. Swap to a vector store by implementing the `Memory` trait;
  nothing else in the framework needs to change.

The `examples/personal-assistant` and `examples/investor-bot` binaries
expose this via `--memory <path>` + `--synth-model <id>` flags.

### 7. Cross-session recall (search your own past conversations)

One builder call captures every turn and gives the agent a `session_search`
tool. Owner-scoped, so a multi-tenant app can't cross users.

```rust
use harness_context::FileRecall;           // JSONL default, zero new deps
// or: use harness_recall_sqlite::SqliteRecall;  // FTS5 + CJK trigram, for scale
use harness_core::RecallStore;
use serde_json::json;
use std::sync::Arc;

let store: Arc<dyn RecallStore> = Arc::new(FileRecall::open("~/.my-agent/recall")?);

let loop_ = AgentLoop::new(model).with_recall(store); // capture + session_search tool
// .auto_inject() also surfaces top-k relevant past context at session start.

// Multi-tenant: set owner/session per request so users can't read each other.
world.profile.extra.insert("recall_owner".into(), json!(user_id));
world.profile.extra.insert("recall_session".into(), json!(conversation_id));
```

### 8. Self-evolving learning loop (skills + memory that grow)

After a session does real work, a background review subagent reads the
transcript and writes/patches **skills** and **memory** — so the next
session starts smarter.

```rust
use harness_loop::LearningConfig;
use harness_tools_skills::SkillManageTool;   // skill_manage: create/edit/patch/delete SKILL.md
use harness_tools_memory::RememberThisTool;
use std::sync::Arc;

let review_model: Arc<dyn harness::Model> = Arc::new(OpenAiCompat::with_key(
    DEEPSEEK, "deepseek-v4-flash", key.clone(),
));

let loop_ = AgentLoop::new(model).with_learning_loop(
    LearningConfig::new(review_model)
        .with_tool(Arc::new(SkillManageTool::new("~/.my-agent/skills")))
        .with_tool(Arc::new(RememberThisTool::new(memory.clone())))
        .with_nudge_interval(10), // review after >= 10 tool calls
);
```

The review fires at session end, is white-listed to only the tools you
inject, and is best-effort — a review failure never affects the finished run.

### 9. Scheduled agent jobs with delivery

`harness-scheduler` runs jobs as agent turns on a schedule and delivers the
output to a channel; the agent can also schedule jobs for itself.

```rust
use harness_scheduler::{
    Scheduler, FileJobStore, StdoutChannel, EmailChannel, CronjobTool, JobStore,
};
use std::sync::Arc;

let jobs: Arc<dyn JobStore> = Arc::new(FileJobStore::open("~/.my-agent/jobs.json")?);
let model: Arc<dyn harness::Model> = Arc::new(OpenAiCompat::with_key(
    DEEPSEEK, "deepseek-v4-flash", key.clone(),
));

Scheduler::new(jobs.clone(), model)
    .with_channel(Arc::new(StdoutChannel::new()))
    .with_channel(Arc::new(EmailChannel::from_env().unwrap())) // RESEND_API_KEY + DIGEST_FROM
    .with_tool(Arc::new(CronjobTool::new(jobs.clone())))       // let the agent self-schedule
    .spawn(); // ticks every minute; runs due jobs, delivers their output
```

A `Job` is `{ schedule, prompt, channel }`; schedules are `"daily 08:00"`,
`"weekly mon 09:30"`, or `"every 15m"`. A job whose output is `[SILENT]`
suppresses delivery.

### 10. Observe what the agent is doing

```bash
# Live stderr stream of every model call, tool call, and tool result:
HARNESS_PROGRESS=1 cargo run -p investor-bot -- "..."

# Or with a recorded session log + post-mortem inspection:
cargo run -p investor-bot -- --record /tmp/run.jsonl "..."
harness trace /tmp/run.jsonl --verbose
```

## Testing & verification

```
$ cargo test --workspace
... 238 tests passing
```

Three layers of verification:

1. **Unit tests** (per crate) — pure logic, no I/O.
2. **AgentLoop integration tests** (`harness-loop/tests/agent_loop.rs`) —
   `MockModel` drives the full pipeline with scripted responses; zero
   network, deterministic.
3. **Golden-path test** (`harness-loop/tests/golden_path.rs`) — every
   component (guide, tool, sensor, auto-fix, hook, compactor) exercised at
   once against a tmp workspace, final on-disk state asserted.
4. **Live demo** (`examples/crate-keeper`) — runs against DeepSeek
   (`flash` or `pro` tier) for wire-format validation that mocks can't
   catch.

## Examples

See [`examples/README.md`](examples/README.md) for full descriptions. In order
of increasing surface area:

- `examples/deepseek-hello` — smallest possible Hello-world against DeepSeek.
- `examples/crate-keeper``MockModel` smoke test; no network.
- `examples/personal-assistant` — scheduling agent with `UserProfile`,
  REPL, brief mode.
- `examples/investor-bot` — autonomous web research with multi-engine search
  fallback + retry.
- `examples/deepseek-caps-e2e` — real-DeepSeek end-to-end of the v0.0.5
  capabilities (recall + learning loop + scheduler). Reads `DEEPSEEK_API_KEY`.

## Status

- **v0.0.1** — initial publish (15 of 18 crates).
- **v0.0.2**`UserProfile` + `ProfileGuide`, optional `harness-rs-daemon`
  scheduler, retry/backoff in model adapters, MCP server with resources +
  prompts, session record/replay, multi-engine search, `#[non_exhaustive]`
  sweep, security gates on `FixPatch::RunCommand` + `shell_read`. (Shipped
  in stages; superseded by 0.0.3.)
- **v0.0.3** — Re-publish of the 0.0.2 feature set as a single consistent
  snapshot. No new features.
- **v0.0.4** — Observability (`LiveProgressHook`,
  `harness trace --verbose`), forced final-synthesis on budget exhaustion,
  and **open long-term memory** (`Memory` trait, `FileMemory` JSONL,
  `MemoryGuide`, `MemorySynthesizer` cheap-model distillation). Examples
  ship `--memory` / `--synth-model` / `--progress` / `--record` /
  `HARNESS_*` env vars.
- **v0.0.5** — ✅ **current**. Three new opt-in, one-builder-call capabilities:
  **cross-session recall** (`RecallStore` + `FileRecall` + `session_search`;
  optional FTS5 `harness-rs-recall-sqlite`), a **self-evolving learning loop**
  (`.with_learning_loop()` forks a review subagent to write/patch skills +
  memory; new `harness-rs-tools-skills` `skill_manage`), and **in-process
  scheduling + delivery** (new `harness-rs-scheduler`: `JobStore`, `Channel`,
  `cronjob`). Plus `harness_core::DynModel` (use a boxed `Arc<dyn Model>` as a
  concrete `M`). Verified with a real-DeepSeek end-to-end
  (`examples/deepseek-caps-e2e`). See [CHANGELOG]CHANGELOG.md.
- **v0.1+**`ContainerSandbox` / `VmSandbox` / first-class blueprint
  `Node::Agent` / semantic memory backends are on the road.

## License

Dual-licensed under MIT OR Apache-2.0.