# autorize — agent-targeted reference
This document is printed by `autorize llms`. It is meant for LLM/agent
consumers landing in a repository that uses `autorize`. Everything an agent
needs to drive `autorize init` → edit config → `autorize run` →
`autorize status` / `autorize resume` is here; reading the
[source code](https://github.com/wbbradley/autorize) is not required.
## 1. What `autorize` is
`autorize` is a generic iterative-improvement harness. For each iteration it
creates a fresh **git worktree** off the `autorize/<name>` tracking branch,
runs your agent CLI inside the worktree with a hard wall-clock budget, then
runs a scoring command. If the score improves, the worktree's diff is
committed onto the tracking branch; otherwise it is discarded. The loop
stops when a total deadline fires, `max_iterations` is hit, or a configurable
number of consecutive no-op iterations is reached. State is checkpointed
atomically so the loop can be killed and resumed at any point.
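The three stop conditions above can be sketched as a single check that runs between iterations. This is a minimal Python illustration, not the harness's code: the function name `stop_reason`, the order in which the conditions are tested, and the use of `>=` comparisons are all assumptions.

```python
from datetime import datetime, timezone


def stop_reason(now, deadline, iterations_completed, max_iterations,
                consecutive_noops, max_consecutive_noops):
    """Return the name of the stop condition that fired, or None to continue."""
    # Assumption: the deadline is checked first; the document does not say.
    if now >= deadline:
        return "deadline"
    # max_iterations = 0 means unbounded, so only a positive value can fire.
    if max_iterations > 0 and iterations_completed >= max_iterations:
        return "max_iterations"
    if consecutive_noops >= max_consecutive_noops:
        return "max_consecutive_noops"
    return None


# Hypothetical timestamps, five minutes apart, used for illustration only.
START = datetime(2026, 5, 20, 8, 0, tzinfo=timezone.utc)
DEADLINE = datetime(2026, 5, 20, 8, 5, tzinfo=timezone.utc)
```

For example, `stop_reason(START, DEADLINE, 6, 6, 0, 5)` yields `"max_iterations"`, which mirrors the final line of the walkthrough run in section 13.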
## 2. Subcommands and workflow
| Command | Purpose |
|---|---|
| `autorize init <name>` | Scaffold `.autorize/<name>/{config.toml, program.md}`. |
| `autorize run <name>` | Run the loop until deadline / `max_iterations` / `max_consecutive_noops`. |
| `autorize status <name>` | Print a one-shot summary from `state.json` + `iterations.jsonl`. |
| `autorize resume <name>` | Recover after a crash; any in-progress iter is recorded as `killed` and the loop continues. |
| `autorize llms` | Print this document. |
End-to-end workflow:
1. `autorize init <name>` — scaffolds `.autorize/<name>/config.toml` and
`.autorize/<name>/program.md`.
2. Edit `.autorize/<name>/config.toml` (point `objective.command` at a
scoring script, point `agent.command` at an agent CLI, set a schedule).
3. Edit `.autorize/<name>/program.md` (freeform agent instructions; included
verbatim at the top of every prompt).
4. Commit the repo — `autorize run` refuses a dirty tree by default
(use `--allow-dirty` to override; the `.autorize/` directory is always
ignored for the dirty-tree check).
5. `autorize run <name>` — drives the loop.
6. `autorize status <name>` — one-shot summary from another shell.
7. `autorize resume <name>` — recover after a crash or `Ctrl-C` mid-iter.
## 3. Iteration state machine and outcomes
Each iteration runs through these steps in order, checkpointing
`state.json` after each step:
```
Idle
  -> AllocateIter     mkdir iter-NNNN/
  -> CreateWorktree   git worktree add ... autorize/<name>
  -> RunSetup         setup.command (skipped if empty)
  -> BuildPrompt      render -> iter-NNNN/prompt.md
  -> InvokeAgent      spawn agent.command with hard wall-clock budget
                      (SIGTERM the whole process group, 5 s grace, SIGKILL)
  -> CaptureDiff      git stage-all + diff against autorize/<name>
                      empty diff -> noop; touches deny_paths -> denied
  -> RunTeardown      teardown.command (skipped if empty)
  -> Score            run objective.command, parse to Option<f64>;
                      on failure: apply objective.fail_mode
  -> Decide           improved compared to best so far?
  -> Merge            commit on autorize/<name>, advance tracking branch
  -> Discard          (used when the score does not improve)
  -> Cleanup          remove worktree (unless iteration.keep_worktrees)
  -> Record           append IterationRecord to iterations.jsonl + rewrite state.json
```
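The `Decide` step compares the new score with the best so far, in the direction given by `objective.direction`. A minimal Python sketch of that comparison follows. The name `is_improvement` is hypothetical, and two points are assumptions rather than documented behavior: the comparison is strict (a tie does not count), and the first scored iteration always counts as an improvement.

```python
def is_improvement(score, best, direction):
    """Decide whether `score` beats `best` for direction "min" or "max"."""
    if direction not in ("min", "max"):
        raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")
    # Assumption: with no best yet, any valid score is an improvement.
    if best is None:
        return True
    # Assumption: strict comparison, so an equal score is discarded.
    return score < best if direction == "min" else score > best
```

With `direction = "min"`, a score of `0.534008` against a best of `0.048608` is not an improvement, which matches the `discarded` iteration in the sample run of section 13.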
The `current_step` field in `state.json` always carries one of:
`Idle`, `AllocateIter`, `CreateWorktree`, `RunSetup`, `BuildPrompt`,
`InvokeAgent`, `CaptureDiff`, `RunTeardown`, `Score`, `Decide`, `Merge`,
`Discard`, `Cleanup`, `Record`, `CheckDeadline`, `Done`.
Each iteration ends in exactly one of these six **outcomes** (the
`outcome` field of an `IterationRecord` in `iterations.jsonl`):
| Outcome | Meaning |
|---|---|
| `merged` | Score improved over the best so far; diff committed on `autorize/<name>`. |
| `discarded` | Agent produced a diff that scored, but the score did not improve. |
| `noop` | Agent produced an empty diff (no changes). Counts toward `max_consecutive_noops`. |
| `invalid` | Scoring failed under `fail_mode = "invalid"`; iteration is discarded, not counted as best. |
| `killed` | Recorded by `autorize resume` for an iteration that was in-flight at crash time. |
| `denied` | Diff touched a `boundaries.deny_paths` pattern; iteration discarded, branch unchanged. |
## 4. Configuration: `.autorize/<name>/config.toml`
Below is the exhaustive schema. All field names, types, defaults, and
validation rules are listed.
### `[experiment]`
| Field | Type | Default | Notes |
|---|---|---|---|
| `name` | string | (required) | Must match `[A-Za-z0-9_-]+`. Used as the experiment dir and the `autorize/<name>` branch suffix. |
| `description` | string | `""` | Freeform. |
### `[objective]`
| Field | Type | Default | Notes |
|---|---|---|---|
| `command` | string | (required) | Shell command. Run via `bash -lc` inside the iteration's worktree. Must be non-empty. |
| `direction` | enum | (required) | `"min"` or `"max"`. Determines what counts as an improvement. |
| `parse` | table | (required) | See `objective.parse` section below. |
| `timeout` | duration | `"60s"` | humantime duration; how long `objective.command` is allowed to run. |
| `fail_mode` | enum | `"invalid"` | `"invalid"`, `"worst"`, or `"abort"`. See `objective.fail_mode` section below. |
### `[boundaries]`
| Field | Type | Default | Notes |
|---|---|---|---|
| `allow_paths` | array of string | `[]` | Glob patterns. **Prompt-only in v1**: included in the agent prompt, not enforced. |
| `deny_paths` | array of string | `[]` | Glob patterns. **Enforced**: an iteration whose diff touches any of these is `denied`. |
### `[setup]`
Run once per iteration, inside the worktree, before `agent.command`.
| Field | Type | Default | Notes |
|---|---|---|---|
| `command` | string | `""` | Empty string skips setup. |
| `timeout` | duration | `"5m"` | humantime duration. |
### `[teardown]`
Run once per iteration, inside the worktree, after the agent's diff is
captured and before scoring (see the step order in section 3).
| Field | Type | Default | Notes |
|---|---|---|---|
| `command` | string | `""` | Empty string skips teardown. |
| `timeout` | duration | `"1m"` | humantime duration. |
### `[iteration]`
| Field | Type | Default | Notes |
|---|---|---|---|
| `budget` | duration | `"5m"` | Hard wall-clock per agent invocation. Must be greater than zero. |
| `max_iterations` | integer | `0` | `0` means unbounded. |
| `keep_worktrees` | bool | `false` | Retain per-iter `wt/` directories under `iter-NNNN/` for debugging. |
| `max_consecutive_noops` | integer | `5` | Loop exits after this many consecutive `noop` outcomes. |
### `[schedule]`
**Set exactly one** of `total_budget` or `deadline`. Validation rejects
both-set or neither-set.
| Field | Type | Default | Notes |
|---|---|---|---|
| `total_budget` | duration | (unset) | humantime duration. Deadline computed as `now + total_budget` at first `run`. |
| `deadline` | string | (unset) | See `schedule` grammar below for accepted forms. |
### `[agent]`
| Field | Type | Default | Notes |
|---|---|---|---|
| `command` | string | (required) | Shell command. Substitutions: `{prompt_file}`, `{workdir}`, `{iter}`. Must contain `{prompt_file}` when `stdin = "none"`. Run via `bash -lc`. |
| `workdir_var` | string | `"AUTORIZE_WORKDIR"` | Name of the env var injected into the agent process containing the absolute path of the iteration's worktree. |
| `stdin` | enum | `"none"` | `"none"`: nothing piped on stdin; the command **must** contain `{prompt_file}`. `"prompt"`: the prompt file contents are piped on stdin. |
### `[agent.env]`
A sub-table mapping environment variable name to string value. The value is
expanded for `$NAME` / `${NAME}` references against the parent process
environment **before** being passed to the agent.
| Variable | Type | Default | Notes |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | string | (none) | Example entry in the default template; passes the parent env's `$ANTHROPIC_API_KEY` through. |
| (any name) | string | (none) | Any user-defined env var; values with `$VAR` / `${VAR}` are expanded from the parent env. |
## 5. `objective.parse` variants
All three variants read the scoring command's stdout.
```toml
# Raw float: the entire stdout (trimmed) is parsed as a float.
parse = { kind = "float" }
```
```toml
# Regex: the first capture group of the first match is parsed as a float.
# The pattern must be non-empty and must contain a capture group.
parse = { kind = "regex", pattern = "score=([0-9.]+)" }
```
```toml
# JSON path: stdout must be valid JSON; the value at the path must be a
# scalar number. Accepts jq-style leading dot (".foo.bar") or JSONPath
# ("$.foo.bar"). The path must be non-empty.
parse = { kind = "jq", path = ".metrics.bpb" }
```
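To make the three variants concrete, here is a rough Python approximation of how each one could turn stdout into a score. It is a sketch only: the harness is not written in Python, `parse_score` and its helpers are hypothetical names, Python's `re` and `float()` accept slightly different inputs than their Rust counterparts, and the path handling covers plain dotted keys only.

```python
import json
import re


def parse_score(stdout, parse):
    """Return the parsed score as a float, or None when parsing fails."""
    kind = parse["kind"]
    if kind == "float":
        return _to_float(stdout.strip())
    if kind == "regex":
        # First capture group of the first match; the pattern is assumed to
        # contain a capture group (the config validation requires one).
        match = re.search(parse["pattern"], stdout)
        return _to_float(match.group(1)) if match else None
    if kind == "jq":
        return _json_path(stdout, parse["path"])
    raise ValueError(f"unknown parse kind: {kind!r}")


def _to_float(text):
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def _json_path(stdout, path):
    # Accept ".foo.bar" and "$.foo.bar"; only dotted object keys are handled.
    keys = [key for key in path.lstrip("$").split(".") if key]
    try:
        value = json.loads(stdout)
        for key in keys:
            value = value[key]
    except (ValueError, KeyError, TypeError):
        return None
    # The value must be a scalar number; bool is excluded explicitly because
    # Python treats it as an int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

A `None` result corresponds to a scoring failure, which is then handled according to `objective.fail_mode`.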
## 6. `objective.fail_mode` semantics
| Mode | Behavior when scoring fails |
|---|---|
| `"invalid"` | Record the iteration with `outcome = "invalid"`; no score, no best update. |
| `"worst"` | Treat as the worst possible score: `f64::MAX` when `direction = "min"`, `f64::MIN` when `direction = "max"`. These finite sentinels round-trip through JSON (unlike `+inf` / `-inf`, which serde serializes as `null`). Counts as a real (terrible) score. |
| `"abort"` | Stop the whole `autorize run` with an error. |
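
The `"worst"` sentinels can be illustrated in Python, whose floats are the same IEEE 754 doubles as Rust's `f64`. The sketch below (with hypothetical names `worst_score` and `round_trips`) shows that the finite sentinels survive a JSON round trip. Note that Python's `json` module treats infinities differently from serde (by default it emits the non-standard token `Infinity` rather than `null`), so only the finite case is demonstrated.

```python
import json
import sys

F64_MAX = sys.float_info.max    # same value as Rust's f64::MAX
F64_MIN = -sys.float_info.max   # same value as Rust's f64::MIN


def worst_score(direction):
    """Sentinel score that can never beat a real score in this direction."""
    if direction == "min":
        return F64_MAX
    if direction == "max":
        return F64_MIN
    raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")


def round_trips(value):
    """True if the float is unchanged after JSON encode + decode."""
    return json.loads(json.dumps(value)) == value
```

Both `round_trips(worst_score("min"))` and `round_trips(worst_score("max"))` hold, which is the property the table relies on.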
## 7. `boundaries.deny_paths` vs `boundaries.allow_paths`
- `deny_paths` is a list of glob patterns (globset syntax). After the agent
runs, `git add -A` stages all changes (including new files) in the
worktree, then `git diff <branch>` is computed. If any changed path
matches any deny pattern, the outcome is `denied` and the iteration is
thrown away — the tracking branch is **not** advanced and scoring is
skipped.
- `allow_paths` is **prompt-only in v1**: the patterns are included in the
agent's prompt as a constraint hint, but autorize does not enforce them
via the diff. Use `deny_paths` if you need enforcement.
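The deny check amounts to matching every changed path against every deny pattern. The Python sketch below approximates it with the standard `fnmatch` module. That module is a stand-in, not globset: features such as `{a,b}` alternation and the exact `**` semantics differ, so treat the result as an illustration of the rule rather than a faithful reimplementation. The name `denied_paths` is hypothetical.

```python
from fnmatch import fnmatchcase


def denied_paths(changed_paths, deny_patterns):
    """Return the changed paths that match at least one deny pattern."""
    # Assumption: fnmatch's "*" crosses "/" here, which is close enough to
    # the patterns used in this document's examples.
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in deny_patterns)
    ]
```

A non-empty result corresponds to the `denied` outcome: the iteration is thrown away and scoring is skipped.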
## 8. `schedule` grammar
`schedule.deadline` (when used instead of `schedule.total_budget`) accepts
three forms:
| Form | Examples | Semantics |
|---|---|---|
| humantime duration | `"4h"`, `"30m"`, `"1d"` | Equivalent to `total_budget`: now + duration. |
| RFC3339 absolute time | `"2026-05-21T09:00:00-07:00"` | Parsed as an absolute UTC instant. |
| natural language | `"tomorrow"`, `"today 3pm"`, `"tomorrow 9am"`, `"tomorrow 14:30"`, `"9am"` | Local-time clock. A bare time like `"9am"` rolls to tomorrow if it is already past today. `"12am"` is midnight, `"12pm"` is noon. |
`schedule.total_budget` only accepts humantime durations.
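The bare-time rule from the natural-language row can be sketched in Python as follows. This covers only inputs shaped like `"9am"` or `"12pm"`; the function name `resolve_bare_time`, the use of a naive local `datetime`, and the choice to roll over when the time equals `now` exactly are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

_BARE_TIME = re.compile(r"^(\d{1,2})\s*(am|pm)$", re.IGNORECASE)


def resolve_bare_time(text, now):
    """Resolve a bare clock time such as "9am" to the next local occurrence."""
    match = _BARE_TIME.match(text.strip())
    if not match:
        raise ValueError(f"not a bare am/pm time: {text!r}")
    hour = int(match.group(1))
    if not 1 <= hour <= 12:
        raise ValueError(f"hour out of range: {text!r}")
    # "12am" is midnight (0) and "12pm" is noon (12).
    hour = hour % 12 + (12 if match.group(2).lower() == "pm" else 0)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Roll to tomorrow if that time is already past today.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

With `now` at 10:00 local time, `"9am"` resolves to 09:00 tomorrow while `"3pm"` resolves to 15:00 today.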
## 9. `agent.command` substitutions, env expansion, stdin modes
**Command substitutions** (literal token replacement in the
`agent.command` string before it is handed to `bash -lc`):
| Token | Replaced with |
|---|---|
| `{prompt_file}` | Absolute path to `iter-NNNN/prompt.md`. |
| `{workdir}` | Absolute path to the iteration's worktree. |
| `{iter}` | Decimal iteration number (1-based). |
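
As a rough illustration, literal token replacement plus the `{prompt_file}` validation rule from section 4 might look like the Python sketch below. The function name `render_agent_command` and the example paths are hypothetical; the sketch also assumes that no shell quoting is applied to the substituted values, since the document describes the replacement as literal.

```python
def render_agent_command(template, prompt_file, workdir, iter_no, stdin="none"):
    """Substitute {prompt_file}, {workdir} and {iter} into an agent command."""
    if stdin not in ("none", "prompt"):
        raise ValueError(f"stdin must be 'none' or 'prompt', got {stdin!r}")
    # With stdin = "none" the prompt path is the only way to reach the agent.
    if stdin == "none" and "{prompt_file}" not in template:
        raise ValueError('agent.command must contain {prompt_file} when stdin = "none"')
    return (
        template.replace("{prompt_file}", prompt_file)
        .replace("{workdir}", workdir)
        .replace("{iter}", str(iter_no))
    )
```

For the walkthrough's `bash mock-agent.sh {iter}` with `stdin = "prompt"`, iteration 3 renders as `bash mock-agent.sh 3`.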
**Env expansion** for `agent.env` values:
- `$NAME` and `${NAME}` are expanded against the parent process
environment. Names match `[A-Za-z_][A-Za-z0-9_]*`.
- Unset variables expand to the empty string.
- A literal `$` followed by a non-name character is preserved verbatim
(so `"price $5"` stays `"price $5"`).
- The parent environment is passed through automatically; `agent.env`
values overlay on top. The variable named by `agent.workdir_var`
(default `AUTORIZE_WORKDIR`) is always injected with the worktree path.
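The expansion rules above can be sketched in Python with a single regular expression. The names `expand_env` and `build_agent_env` are hypothetical, and one detail is an assumption: the workdir variable is applied last, so it wins over an `agent.env` entry of the same name.

```python
import re

# Matches ${NAME} or $NAME where NAME is [A-Za-z_][A-Za-z0-9_]*.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def expand_env(value, parent_env):
    """Expand $NAME / ${NAME} against parent_env; unset names become ""."""
    def replace(match):
        name = match.group(1) or match.group(2)
        return parent_env.get(name, "")
    # A "$" followed by a non-name character never matches, so it is kept.
    return _VAR.sub(replace, value)


def build_agent_env(parent_env, agent_env, workdir, workdir_var="AUTORIZE_WORKDIR"):
    """Parent env, overlaid with expanded agent.env, plus the workdir variable."""
    env = dict(parent_env)
    for name, raw in agent_env.items():
        env[name] = expand_env(raw, parent_env)
    # Assumption: injected last, so the worktree path always takes precedence.
    env[workdir_var] = workdir
    return env
```

For instance, `expand_env("price $5", {})` returns the string unchanged, and `expand_env("$MISSING", {})` returns the empty string.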
**`agent.stdin` modes**:
- `"none"` (default): nothing is piped on stdin; the command **must**
contain `{prompt_file}` so the agent can find its instructions. This is
enforced by config validation.
- `"prompt"`: the contents of `iter-NNNN/prompt.md` are piped on stdin.
`{prompt_file}` is not required in this mode.
**Wall-clock kill**: every agent invocation is run via `setsid` so it has
its own process group. On `iteration.budget` expiry the harness sends
`SIGTERM` to the whole group, waits up to 5 seconds, then `SIGKILL`s the
group. This reaches grandchildren that a plain `kill(pid)` would orphan.
The `IterationRecord.agent_killed_by_budget` field is set to `true` for
killed iterations.
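The kill sequence can be reproduced in a few lines of Python on a POSIX system. This is an illustrative sketch rather than the harness's implementation: `run_with_budget` is a hypothetical name, and the sketch waits only on the group leader during the grace period, so a group whose leader exits early while descendants linger is not handled.

```python
import os
import signal
import subprocess


def run_with_budget(argv, budget_s, grace_s=5.0):
    """Run argv in its own session; return (exit_code_or_None, killed_by_budget)."""
    # start_new_session=True makes the child call setsid(), so it leads a new
    # process group whose id equals its pid.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return _exit_code(proc.wait(timeout=budget_s)), False
    except subprocess.TimeoutExpired:
        pass
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)
        proc.wait()
    return _exit_code(proc.returncode), True


def _exit_code(returncode):
    # Python reports death-by-signal as a negative code; map it to None,
    # mirroring the null `agent_exit` described in section 11.
    return returncode if returncode >= 0 else None


def _signal_group(pgid, sig):
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the group already exited
```

Signalling the group rather than the single pid is what lets the termination reach grandchildren spawned by the agent.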
## 10. On-disk layout
```
<repo>/
.autorize/<name>/
config.toml # the schema documented above
program.md # freeform agent instructions
state.json # atomic checkpoint of loop state
iterations.jsonl # durable append-only log, one JSON object per line
run.lock # advisory flock held by the active `autorize run`; contains its pid
iter-0001/
prompt.md # full prompt the agent saw
changes.diff # captured diff vs autorize/<name>
agent.stdout
agent.stderr
wt/ # the worktree (only if iteration.keep_worktrees = true)
iter-0002/
...
```
- `state.json` is written via tmp-file + fsync + atomic rename (and
best-effort directory fsync). A torn write never corrupts the destination.
- `iterations.jsonl` is opened with `O_APPEND` and `fsync`'d after every
record. The reader tolerates a torn last line (drops it silently); a
corrupt non-last line is an error.
- The tracking branch `autorize/<name>` records every merged iteration as
a separate commit. `git log autorize/<name>` is the improvement history;
`git diff <base>..autorize/<name>` is the cumulative change since the
experiment started.
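The tmp-file + fsync + rename pattern described for `state.json` is a general technique. A minimal Python sketch of it follows, assuming a POSIX filesystem; `atomic_write_json` and the temp-file naming are hypothetical and do not describe the harness's own code.

```python
import json
import os
import tempfile


def atomic_write_json(path, obj):
    """Write obj as JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(obj, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # data is durable before the rename
        os.replace(tmp_path, path)     # atomic on POSIX
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Best-effort directory fsync so the rename itself survives a crash.
    try:
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        pass
```

Because the destination is only ever replaced by a fully written file, a crash mid-write leaves at most a stray temp file behind.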
## 11. `IterationRecord` and `StateSnapshot` schemas
Each line in `iterations.jsonl` is one `IterationRecord` JSON object:
| Field | Type | Meaning |
|---|---|---|
| `iter` | integer (u64) | 1-based iteration number. Strictly increasing. |
| `started_at` | RFC3339 timestamp | When the iteration began. |
| `ended_at` | RFC3339 timestamp | When the iteration finished (regardless of outcome). |
| `outcome` | string | One of `"merged"`, `"discarded"`, `"noop"`, `"invalid"`, `"killed"`, `"denied"`. |
| `score` | float or null | Parsed score, when scoring ran and succeeded. |
| `best_so_far` | float or null | Best score across all merged iterations up to and including this one. |
| `agent_exit` | integer or null | Exit code of the agent process. `null` when killed by signal or unable to spawn. |
| `agent_killed_by_budget` | bool | `true` if the wall-clock budget killed the agent process group. |
| `diff_lines` | integer (u64) | Line count of `iter-NNNN/changes.diff`. |
| `notes` | string | Free-form, set by the harness in special cases (e.g. `"resumed after crash"`). |
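
An agent that reads `iterations.jsonl` directly may want the same tolerance the harness's reader is described as having in section 10: drop a torn final line, reject corruption anywhere else. A Python sketch under that assumption follows; `parse_iterations` and `read_iterations` are hypothetical names, and treating the last non-blank line as "the last line" is an interpretation.

```python
import json


def parse_iterations(text):
    """Parse JSONL text into a list of records, tolerating a torn last line."""
    # Split on "\n" only; str.splitlines() would also split on characters
    # that may legally appear unescaped inside JSON strings.
    lines = text.split("\n")
    while lines and not lines[-1].strip():
        lines.pop()
    records = []
    for index, line in enumerate(lines):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if index == len(lines) - 1:
                break  # torn final record from an interrupted append
            raise ValueError(f"corrupt record on line {index + 1}") from exc
    return records


def read_iterations(path):
    """Read and parse an iterations.jsonl file."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse_iterations(handle.read())
```

Each returned record is a plain dict with the fields from the table above, so `records[-1]["outcome"]` gives the most recent outcome.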
`state.json` is a single `StateSnapshot` JSON object:
| Field | Type | Meaning |
|---|---|---|
| `experiment` | string | The experiment `name`. |
| `branch` | string | The tracking branch (`autorize/<name>`). |
| `base_commit` | string | SHA at which the tracking branch was created. The loop refuses to continue if it is gone. |
| `iter_in_progress` | integer or null | The in-flight iteration number, or `null` when idle between iterations. |
| `current_step` | enum (string) | One of the `CurrentStep` variants listed in section 3. |
| `best_score` | float or null | Best score seen so far. |
| `best_iter` | integer or null | Iteration number whose merge set `best_score`. |
| `started_at` | RFC3339 timestamp | When the run loop first started this experiment. |
| `deadline` | RFC3339 timestamp | Absolute UTC deadline computed from `schedule`. |
| `iterations_completed` | integer (u64) | Total number of records in `iterations.jsonl`. |
| `consecutive_noops` | integer (u32) | Streak length of consecutive `noop` outcomes; resets on any non-noop. |
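
Two of these fields are derivable from `iterations.jsonl`, which gives an agent a cheap sanity check. The Python sketch below recomputes them from a list of records; `trailing_noops` and `snapshot_mismatches` are hypothetical helpers, and the check is only meaningful while the loop is idle, since a snapshot taken mid-iteration may lag the log.

```python
def trailing_noops(outcomes):
    """Length of the run of "noop" outcomes at the end of the list."""
    streak = 0
    for outcome in reversed(outcomes):
        if outcome != "noop":
            break  # any non-noop outcome resets the streak
        streak += 1
    return streak


def snapshot_mismatches(state, records):
    """Compare derived values with state.json; return a list of problems."""
    problems = []
    if state["iterations_completed"] != len(records):
        problems.append(
            f"iterations_completed={state['iterations_completed']} "
            f"but log has {len(records)} records"
        )
    expected = trailing_noops([record["outcome"] for record in records])
    if state["consecutive_noops"] != expected:
        problems.append(
            f"consecutive_noops={state['consecutive_noops']} "
            f"but log implies {expected}"
        )
    return problems
```

An empty list means the snapshot and the log agree on both counters.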
## 12. Pre-flight checks performed by `autorize run`
Before entering the loop, `autorize run`:
- Verifies the experiment directory exists (created by `autorize init`).
- Acquires an exclusive non-blocking advisory flock on
`.autorize/<name>/run.lock`. A second concurrent `autorize run` on the
same experiment is rejected immediately with the holder's pid for
diagnostics. The kernel releases the lock automatically on process exit,
so a crash leaves no stale lock to clean up.
- Verifies the current directory is a git repository.
- Verifies the working tree is clean (excluding the `.autorize/` directory,
which is always allowed to be dirty). Use `--allow-dirty` to bypass.
- If `state.json` exists, verifies `state.json.base_commit` is reachable
in the current repo. If it is gone, the run aborts with an error.
- If `state.json` exists and has `iter_in_progress != null`, the run is
refused with a message pointing at `autorize resume <name>`. `resume`
records the in-progress iter as `outcome = "killed"`, clears the
in-progress marker, and continues the loop.
`autorize run --allow-dirty <name>` overrides only the dirty-tree check.
All other pre-flight checks still apply.
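For an agent deciding which subcommand to invoke, the in-progress check reduces to a small rule. The Python sketch below encodes only what this section states; `next_command` is a hypothetical helper, and `state` is the parsed `state.json` (or `None` when the file does not exist yet).

```python
def next_command(name, state):
    """Pick `run` or `resume` for experiment `name` from a parsed state.json."""
    # No state.json yet: the experiment has never run, so start it.
    if state is None:
        return f"autorize run {name}"
    # An in-flight iteration means the previous run died; `run` would refuse.
    if state.get("iter_in_progress") is not None:
        return f"autorize resume {name}"
    return f"autorize run {name}"
```

The other pre-flight checks (lock, clean tree, reachable `base_commit`) are still enforced by the harness itself when the chosen command runs.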
## 13. Walkthrough: `examples/pi-digits/`
A complete inline example. The fixture nudges the single floating-point
number in `value.txt` toward π (3.141592653589793) over a handful of
iterations.
### Scaffold
```sh
autorize init pi
```
Creates:
```
.autorize/pi/
config.toml
program.md
```
### Edited `config.toml`
```toml
[experiment]
name = "pi"
description = "Demo: nudge value.txt toward π."
[objective]
command = "bash score.sh"
direction = "min"
parse = { kind = "float" }
timeout = "30s"
fail_mode = "invalid"
[boundaries]
allow_paths = ["value.txt"]
deny_paths = [".autorize/**", "*.lock"]
[setup]
command = ""
timeout = "1m"
[teardown]
command = ""
timeout = "1m"
[iteration]
budget = "30s"
max_iterations = 6
keep_worktrees = false
max_consecutive_noops = 5
[schedule]
total_budget = "5m"
[agent]
command = "bash mock-agent.sh {iter}"
workdir_var = "AUTORIZE_WORKDIR"
stdin = "prompt"
[agent.env]
```
### `program.md`
```
# pi experiment
Your job is to nudge the single floating-point number in `value.txt` closer to
π (3.141592653589793).
Constraints:
- Only modify `value.txt`. Do not create or modify any other files.
- Do not touch anything under `.autorize/` — that is the harness's bookkeeping.
- Keep the file as a single line containing a decimal number followed by `\n`.
The harness scores each iteration by computing `|π − value|` (lower is better)
and keeps your edit only if the score improves over the best known so far.
```
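The fixture's `score.sh` is not reproduced in this document. As a stand-in, the scoring rule `|π − value|` described in `program.md` can be sketched in Python; `score_value` and `main` are hypothetical names, and the output format (a bare float on stdout) is chosen to suit `parse = { kind = "float" }`.

```python
import math


def score_value(text):
    """Distance between pi and the number stored in value.txt (lower is better)."""
    return abs(math.pi - float(text.strip()))


def main(path="value.txt"):
    """Print the score as a bare float, as the "float" parse kind expects."""
    with open(path, "r", encoding="utf-8") as handle:
        print(score_value(handle.read()))
```

A scoring script built this way would call `main()` from the worktree, where `value.txt` lives.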
### `autorize run pi` (sample output)
```
iter 1: merged score=0.099201 best=0.099201
iter 2: merged score=0.069441 best=0.069441
iter 3: merged score=0.048608 best=0.048608
iter 4: discarded score=0.534008 best=0.048608
iter 5: merged score=0.034025 best=0.034025
iter 6: merged score=0.023818 best=0.023818
reached max_iterations=6; stopping.
---
experiment pi
iterations 6
best iter 6, score 0.023818
```
### Annotated `iterations.jsonl` line
```json
{
"iter": 1,
"started_at": "2026-05-20T08:00:00.000000Z",
"ended_at": "2026-05-20T08:00:01.234567Z",
"outcome": "merged",
"score": 0.099201,
"best_so_far": 0.099201,
"agent_exit": 0,
"agent_killed_by_budget": false,
"diff_lines": 4,
"notes": ""
}
```
- `outcome: "merged"` means the diff was committed onto `autorize/pi`.
- `best_so_far` equals `score` because this iteration merged and so set the new best.
- `agent_killed_by_budget: false` means the agent finished inside
`iteration.budget`.
### `autorize status pi` (sample output)
```
experiment pi
branch autorize/pi
base_commit abc1234deadbeef...
iterations 6
noop streak 0
last outcome merged
best iter 6, score 0.023818
elapsed 1s
remaining 4m 58s
```
### Simulated crash + resume
Suppose the harness was killed mid-iter at iter 3. `state.json` looks like:
```json
{
"experiment": "pi",
"branch": "autorize/pi",
"base_commit": "abc1234deadbeef...",
"iter_in_progress": 3,
"current_step": "InvokeAgent",
"best_score": 0.069441,
"best_iter": 2,
"started_at": "2026-05-20T08:00:00Z",
"deadline": "2026-05-20T08:05:00Z",
"iterations_completed": 2,
"consecutive_noops": 0
}
```
`autorize run pi` refuses with:
```
in-progress iteration found; use `autorize resume`
```
`autorize resume pi` records iter 3 as `outcome: "killed"`:
```json
{
"iter": 3,
"started_at": "2026-05-20T08:00:30.000000Z",
"ended_at": "2026-05-20T08:00:30.000000Z",
"outcome": "killed",
"score": null,
"best_so_far": 0.069441,
"agent_exit": null,
"agent_killed_by_budget": false,
"diff_lines": 0,
"notes": "resumed after crash"
}
```
…and then continues the loop at iter 4 as if `autorize run` had been
invoked.
---
End of `autorize llms` reference. Source-of-truth modules live under
`src/` (`src/config.rs`, `src/scoring.rs`, `src/schedule.rs`,
`src/agent.rs`, `src/storage.rs`, `src/iteration.rs`, `src/cli/run.rs`)
if you need to read the code.