galdr 0.13.0

Record & Replay for agent skills — capture a session's tool calls and distill them into a reproducible skill. Local-first.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
<p align="center">
  <img src="assets/banner.svg" alt="galdr — Record & Replay for agent skills" width="100%">
</p>

<p align="center">
  <a href="https://crates.io/crates/galdr"><img src="https://img.shields.io/crates/v/galdr.svg" alt="crates.io"></a>
  <a href="https://github.com/Arakiss/galdr/actions/workflows/ci.yml"><img src="https://github.com/Arakiss/galdr/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <img src="https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg" alt="License: MIT OR Apache-2.0">
  <img src="https://img.shields.io/badge/rust-1.88%2B-orange.svg" alt="Rust 1.88+">
  <img src="https://img.shields.io/badge/egress-none%20·%20loopback--only-4FD6C9.svg" alt="No external egress, loopback-only">
</p>

# galdr

> _galdr_ — Old Norse for a chanted spell: a sequence performed once and sung again.

**Record & Replay for agent skills.** galdr records the *tool calls* an agent harness
already emits — not pixels, not the screen — stores them as a raw, immutable span, and
distills them into a reproducible skill. Everything is local: the raw lives in
`~/.galdr` and nothing leaves the machine.

The idea: instead of re-explaining to your agent how to do a task it already did well,
**record it once** and turn it into a skill it can replay with judgment.

<p align="center">
  <img src="assets/demo.gif" alt="galdr: record a task, distill it into a skill, surface opportunities, measure replay reliability" width="100%">
</p>

> [!NOTE]
> Around the core loop (record → distill → replay) galdr adds: a SQLite catalog (a
> rebuildable *index*, never the truth) with a supervisor daemon, an install-time
> content gate (blocks secrets, personal paths, dangerous commands), a terminal UI,
> safe export, diff-based **parametrization** of two recordings, **`galdr suggest`**
> (repeated tasks worth turning into a skill), **`galdr bench`** (per-skill replay
> hit-rate from recorded outcomes), and optional autonomous distillation against a
> local, loopback-only MLX server. A plain `cargo build` needs none of the MLX path —
> it is feature-gated and falls back cleanly to the faithful mechanical render.

## Why tool calls, not pixels

A GUI Record & Replay records what the screen looked like. It breaks when a button
moves. An agent already emits a clean, structured trace of *what it did*: each tool
call, its input, and its result. galdr records that substrate. The replay is not a
pixel re-enactment — it is a skill the agent reads and applies with judgment.

**The honest scope.** galdr records what *your agent* did, not what *you* did by hand
outside it. Note this is narrower than it sounds: when the agent drives a browser
through a tool — a Playwright/Chrome MCP server, a browser tool — those clicks, types,
and navigations **are** tool calls, so galdr already captures and distills them like
any other step. The only thing out of scope is capturing a *human's* manual gestures
in a browser the agent never touched; that would need a separate pixel/DOM capture
layer and contradicts the "tool calls, not pixels" thesis, so it stays out of the core
(a roadmap item, and a job for the extension seam if ever). Sold honestly: this is
Record & Replay *for what your agent does* — including its web automation and the GUI
work it drives through Computer Use (those actions are tool calls too; galdr keeps the
action and drops the screenshot).

## How it works

```
agent session ──(PostToolUse)──▶ galdr hook ──append──▶ span (JSONL)
                                            galdr distill ◀───┘
                                      ~/.agents/skills/<name>/SKILL.md
```

1. **Sensor** (`galdr hook`) — invoked by the harness after each tool call. If a
   recording is active, it appends the event to the span. It is instantaneous and
   **always exits 0**: it never breaks the session, even if it fails internally.
2. **Recording** (`galdr rec start` / `stop`) — opens and closes the span, and writes
   the recording metadata.
3. **Distillation** (`galdr distill [reference]`) — a replay of the tool calls is not yet
   a skill. galdr renders a faithful draft from the span (real steps, secrets redacted)
   and hands the agent an authoring brief: read the span, supply the judgment galdr can't
   — the problem it solves, the inputs that vary, each step's intent, the gotchas — and
   install the authored skill with `--from`. galdr owns the mechanism; the agent owns the
   intelligence. `--fast` installs the mechanical render as-is in one step (a floor, for a
   human or a headless run); `--auto` lets a local MLX model author it.

## Install

```sh
# from crates.io
cargo install galdr

# or from source
cargo install --git https://github.com/Arakiss/galdr
```

Prefer a prebuilt binary? Every [release](https://github.com/Arakiss/galdr/releases/latest)
ships signed, checksummed binaries (Sigstore + SHA-256) and an SBOM for macOS and Linux
(arm64 + x86_64). No network egress at runtime — galdr only ever talks to loopback.

## Quickstart

```sh
galdr setup skill         # teach your harness(es) how to drive galdr (one time)

galdr rec start demo      # ● recording "demo" — now do the task with your agent
#  ... a few tool calls ...
galdr rec stop            # ■ stopped "demo" — 6 steps

galdr distill             # render a faithful draft of the most recent recording + an authoring brief
#  ... read the span, write the real skill, then ...
galdr distill --from skill.md   # install the skill you authored
```

galdr captures *what* ran; you supply *why*. The default hands you a faithful draft and a
brief — install your authored version with `--from`, or `galdr distill --fast` to accept
the mechanical render as-is (a floor, not a finished skill).

**You never copy a 26-character id.** Every command that takes a recording resolves it
the way you think about it — the most recent by default, or by name, or by a short id
prefix:

```sh
galdr distill demo        # draft the recording named "demo"
galdr show                # inspect the most recent recording's steps
galdr suggest             # repeated tasks worth turning into a skill
galdr bench               # how reliably your distilled skills actually replay
```

Run `galdr` with no arguments for a one-screen overview: whether anything is recording,
how many recordings and skills you have, and the next command to type.

## More commands

```sh
galdr daemon --detach        # run the supervisor daemon (catalog indexer + socket)
galdr daemon status          # check whether the daemon is answering
galdr daemon stop            # ask the daemon to shut down gracefully
galdr show [reference]       # inspect one recording (name, id prefix, or omit for the latest)
galdr skills                 # list installed skills, galdr/external origin, provenance, and readiness
galdr harnesses              # detect agent harnesses on this system and whether galdr's sensor is wired
galdr harnesses --json       # the same, machine-readable
galdr link                   # make distilled skills discoverable by every installed harness
galdr link --skill <name>    # link just one skill
galdr evaluations            # list skill evaluator outputs from the catalog
galdr evaluations --skill <name>   # show one skill's evaluator history
galdr outcome usage --skill <name> --outcome success   # --rec defaults to the latest recording
galdr outcome label --skill <name> --label accepted --evaluator human
galdr outcome list --skill <name>  # inspect captured usage/outcome labels
galdr suggest                # skill opportunities: repeated tasks not yet distilled
galdr suggest --min-count 1  # also surface single, undistilled recordings
galdr bench                  # replay reliability: per-skill hit-rate from recorded outcomes
galdr reindex                # rebuild the SQLite catalog from disk
galdr doctor                 # diagnose config, catalog, daemon, skills, and hook wiring
galdr setup claude --check   # check Claude Code PostToolUse hook wiring
galdr setup claude --print   # print the safe settings.json snippet
galdr setup codex --check    # check Codex PostToolUse hook wiring (~/.codex/hooks.json)
galdr setup codex --print    # print the safe Codex hooks snippet
galdr tui                    # lazygit-style browser: recordings, spans, skills, harnesses

galdr diff <a> <b>           # diff two recordings: constants vs parameters
galdr parametrize <a> <b> --emit   # write a parametrized SKILL.md (suffix -param)
galdr export [reference] --out ./export          # export metadata + summaries, no raw payloads
galdr export [reference] --out ./export --redact   # export a redacted raw copy

galdr distill --auto         # autonomous distillation of the latest recording (local MLX, see below)
```

galdr is two surfaces over one catalog, and **both serve humans and agents**. The CLI
gives a person warm, colorized output and a no-id-needed flow (resolve a recording by
the latest, a name, or a short prefix; `NO_COLOR` and non-TTY output are honored); it
gives an agent `--json` on every read command — `list`, `show`, `skills`, `evaluations`,
`harnesses`, `outcome list` — emitting a single parseable document, so a tool consumes
galdr without scraping a table:

```sh
galdr list --json | jq '.[].rec_id'
galdr skills --json | jq '[.[] | select(.origin == "galdr")]'
```

### Terminal UI

`galdr tui` is the browsable face — a lazygit-style cockpit over the same catalog, no
flags to memorize. Three panels (Recordings · Skills · Harnesses), moved with `1`/`2`/`3`
or `tab`, and a live preview that follows the selection:

- **Recordings**`enter` steps into the span; `d` distills a complete skill; `e`
  exports it without raw payloads; `o` reveals the span path.
- **Skills**`enter` reads the `SKILL.md` (scrollable); `l` links it into every
  installed harness; `v` validates it against the content gate; `O` records a success
  outcome (the signal `galdr bench` reads).
- `/` filters recordings and skills, `?` lists the keybindings, `q` quits.

Every action the TUI takes is **local and additive** — it distills, links, validates,
exports, and records outcomes. It never deletes a span, rewrites a skill behind your
back, or mutates your harness settings.

The daemon is optional. `list`/`show`/`skills` answer daemon-first, then from the
read-only catalog, then from a fresh in-memory index built straight from disk — so the
CLI works whether or not the daemon is running. Write paths also keep the local catalog
fresh best-effort, so a closed recording or newly installed skill is visible without
waiting for a daemon process.

`galdr doctor` checks the local root, config, daemon socket, active recording, rebuildable
catalog, skill provenance, draft skills, and Claude Code hook wiring. `galdr setup claude
--print` emits the safe hook snippet; it never mutates your settings file.

`galdr skills` is a small skill catalog, not just a name list. It reports each
skill's provenance, lifecycle status (`draft`, `final`, `param-draft`, `unknown`), a
simple 0-100 readiness score, and the latest delta versus the previously indexed
version. The score is intentionally explainable: frontmatter, required sections, draft
markers, and provenance. It is a lint/readiness guardrail, not a claim that the skill is
objectively good.

Evaluator outputs are stored separately from the skill row and can be inspected with
`galdr evaluations`. Today's built-in evaluator is `readiness_lint`; future evaluators
can add human review, LLM review, outcome evidence, or a learned model without mixing
those signals into one opaque "quality" number.

`galdr outcome` is the supervised-data lane. It records append-only local JSONL events
under `~/.galdr/outcomes/` and indexes them into the catalog:

- `galdr outcome usage` records that a skill was used in a later recording, with task
  kind, outcome, retry count, manual intervention count, and notes.
- `galdr outcome label` records explicit labels or reviews such as `accepted`,
  `rejected`, `needs_review`, `regression`, or any project-specific label.
- `galdr outcome list` shows the captured usage and labels.

This keeps learned-model training out of the agent loop while collecting the labels a
future offline classifier or ranker needs: skill content/hash, provenance, task context,
observed outcome, retries, interventions, and reviewer labels.

### Optional: autonomous distillation (local MLX)

`galdr distill --auto` (optionally with a recording reference) lets a local model write
the finished skill instead of the agent. It is **off by default** and **loopback-only** —
it never reaches off the machine.

1. Build with the feature: `cargo install --path . --features mlx`.
2. Run a local OpenAI-compatible server, e.g. `mlx_lm.server` from
   [mlx-lm]https://github.com/ml-explore/mlx-lm:
   `python3 -m mlx_lm.server --model mlx-community/Qwen3-4B-Instruct-2507-4bit`
   (the default endpoint `http://127.0.0.1:8080` and model are configurable in
   `~/.galdr/config.json`).
3. `galdr distill --auto` (or `--engine mlx-subprocess` to shell out to
   `python3 -m mlx_lm.generate`, which needs no `mlx` feature).

If the engine is missing or unreachable, `--auto` falls back to the deterministic
complete skill and still exits 0. Always review a machine-generated skill before relying
on it.

## Wiring the sensor

For the sensor to receive events, the harness must invoke `galdr hook` in its
*after-each-tool* hook, passing the event on stdin. In Claude Code, add a `PostToolUse`
entry to `~/.claude/settings.json` — hook arrays are concatenated, so it coexists with
any other hooks you have:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "hooks": [
          { "type": "command", "command": "galdr hook" }
        ]
      }
    ]
  }
}
```

The sensor reads the harness fields from stdin (`tool_name`, `tool_input`,
`tool_response`, `cwd`, `session_id`, `transcript_path`) and is a no-op when no
recording is active.

If you prefer a resilient command that finds `galdr` on `PATH` and falls back to the
cargo bin, that form works too and is recognized by `galdr doctor` / `setup claude
--check`:

```sh
if command -v galdr >/dev/null 2>&1; then galdr hook; \
elif [ -x "$HOME/.cargo/bin/galdr" ]; then "$HOME/.cargo/bin/galdr" hook; fi
```

## Multi-harness: one skill, every harness

galdr distills a skill once, into the open-standard skills root (`~/.agents/skills`),
then makes it discoverable in **every harness installed on the machine**. Each harness
loads skills from its own directory, so galdr links the canonical skill into each one:

| Harness | Skills directory | Sensor wiring |
|---|---|---|
| Claude Code | `~/.claude/skills` | `~/.claude/settings.json` PostToolUse hook |
| Codex | `~/.codex/skills` | `~/.codex/hooks.json` (same hook shape) |
| Cursor | `~/.cursor/skills-cursor` ||

The link is a symlink back to the canonical copy (the same mechanism a hand-linked
skill already uses), created on install, never clobbering a real file of the same
name. `galdr harnesses` shows what's installed; `galdr link` (re)links every skill;
`galdr doctor` flags any galdr skill a harness can't see. The result: record a task in
one harness, get a reusable skill in all of them.

## On-disk layout

```
~/.galdr/
├── active                      active recording (JSON); absent = not recording
├── config.json                 optional config (distill engine, endpoint, model)
├── galdrd.sock                 daemon control socket (NDJSON over a Unix socket)
├── galdrd.pid                  daemon pidfile
├── catalog.sqlite              queryable index, rebuilt from spans/ + recordings/ + skills
├── spans/<rec_id>.jsonl        append-only span, one JSON line per tool call
├── outcomes/skill_usage.jsonl  append-only skill usage observations
├── outcomes/skill_outcomes.jsonl append-only skill labels and reviews
└── recordings/<rec_id>.json    metadata for each closed recording
```

The span is the raw source of truth: append-only, immutable, inspectable. The SQLite
catalog is an **index, never the truth** — it stores one-line step summaries (no raw
blobs), readiness/evaluation rows, and outcome indexes. `galdr reindex` rebuilds it from
spans, recordings, skills, and outcome logs at any time.

The root is `~/.galdr` by default. Set `GALDR_ROOT` to relocate it (and
`GALDR_SKILLS_ROOT` to relocate the skills directory) — useful for hermetic tests,
throwaway profiles, CI, or to keep the daemon's Unix socket path under the platform's
length limit.

`config.json` may also include optional capture policy for future recordings:

```json
{
  "capture": {
    "deny_tools": ["SecretTool"],
    "deny_cwd_prefixes": ["/private/project"],
    "max_response_chars": 4000,
    "strip_screenshots": true
  }
}
```

The policy only applies to new events as the hook records them. It never edits spans
that already exist.

`strip_screenshots` (on by default) drops base64 image blobs — a Computer Use
screenshot, an image content block — from the recorded event, keeping the *action*
(click, type, key) but not the pixels. The pixels are large and may show sensitive
on-screen content, and they are never the reusable signal; the action is. This is what
lets galdr record a GUI task the agent drives via Computer Use and distill it into a
clean, semantic skill — the agent's GUI work is just more tool calls.

## Extension points

The core exposes two generic seams (`src/ext.rs`) and nothing more:

- **`PermissionGate`** — decides whether an event may be recorded. Allows everything by
  default.
- **`ProvenanceSink`** — observes recorded events. Does nothing by default.

Any concrete integration (permission policy, traceability) plugs in by implementing
these traits from outside the repository.

## Security & privacy

The span captures each tool's `tool_input` and `tool_response`, which **may contain
sensitive data** (file contents, commands, outputs). Keep that in mind before recording
and before sharing a recording or a distilled skill. See [SECURITY.md](SECURITY.md).

Design guarantees:

- The raw lives **only** in `~/.galdr`, on your machine. galdr makes **no external
  network egress**. The one optional exception — autonomous distillation (`--auto`,
  feature `mlx`) — talks **only to loopback**, enforced in code by
  `engine::validate_loopback`; a non-loopback endpoint is a hard error.
- The sensor never propagates errors to the agent session, and never depends on the
  daemon: it appends to the span first, then hints the daemon best-effort.
- The autonomous distiller treats the recorded span as untrusted data (delimiter, low
  temperature, output validation, human review). Default distillation is still done by
  your own agent, locally.

## Roadmap

Phase 1 (shipped, built on the Phase 0 loop):

- **Supervisor daemon** + Unix socket + **SQLite** as a rebuildable catalog.
-**TUI** to browse recordings, inspect spans, and audit skill provenance.
-**Diff-based parametrization** of two recordings (constant = steps, variable = inputs).
-**Autonomous distillation** against a local, loopback-only MLX server.
-**Skill catalog readiness signals** with lifecycle status, provenance, score, delta,
  and an evaluator table for future human/LLM/model reviews.
-**Operational diagnostics** (`doctor`, `rec status`, daemon status/stop, Claude setup
  check/print).
-**Safe export path** that omits raw payloads by default and can emit redacted raw
  copies without touching the original span.

Phase 2 (shipped):

- **Faithful render**`galdr distill` renders a valid skill from the span in the
  open-standard `When to use` / `Inputs` / `Steps` / `Verification` anatomy. `--fast`
  installs it as-is; `--auto` lets a local MLX model author it.
-**Multi-harness discoverability** — a distilled skill is linked into every installed
  harness's skills directory (Claude Code, Codex, Cursor); `galdr link` / `doctor` manage it.
-**Multi-harness sensor**`galdr setup codex` wires the same hook into Codex's
  `hooks.json`; `galdr harnesses` shows which harnesses are wired.
-**Session-scoped recording** — a recording binds to the session that started it, so a
  concurrent session in another project can't leak its tool calls into the span.
-**AI-first CLI**`--json` on every read command.

Phase 3 (shipped) — a surface humans and agents both want to use:

- **Author by default** — a replay of the tool calls is not a skill. `galdr distill`
  now renders a faithful draft and hands the agent an authoring brief (supply the why,
  the generalized inputs, the gotchas), installed with `--from`; an unauthored draft
  scores lower until it is raised. `--fast` keeps the mechanical one-shot for a floor.
-**No ids to copy** — every command resolves a recording by the most recent, a name,
  or a short id prefix. `galdr distill` distills the run you just made.
-**A friendly home screen**`galdr` with no arguments shows where you are and the
  next command to type; output is colorized, TTY-aware, and honors `NO_COLOR`.
-**`galdr suggest`** — signs every recording by the shape of its steps, dedupes
  against installed skills, and ranks the repeated tasks worth distilling.
-**`galdr bench`** — aggregates recorded outcomes into a per-skill replay hit-rate
  and effort cost; the production signal a capability test cannot give.
-**A lazygit-style TUI** — three panels, a live preview, and in-place actions
  (distill, link, validate, export, record outcome), all local and additive.

Next:

- **Capture of human GUI gestures** — the deliberate scope gap above. The agent's *own*
  browser automation is already captured (it arrives as tool calls). What's missing is
  recording a *human* driving a browser by hand, which needs a separate pixel/DOM capture
  layer on top of the span model — the one axis where Codex's pixel recorder does something
  galdr does not, and a job for the extension seam rather than the core.
- Verify the Codex sensor end to end with a live Codex recording (the wiring is in place;
  the stdin payload is not yet confirmed field-for-field).
- A multi-agent broker over the same span model.
- Real gates and real provenance plugged into the extension layer (`PermissionGate`,
  `ProvenanceSink`) — where harness-specific policy or memory integrations live, kept out
  of the local-first core.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Commits follow
[Conventional Commits](https://www.conventionalcommits.org/); the project uses
[Semantic Versioning](https://semver.org/).

## License

Licensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or the
[MIT license](LICENSE-MIT) at your option — the standard Rust dual-license, so you
pick whichever fits.

Unless you explicitly state otherwise, any contribution intentionally submitted for
inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual
licensed as above, without any additional terms or conditions.