galdr
galdr — Old Norse for a chanted spell: a sequence performed once and sung again.
Record & Replay for agent skills. galdr records the tool calls an agent harness
already emits — not pixels, not the screen — stores them as a raw, immutable span, and
distills them into a reproducible skill. Everything is local: the raw lives in
~/.galdr and nothing leaves the machine.
The idea: instead of re-explaining to your agent how to do a task it already did well, record it once and turn it into a skill it can replay with judgment.
[!NOTE] Around the core loop (record → distill → replay) galdr adds: a SQLite catalog (a rebuildable index, never the truth) with a supervisor daemon, an install-time content gate (blocks secrets, personal paths, dangerous commands), a terminal UI, safe export, diff-based parametrization of two recordings,
galdr suggest(repeated tasks worth turning into a skill),galdr bench(per-skill replay hit-rate from recorded outcomes), and optional autonomous distillation against a local, loopback-only MLX server. A plaincargo buildneeds none of the MLX path — it is feature-gated and falls back cleanly to the faithful mechanical render.
Why tool calls, not pixels
A GUI Record & Replay records what the screen looked like. It breaks when a button moves. An agent already emits a clean, structured trace of what it did: each tool call, its input, and its result. galdr records that substrate. The replay is not a pixel re-enactment — it is a skill the agent reads and applies with judgment.
The honest scope. galdr records what your agent did, not what you did by hand outside it. Note this is narrower than it sounds: when the agent drives a browser through a tool — a Playwright/Chrome MCP server, a browser tool — those clicks, types, and navigations are tool calls, so galdr already captures and distills them like any other step. The only thing out of scope is capturing a human's manual gestures in a browser the agent never touched; that would need a separate pixel/DOM capture layer and contradicts the "tool calls, not pixels" thesis, so it stays out of the core (a roadmap item, and a job for the extension seam if ever). Sold honestly: this is Record & Replay for what your agent does — including its web automation and the GUI work it drives through Computer Use (those actions are tool calls too; galdr keeps the action and drops the screenshot).
How it works
agent session ──(PostToolUse)──▶ galdr hook ──append──▶ span (JSONL)
│
galdr distill ◀───┘
│
▼
~/.agents/skills/<name>/SKILL.md
- Sensor (
galdr hook) — invoked by the harness after each tool call. If a recording is active, it appends the event to the span. It is instantaneous and always exits 0: it never breaks the session, even if it fails internally. - Recording (
galdr rec start/stop) — opens and closes the span, and writes the recording metadata. - Distillation (
galdr distill [reference]) — a replay of the tool calls is not yet a skill. galdr renders a faithful draft from the span (real steps, secrets redacted) and hands the agent an authoring brief: read the span, supply the judgment galdr can't — the problem it solves, the inputs that vary, each step's intent, the gotchas — and install the authored skill with--from. galdr owns the mechanism; the agent owns the intelligence.--fastinstalls the mechanical render as-is in one step (a floor, for a human or a headless run);--autolets a local MLX model author it.
Install
# from crates.io
# or from source
Prefer a prebuilt binary? Every release ships signed, checksummed binaries (Sigstore + SHA-256) and an SBOM for macOS and Linux (arm64 + x86_64). No network egress at runtime — galdr only ever talks to loopback.
Quickstart
# ... a few tool calls ...
# ... read the span, write the real skill, then ...
galdr captures what ran; you supply why. The default hands you a faithful draft and a
brief — install your authored version with --from, or galdr distill --fast to accept
the mechanical render as-is (a floor, not a finished skill).
You never copy a 26-character id. Every command that takes a recording resolves it the way you think about it — the most recent by default, or by name, or by a short id prefix:
Run galdr with no arguments for a one-screen overview: whether anything is recording,
how many recordings and skills you have, and the next command to type.
More commands
galdr is two surfaces over one catalog, and both serve humans and agents. The CLI
gives a person warm, colorized output and a no-id-needed flow (resolve a recording by
the latest, a name, or a short prefix; NO_COLOR and non-TTY output are honored); it
gives an agent --json on every read command — list, show, skills, evaluations,
harnesses, outcome list — emitting a single parseable document, so a tool consumes
galdr without scraping a table:
|
|
Terminal UI
galdr tui is the browsable face — a lazygit-style cockpit over the same catalog, no
flags to memorize. Three panels (Recordings · Skills · Harnesses), moved with 1/2/3
or tab, and a live preview that follows the selection:
- Recordings —
entersteps into the span;ddistills a complete skill;eexports it without raw payloads;oreveals the span path. - Skills —
enterreads theSKILL.md(scrollable);llinks it into every installed harness;vvalidates it against the content gate;Orecords a success outcome (the signalgaldr benchreads). /filters recordings and skills,?lists the keybindings,qquits.
Every action the TUI takes is local and additive — it distills, links, validates, exports, and records outcomes. It never deletes a span, rewrites a skill behind your back, or mutates your harness settings.
The daemon is optional. list/show/skills answer daemon-first, then from the
read-only catalog, then from a fresh in-memory index built straight from disk — so the
CLI works whether or not the daemon is running. Write paths also keep the local catalog
fresh best-effort, so a closed recording or newly installed skill is visible without
waiting for a daemon process.
galdr doctor checks the local root, config, daemon socket, active recording, rebuildable
catalog, skill provenance, draft skills, and Claude Code hook wiring. galdr setup claude --print emits the safe hook snippet; it never mutates your settings file.
galdr skills is a small skill catalog, not just a name list. It reports each
skill's provenance, lifecycle status (draft, final, param-draft, unknown), a
simple 0-100 readiness score, and the latest delta versus the previously indexed
version. The score is intentionally explainable: frontmatter, required sections, draft
markers, and provenance. It is a lint/readiness guardrail, not a claim that the skill is
objectively good.
Evaluator outputs are stored separately from the skill row and can be inspected with
galdr evaluations. Today's built-in evaluator is readiness_lint; future evaluators
can add human review, LLM review, outcome evidence, or a learned model without mixing
those signals into one opaque "quality" number.
galdr outcome is the supervised-data lane. It records append-only local JSONL events
under ~/.galdr/outcomes/ and indexes them into the catalog:
galdr outcome usagerecords that a skill was used in a later recording, with task kind, outcome, retry count, manual intervention count, and notes.galdr outcome labelrecords explicit labels or reviews such asaccepted,rejected,needs_review,regression, or any project-specific label.galdr outcome listshows the captured usage and labels.
This keeps learned-model training out of the agent loop while collecting the labels a future offline classifier or ranker needs: skill content/hash, provenance, task context, observed outcome, retries, interventions, and reviewer labels.
Optional: autonomous distillation (local MLX)
galdr distill --auto (optionally with a recording reference) lets a local model write
the finished skill instead of the agent. It is off by default and loopback-only —
it never reaches off the machine.
- Build with the feature:
cargo install --path . --features mlx. - Run a local OpenAI-compatible server, e.g.
mlx_lm.serverfrom mlx-lm:python3 -m mlx_lm.server --model mlx-community/Qwen3-4B-Instruct-2507-4bit(the default endpointhttp://127.0.0.1:8080and model are configurable in~/.galdr/config.json). galdr distill --auto(or--engine mlx-subprocessto shell out topython3 -m mlx_lm.generate, which needs nomlxfeature).
If the engine is missing or unreachable, --auto falls back to the deterministic
complete skill and still exits 0. Always review a machine-generated skill before relying
on it.
Wiring the sensor
For the sensor to receive events, the harness must invoke galdr hook in its
after-each-tool hook, passing the event on stdin. In Claude Code, add a PostToolUse
entry to ~/.claude/settings.json — hook arrays are concatenated, so it coexists with
any other hooks you have:
The sensor reads the harness fields from stdin (tool_name, tool_input,
tool_response, cwd, session_id, transcript_path) and is a no-op when no
recording is active.
If you prefer a resilient command that finds galdr on PATH and falls back to the
cargo bin, that form works too and is recognized by galdr doctor / setup claude --check:
if ; then ; \
elif [; then ; fi
Multi-harness: one skill, every harness
galdr distills a skill once, into the open-standard skills root (~/.agents/skills),
then makes it discoverable in every harness installed on the machine. Each harness
loads skills from its own directory, so galdr links the canonical skill into each one:
| Harness | Skills directory | Sensor wiring |
|---|---|---|
| Claude Code | ~/.claude/skills |
~/.claude/settings.json PostToolUse hook |
| Codex | ~/.codex/skills |
~/.codex/hooks.json (same hook shape) |
| Cursor | ~/.cursor/skills-cursor |
— |
The link is a symlink back to the canonical copy (the same mechanism a hand-linked
skill already uses), created on install, never clobbering a real file of the same
name. galdr harnesses shows what's installed; galdr link (re)links every skill;
galdr doctor flags any galdr skill a harness can't see. The result: record a task in
one harness, get a reusable skill in all of them.
On-disk layout
~/.galdr/
├── active active recording (JSON); absent = not recording
├── config.json optional config (distill engine, endpoint, model)
├── galdrd.sock daemon control socket (NDJSON over a Unix socket)
├── galdrd.pid daemon pidfile
├── catalog.sqlite queryable index, rebuilt from spans/ + recordings/ + skills
├── spans/<rec_id>.jsonl append-only span, one JSON line per tool call
├── outcomes/skill_usage.jsonl append-only skill usage observations
├── outcomes/skill_outcomes.jsonl append-only skill labels and reviews
└── recordings/<rec_id>.json metadata for each closed recording
The span is the raw source of truth: append-only, immutable, inspectable. The SQLite
catalog is an index, never the truth — it stores one-line step summaries (no raw
blobs), readiness/evaluation rows, and outcome indexes. galdr reindex rebuilds it from
spans, recordings, skills, and outcome logs at any time.
The root is ~/.galdr by default. Set GALDR_ROOT to relocate it (and
GALDR_SKILLS_ROOT to relocate the skills directory) — useful for hermetic tests,
throwaway profiles, CI, or to keep the daemon's Unix socket path under the platform's
length limit.
config.json may also include optional capture policy for future recordings:
The policy only applies to new events as the hook records them. It never edits spans that already exist.
strip_screenshots (on by default) drops base64 image blobs — a Computer Use
screenshot, an image content block — from the recorded event, keeping the action
(click, type, key) but not the pixels. The pixels are large and may show sensitive
on-screen content, and they are never the reusable signal; the action is. This is what
lets galdr record a GUI task the agent drives via Computer Use and distill it into a
clean, semantic skill — the agent's GUI work is just more tool calls.
Extension points
The core exposes two generic seams (src/ext.rs) and nothing more:
PermissionGate— decides whether an event may be recorded. Allows everything by default.ProvenanceSink— observes recorded events. Does nothing by default.
Any concrete integration (permission policy, traceability) plugs in by implementing these traits from outside the repository.
Security & privacy
The span captures each tool's tool_input and tool_response, which may contain
sensitive data (file contents, commands, outputs). Keep that in mind before recording
and before sharing a recording or a distilled skill. See SECURITY.md.
Design guarantees:
- The raw lives only in
~/.galdr, on your machine. galdr makes no external network egress. The one optional exception — autonomous distillation (--auto, featuremlx) — talks only to loopback, enforced in code byengine::validate_loopback; a non-loopback endpoint is a hard error. - The sensor never propagates errors to the agent session, and never depends on the daemon: it appends to the span first, then hints the daemon best-effort.
- The autonomous distiller treats the recorded span as untrusted data (delimiter, low temperature, output validation, human review). Default distillation is still done by your own agent, locally.
Roadmap
Phase 1 (shipped, built on the Phase 0 loop):
- ✅ Supervisor daemon + Unix socket + SQLite as a rebuildable catalog.
- ✅ TUI to browse recordings, inspect spans, and audit skill provenance.
- ✅ Diff-based parametrization of two recordings (constant = steps, variable = inputs).
- ✅ Autonomous distillation against a local, loopback-only MLX server.
- ✅ Skill catalog readiness signals with lifecycle status, provenance, score, delta, and an evaluator table for future human/LLM/model reviews.
- ✅ Operational diagnostics (
doctor,rec status, daemon status/stop, Claude setup check/print). - ✅ Safe export path that omits raw payloads by default and can emit redacted raw copies without touching the original span.
Phase 2 (shipped):
- ✅ Faithful render —
galdr distillrenders a valid skill from the span in the open-standardWhen to use/Inputs/Steps/Verificationanatomy.--fastinstalls it as-is;--autolets a local MLX model author it. - ✅ Multi-harness discoverability — a distilled skill is linked into every installed
harness's skills directory (Claude Code, Codex, Cursor);
galdr link/doctormanage it. - ✅ Multi-harness sensor —
galdr setup codexwires the same hook into Codex'shooks.json;galdr harnessesshows which harnesses are wired. - ✅ Session-scoped recording — a recording binds to the session that started it, so a concurrent session in another project can't leak its tool calls into the span.
- ✅ AI-first CLI —
--jsonon every read command.
Phase 3 (shipped) — a surface humans and agents both want to use:
- ✅ Author by default — a replay of the tool calls is not a skill.
galdr distillnow renders a faithful draft and hands the agent an authoring brief (supply the why, the generalized inputs, the gotchas), installed with--from; an unauthored draft scores lower until it is raised.--fastkeeps the mechanical one-shot for a floor. - ✅ No ids to copy — every command resolves a recording by the most recent, a name,
or a short id prefix.
galdr distilldistills the run you just made. - ✅ A friendly home screen —
galdrwith no arguments shows where you are and the next command to type; output is colorized, TTY-aware, and honorsNO_COLOR. - ✅
galdr suggest— signs every recording by the shape of its steps, dedupes against installed skills, and ranks the repeated tasks worth distilling. - ✅
galdr bench— aggregates recorded outcomes into a per-skill replay hit-rate and effort cost; the production signal a capability test cannot give. - ✅ A lazygit-style TUI — three panels, a live preview, and in-place actions (distill, link, validate, export, record outcome), all local and additive.
Next:
- Capture of human GUI gestures — the deliberate scope gap above. The agent's own browser automation is already captured (it arrives as tool calls). What's missing is recording a human driving a browser by hand, which needs a separate pixel/DOM capture layer on top of the span model — the one axis where Codex's pixel recorder does something galdr does not, and a job for the extension seam rather than the core.
- Verify the Codex sensor end to end with a live Codex recording (the wiring is in place; the stdin payload is not yet confirmed field-for-field).
- A multi-agent broker over the same span model.
- Real gates and real provenance plugged into the extension layer (
PermissionGate,ProvenanceSink) — where harness-specific policy or memory integrations live, kept out of the local-first core.
Contributing
See CONTRIBUTING.md. Commits follow Conventional Commits; the project uses Semantic Versioning.
License
Licensed under either of Apache License, Version 2.0 or the MIT license at your option — the standard Rust dual-license, so you pick whichever fits.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.