Safety: given the effects a program performs, how much of its blast radius is gated (requires approval, or denied) versus allowed under an agent policy?
For an agent operating with real capabilities, the safety question is not “is
this code correct” but “what is the worst this can do, and is the dangerous
part gated?” This module classifies a program by the Effects it performs,
applies a default-deny-for-dangerous agent [Policy], and scores how much of
the dangerous surface is held behind approval/denial. A program whose only
dangerous effects are approval-gated scores high; one that runs privileged or
executes arbitrary commands unconditionally scores low.
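As a rough illustration of the scoring idea, the sketch below computes the fraction of dangerous operations that are gated. It is not the module's implementation: the variant names `Pure`, `Privileged`, `Allow`, `RequireApproval`, and `Deny`, the helper `is_dangerous`, the function `gated_fraction`, and the choice of which effects count as dangerous are all assumptions made for this example.

```rust
// Hypothetical stand-ins for the module's types; the real definitions may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Pure,
    ReadLocal,
    Network,
    Exec,
    Privileged, // assumed variant for "runs privileged"
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

// Assumption: these three classes make up the "dangerous surface".
fn is_dangerous(effect: Effect) -> bool {
    matches!(effect, Effect::Network | Effect::Exec | Effect::Privileged)
}

/// Fraction of dangerous operations that are gated (approval required or denied).
/// A program with no dangerous operations scores 1.0.
fn gated_fraction(ops: &[(Effect, Decision)]) -> f64 {
    let dangerous: Vec<Decision> = ops
        .iter()
        .filter(|(effect, _)| is_dangerous(*effect))
        .map(|(_, decision)| *decision)
        .collect();
    if dangerous.is_empty() {
        return 1.0;
    }
    let gated = dangerous
        .iter()
        .filter(|decision| **decision != Decision::Allow)
        .count();
    gated as f64 / dangerous.len() as f64
}
```

Under these assumptions, a program whose only dangerous effect is an approval-gated `Exec` scores 1.0, while one that also performs an unconditionally allowed `Network` call drops to 0.5.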
Structs§
- ExfiltrationReport - Whether a program has a data-exfiltration path: it both reads local/sensitive state (a source) and can send data out (a sink: network or arbitrary exec). The dangerous combination is source ∧ sink; either alone is not an exfil path.
- ReversibilityReport - How much of a program’s dangerous blast radius is reversible: backed by an undo/rollback (transaction, trash, snapshot) rather than permanent. Gating (see assess_safety) bounds whether a dangerous effect runs; reversibility bounds the damage if it does. Together they describe the real recoverable blast radius.
- SafetyReport - The safety assessment of a program described by the effects it performs.
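The source ∧ sink rule for exfiltration can be sketched as follows. This is an illustration only: the struct fields `has_source` and `has_sink`, the method `has_exfil_path`, the function `exfil_sketch`, and the `Pure` variant are hypothetical names, and the real ExfiltrationReport may carry different data.

```rust
// Hypothetical stand-in for the module's Effect enum; only ReadLocal, Network,
// and Exec are named in the documentation above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Pure,
    ReadLocal,
    Network,
    Exec,
}

// Hypothetical report shape; field names are assumptions.
#[derive(Debug, PartialEq, Eq)]
struct ExfiltrationReport {
    has_source: bool, // program reads local/sensitive state
    has_sink: bool,   // program can send data out (network or arbitrary exec)
}

impl ExfiltrationReport {
    /// An exfil path exists only when a source and a sink are both present.
    fn has_exfil_path(&self) -> bool {
        self.has_source && self.has_sink
    }
}

fn exfil_sketch(effects: &[Effect]) -> ExfiltrationReport {
    ExfiltrationReport {
        has_source: effects.contains(&Effect::ReadLocal),
        has_sink: effects
            .iter()
            .any(|effect| matches!(effect, Effect::Network | Effect::Exec)),
    }
}
```

With this sketch, `[ReadLocal, Network]` reports an exfil path, while `[ReadLocal]` alone or `[Network, Exec]` alone does not.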
Enums§
- Decision - The policy decision for an effect under a mode.
- Effect - The effect class of an operation, the single property safety reasons about. Ordered from harmless to most dangerous.
- Mode - Who is operating: a human at a REPL, or an autonomous agent.
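One way the three enums could fit together is sketched below. Deriving `Ord` makes declaration order the danger order, and a decision is looked up per (mode, effect) pair. Everything here is an assumption for illustration: the variant lists, the names `worst` and `decide_sketch`, and in particular which agent effects map to approval versus denial are not taken from the module.

```rust
// Hypothetical variants; declaration order defines the derived ordering,
// from harmless (first) to most dangerous (last).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Pure,
    ReadLocal,
    Network,
    Exec,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Human,
    Agent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

/// The most dangerous effect in a program, or None for an empty program.
fn worst(effects: &[Effect]) -> Option<Effect> {
    effects.iter().copied().max()
}

/// Assumed policy shape: humans are always allowed; agents are gated on the
/// dangerous classes. The specific Exec/Network mapping is a guess.
fn decide_sketch(effect: Effect, mode: Mode) -> Decision {
    match (mode, effect) {
        (Mode::Human, _) => Decision::Allow,
        (Mode::Agent, Effect::Exec) => Decision::Deny,
        (Mode::Agent, Effect::Network) => Decision::RequireApproval,
        (Mode::Agent, _) => Decision::Allow,
    }
}
```

Because the ordering is derived, comparisons such as `Effect::Pure < Effect::Exec` hold without any hand-written ranking table.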
Functions§
- assess_exfiltration - Assess data-exfiltration exposure from the effects a program performs: a read source (Effect::ReadLocal) combined with an egress sink (Effect::Network or Effect::Exec).
- assess_reversibility - Assess reversibility from (effect, reversible) pairs: each operation’s effect class plus whether it has an undo/rollback. Only dangerous effects count toward the score (a pure read is trivially safe regardless of “reversibility”).
- assess_safety - Assess a program’s safety from the effects it performs, under mode.
- assess_safety_named - Assess safety from operation names plus a classify closure mapping each name to its Effect (e.g. a host’s effect classifier). Names for which the classifier returns None are skipped. Convenience over assess_safety when you start from names rather than effects.
- decide - The default agent policy: humans get default-allow (great errors instead of friction); agents get default-deny for the dangerous classes. This mirrors the AetherShell agentic-first model so the score reflects a real, shipped policy.
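Two of the behaviours described above can be sketched in a few lines: scoring reversibility over (effect, reversible) pairs while ignoring non-dangerous effects, and turning operation names into effects with a classifier closure that may return None. The real signatures are not shown on this page, so `reversible_fraction`, `effects_from_names`, the `Write` variant, and the set of dangerous effects are hypothetical.

```rust
// Hypothetical stand-in; `Write` is an assumed variant added for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    ReadLocal,
    Write,
    Network,
    Exec,
}

// Assumption: everything except a local read counts as dangerous here.
fn is_dangerous(effect: Effect) -> bool {
    matches!(effect, Effect::Write | Effect::Network | Effect::Exec)
}

/// Fraction of dangerous operations that have an undo/rollback.
/// Non-dangerous operations are ignored; with none dangerous the score is 1.0.
fn reversible_fraction(ops: &[(Effect, bool)]) -> f64 {
    let (mut dangerous, mut reversible) = (0usize, 0usize);
    for &(effect, has_undo) in ops {
        if is_dangerous(effect) {
            dangerous += 1;
            if has_undo {
                reversible += 1;
            }
        }
    }
    if dangerous == 0 {
        1.0
    } else {
        reversible as f64 / dangerous as f64
    }
}

/// Map operation names to effects, skipping names the classifier does not know.
fn effects_from_names<F>(names: &[&str], classify: F) -> Vec<Effect>
where
    F: Fn(&str) -> Option<Effect>,
{
    names.iter().filter_map(|&name| classify(name)).collect()
}
```

A host would typically pass its own classifier as the closure; the resulting effect list can then be fed to whichever effect-based assessment is needed.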