
Module safety


Safety: given the effects a program performs, how much of its blast radius is gated (requires approval, or denied) versus allowed under an agent policy?

For an agent operating with real capabilities, the safety question is not “is this code correct” but “what is the worst this can do, and is the dangerous part gated?” This module classifies a program by the Effect each of its operations performs, applies a default-deny-for-dangerous agent policy (see decide), and scores how much of the dangerous surface is held behind approval or denial. A program whose only dangerous effects are approval-gated scores high; one that runs privileged or executes arbitrary commands unconditionally scores low.
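The classify, apply-policy, score pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the `Pure` and `Privileged` variants, the `Decision` variant names, the `is_dangerous` helper, the function signatures, and the gated-fraction formula are all hypothetical.

```rust
// Hypothetical sketch: variant names, signatures, and the scoring formula
// are assumptions made for illustration, not the module's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Pure,       // assumed: no observable effect
    ReadLocal,  // reads local state
    Network,    // talks to the network
    Exec,       // runs an arbitrary command
    Privileged, // assumed: runs with elevated rights
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Human,
    Agent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

// Assumed split between harmless and dangerous effect classes.
fn is_dangerous(e: Effect) -> bool {
    matches!(e, Effect::Network | Effect::Exec | Effect::Privileged)
}

// Humans get default-allow; agents get default-deny for dangerous classes.
fn decide(e: Effect, mode: Mode) -> Decision {
    match (mode, e) {
        (Mode::Human, _) => Decision::Allow,
        (Mode::Agent, Effect::Privileged) => Decision::Deny,
        (Mode::Agent, e) if is_dangerous(e) => Decision::RequireApproval,
        (Mode::Agent, _) => Decision::Allow,
    }
}

// Fraction of dangerous effects that are gated (approval or denial).
// A program with no dangerous effects is treated as fully gated.
fn gated_fraction(effects: &[Effect], mode: Mode) -> f64 {
    let dangerous: Vec<Effect> = effects
        .iter()
        .copied()
        .filter(|e| is_dangerous(*e))
        .collect();
    if dangerous.is_empty() {
        return 1.0;
    }
    let gated = dangerous
        .iter()
        .filter(|e| decide(**e, mode) != Decision::Allow)
        .count();
    gated as f64 / dangerous.len() as f64
}

fn main() {
    let program = [Effect::Pure, Effect::ReadLocal, Effect::Exec];
    println!("agent: {}", gated_fraction(&program, Mode::Agent));
    println!("human: {}", gated_fraction(&program, Mode::Human));
}
```

In this sketch the same program scores differently by mode: the `Exec` step is approval-gated for an agent but runs unconditionally for a human, so only the agent run counts it as gated.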

Structs

ExfiltrationReport
Whether a program has a data-exfiltration path: it both reads local/sensitive state (a source) and can send data out (a sink — network or arbitrary exec). The dangerous combination is source ∧ sink; either alone is not an exfil path.
ReversibilityReport
How much of a program’s dangerous blast radius is reversible — backed by an undo/rollback (transaction, trash, snapshot) rather than permanent. Gating (see assess_safety) bounds whether a dangerous effect runs; reversibility bounds the damage if it does. Together they describe the real recoverable blast radius.
SafetyReport
The safety assessment of a program described by the effects it performs.
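To make the two report ideas above concrete, here is a small sketch of the source ∧ sink check and of a reversibility ratio computed from (effect, reversible) pairs. The struct fields, the `Write` and `Pure` variants, the `is_dangerous` helper, and the ratio used as a score are assumptions for illustration; the real report types may carry different data.

```rust
// Hypothetical sketch: struct fields, extra variants, and the score formula
// are assumptions, not the module's real definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    Pure,      // assumed variant
    ReadLocal, // source: reads local/sensitive state
    Write,     // assumed variant: mutates local state
    Network,   // sink: can send data out
    Exec,      // sink: arbitrary command execution
}

#[derive(Debug, PartialEq)]
struct ExfiltrationReport {
    has_source: bool,
    has_sink: bool,
    has_path: bool, // true only when both a source and a sink are present
}

fn assess_exfiltration(effects: &[Effect]) -> ExfiltrationReport {
    let has_source = effects.contains(&Effect::ReadLocal);
    let has_sink = effects
        .iter()
        .any(|e| matches!(e, Effect::Network | Effect::Exec));
    ExfiltrationReport {
        has_source,
        has_sink,
        has_path: has_source && has_sink,
    }
}

#[derive(Debug, PartialEq)]
struct ReversibilityReport {
    dangerous: usize,  // number of dangerous operations
    reversible: usize, // how many of those have an undo/rollback
    score: f64,        // reversible / dangerous, or 1.0 if none are dangerous
}

// Assumed notion of "dangerous" for this sketch.
fn is_dangerous(e: Effect) -> bool {
    matches!(e, Effect::Write | Effect::Network | Effect::Exec)
}

fn assess_reversibility(ops: &[(Effect, bool)]) -> ReversibilityReport {
    // Only dangerous effects count; reads are ignored whatever their flag says.
    let dangerous = ops.iter().filter(|(e, _)| is_dangerous(*e)).count();
    let reversible = ops
        .iter()
        .filter(|(e, r)| is_dangerous(*e) && *r)
        .count();
    let score = if dangerous == 0 {
        1.0
    } else {
        reversible as f64 / dangerous as f64
    };
    ReversibilityReport { dangerous, reversible, score }
}

fn main() {
    let exfil = assess_exfiltration(&[Effect::ReadLocal, Effect::Network]);
    println!("{:?}", exfil);

    let rev = assess_reversibility(&[
        (Effect::Pure, false),
        (Effect::Write, true),
        (Effect::Exec, false),
    ]);
    println!("{:?}", rev);
}
```

A read on its own, or a network call on its own, produces `has_path: false` here; only the combination is flagged.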

Enums

Decision
The policy decision for an effect under a mode.
Effect
The effect class of an operation — the single property safety reasons about. Ordered from harmless to most dangerous.
Mode
Who is operating: a human at a REPL, or an autonomous agent.
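Because Effect is described as ordered from harmless to most dangerous, one natural way to express that in Rust is a derived `Ord`, which compares variants by declaration order. The sketch below assumes that approach; the `Pure` variant and the `worst_effect` helper are hypothetical, and the real enum may have more variants or a different ordering mechanism.

```rust
// Hypothetical sketch: assumes the ordering comes from a derived `Ord`.
// `Pure` and `worst_effect` are illustrative names, not the module's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Pure,      // assumed variant: least dangerous
    ReadLocal,
    Network,
    Exec,      // most dangerous in this sketch
}

// The "worst this can do" is the maximum effect class in the program.
// Returns None for a program that performs no operations.
fn worst_effect(effects: &[Effect]) -> Option<Effect> {
    effects.iter().copied().max()
}

fn main() {
    let program = [Effect::ReadLocal, Effect::Exec, Effect::Network];
    println!("{:?}", worst_effect(&program));
}
```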

Functions

assess_exfiltration
Assess data-exfiltration exposure from the effects a program performs — a read source (Effect::ReadLocal) combined with an egress sink (Effect::Network or Effect::Exec).
assess_reversibility
Assess reversibility from (effect, reversible) pairs — each operation’s effect class plus whether it has an undo/rollback. Only dangerous effects count toward the score (a pure read is trivially safe regardless of “reversibility”).
assess_safety
Assess a program’s safety from the effects it performs, under mode.
assess_safety_named
Assess safety from operation names plus a classify closure mapping each name to its Effect (e.g. a host’s effect classifier). Names the classifier returns None for are skipped. Convenience over assess_safety when you start from names rather than effects.
decide
The default policy for each mode: humans get default-allow (they are served by clear error messages rather than approval friction); agents get default-deny for the dangerous effect classes. This mirrors the AetherShell agentic-first model, so the score reflects a real, shipped policy.
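The name-based entry point described for assess_safety_named can be pictured as a thin mapping step in front of the effect-based assessment. The sketch below shows only that step, assuming a closure of type `Fn(&str) -> Option<Effect>`; the `classify_names` helper, the `host_classifier` function, and the command names it recognizes are all hypothetical, and the real signature may differ.

```rust
// Hypothetical sketch: `classify_names` and `host_classifier` are
// illustrative names; the closure type is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Effect {
    ReadLocal,
    Network,
    Exec,
}

// Map operation names to effects, skipping names the classifier
// does not recognize (those it returns None for).
fn classify_names<F>(names: &[&str], classify: F) -> Vec<Effect>
where
    F: Fn(&str) -> Option<Effect>,
{
    names
        .iter()
        .copied()
        .filter_map(|name| classify(name))
        .collect()
}

// A stand-in for a host's effect classifier (made-up command names).
fn host_classifier(name: &str) -> Option<Effect> {
    match name {
        "cat" => Some(Effect::ReadLocal),
        "curl" => Some(Effect::Network),
        "sh" => Some(Effect::Exec),
        _ => None,
    }
}

fn main() {
    let effects = classify_names(&["cat", "frobnicate", "curl"], host_classifier);
    println!("{:?}", effects);
}
```

The resulting list of effects is what an effect-based assessment would then consume; unknown names contribute nothing to the result.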