Module safety


Safety: given the effects a program performs, how much of its blast radius is gated (requires approval, or denied) versus allowed under an agent policy?

For an agent operating with real capabilities, the safety question is not “is this code correct” but “what is the worst this can do, and is the dangerous part gated?” This module classifies a program by the Effects it performs, applies a default-deny-for-dangerous agent policy, and scores how much of the dangerous surface is held behind approval/denial. A program whose only dangerous effects are approval-gated scores high; one that runs privileged or executes arbitrary commands unconditionally scores low.
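
The classify, decide, score pipeline described above can be sketched as follows. This is a minimal illustration, not the module's implementation: only Effect::ReadLocal, Effect::Network and Effect::Exec are named on this page, so the other variants, the variants of Mode and Decision, the dangerous threshold, and the helpers is_dangerous, decide_sketch and gated_fraction are all assumptions. The real scoring formula is not documented here; the sketch simply uses the fraction of dangerous effects that the policy does not allow outright.

```rust
/// Effect classes, ordered from harmless to most dangerous.
/// Pure, WriteLocal and Privileged are assumed variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Pure,
    ReadLocal,
    WriteLocal,
    Network,
    Exec,
    Privileged,
}

/// Who is operating (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Human,
    Agent,
}

/// Policy outcome for one effect (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

/// Assumed threshold: anything at or above WriteLocal counts as dangerous.
fn is_dangerous(effect: Effect) -> bool {
    effect >= Effect::WriteLocal
}

/// Hypothetical stand-in for the default policy: humans are allowed
/// everything, agents are gated on every dangerous class.
fn decide_sketch(effect: Effect, mode: Mode) -> Decision {
    match (mode, effect) {
        (Mode::Human, _) => Decision::Allow,
        (Mode::Agent, Effect::Privileged) => Decision::Deny,
        (Mode::Agent, e) if is_dangerous(e) => Decision::RequireApproval,
        (Mode::Agent, _) => Decision::Allow,
    }
}

/// Fraction of dangerous effects that are gated (approval or denial).
/// A program with no dangerous effects is treated as fully gated.
fn gated_fraction(effects: &[Effect], mode: Mode) -> f64 {
    let dangerous: Vec<Effect> = effects
        .iter()
        .copied()
        .filter(|&e| is_dangerous(e))
        .collect();
    if dangerous.is_empty() {
        return 1.0;
    }
    let gated = dangerous
        .iter()
        .filter(|&&e| decide_sketch(e, mode) != Decision::Allow)
        .count();
    gated as f64 / dangerous.len() as f64
}
```

Under these assumptions, a program performing ReadLocal, Network and Exec has every dangerous effect gated in agent mode and none gated in human mode.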

Structs§

ExfiltrationReport: Whether a program has a data-exfiltration path: it both reads local/sensitive state (a source) and can send data out (a sink — network or arbitrary exec). The dangerous combination is source ∧ sink; either alone is not an exfil path.
ReversibilityReport: How much of a program’s dangerous blast radius is reversible — backed by an undo/rollback (transaction, trash, snapshot) rather than permanent. Gating (see assess_safety) bounds whether a dangerous effect runs; reversibility bounds the damage if it does. Together they describe the real recoverable blast radius.
SafetyReport: The safety assessment of a program described by the effects it performs.
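
The two report ideas above, source ∧ sink for exfiltration and the reversible share of dangerous effects, can be illustrated with a small sketch. The struct ExfilSketch, its fields, the functions exfil_sketch and reversible_fraction, and the Pure and WriteLocal variants are hypothetical stand-ins; the fields of the real ExfiltrationReport and ReversibilityReport are not shown on this page.

```rust
/// Effect classes, ordered from harmless to most dangerous.
/// Pure and WriteLocal are assumed variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    Pure,
    ReadLocal,
    WriteLocal,
    Network,
    Exec,
}

/// Hypothetical report shape: one flag per half of an exfil path.
#[derive(Debug, PartialEq)]
struct ExfilSketch {
    has_source: bool,
    has_sink: bool,
}

impl ExfilSketch {
    /// An exfil path needs both a source and a sink; either alone is not one.
    fn has_exfil_path(&self) -> bool {
        self.has_source && self.has_sink
    }
}

/// Source = a local read; sink = network egress or arbitrary exec.
fn exfil_sketch(effects: &[Effect]) -> ExfilSketch {
    ExfilSketch {
        has_source: effects.contains(&Effect::ReadLocal),
        has_sink: effects
            .iter()
            .any(|e| matches!(e, Effect::Network | Effect::Exec)),
    }
}

/// Fraction of dangerous operations that have an undo/rollback.
/// Non-dangerous operations (assumed: below WriteLocal) are ignored,
/// and a program with no dangerous operations counts as fully reversible.
fn reversible_fraction(ops: &[(Effect, bool)]) -> f64 {
    let dangerous: Vec<bool> = ops
        .iter()
        .filter(|op| op.0 >= Effect::WriteLocal)
        .map(|op| op.1)
        .collect();
    if dangerous.is_empty() {
        return 1.0;
    }
    let reversible = dangerous.iter().filter(|r| **r).count();
    reversible as f64 / dangerous.len() as f64
}
```

In this sketch a read paired with a network call is flagged as an exfil path, while a read alone or a network call alone is not, and a reversible write next to an irreversible exec yields a reversible fraction of one half.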

Enums§

Decision: The policy decision for an effect under a mode.
Effect: The effect class of an operation — the single property safety reasons about. Ordered from harmless to most dangerous.
Mode: Who is operating: a human at a REPL, or an autonomous agent.

Functions§

assess_exfiltration: Assess data-exfiltration exposure from the effects a program performs — a read source (Effect::ReadLocal) combined with an egress sink (Effect::Network or Effect::Exec).
assess_reversibility: Assess reversibility from (effect, reversible) pairs — each operation’s effect class plus whether it has an undo/rollback. Only dangerous effects count toward the score (a pure read is trivially safe regardless of “reversibility”).
assess_safety: Assess a program’s safety from the effects it performs, under mode.
assess_safety_named: Assess safety from operation names plus a classify closure mapping each name to its Effect (e.g. a host’s effect classifier). Names the classifier returns None for are skipped. Convenience over assess_safety when you start from names rather than effects.
decide: The default agent policy: humans get default-allow (great errors instead of friction); agents get default-deny for the dangerous classes. This mirrors the AetherShell agentic-first model so the score reflects a real, shipped policy.
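
The name-based entry point described for assess_safety_named can be pictured as a filter over operation names, with names the classifier does not recognise being skipped. The sketch below is an assumption about the shape of that step only: classify_names and toy_classifier are hypothetical helpers, the operation names are invented for the example, and the real function's signature is not shown on this page.

```rust
/// The three effect classes named on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Effect {
    ReadLocal,
    Network,
    Exec,
}

/// Map operation names to effects with a caller-supplied classifier.
/// Names for which the classifier returns None are skipped.
fn classify_names<F>(names: &[&str], classify: F) -> Vec<Effect>
where
    F: Fn(&str) -> Option<Effect>,
{
    names.iter().filter_map(|&name| classify(name)).collect()
}

/// Toy classifier standing in for a host's effect classifier.
/// The operation names here are made up for the example.
fn toy_classifier(name: &str) -> Option<Effect> {
    match name {
        "cat" => Some(Effect::ReadLocal),
        "curl" => Some(Effect::Network),
        "sh" => Some(Effect::Exec),
        _ => None,
    }
}
```

The resulting list of effects is what an effect-based assessment would then consume; an unrecognised name contributes nothing to it.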
