koda-sandbox
Capability-aware sandbox layer for Koda. Kernel-enforced FS / network / exec policies, derived from trust mode at compile time.
This crate owns kernel-level sandbox enforcement (Seatbelt on macOS, bwrap on Linux), workspace provisioning, and the egress proxy. It implements the design in #934.
Table of contents
- Threat model
- Trust modes
- Architecture at a glance
- Per-platform enforcement
- Defense in depth
- Escape hatches
- Configuration philosophy
- What's enforced today
- Security boundary tests
Threat model
What we defend against (in scope)
| Threat | Layer | Notes |
|---|---|---|
| LLM-issued path traversal (`../../etc/passwd`, symlink escape) | App-layer `path_defense` + kernel | Symlink chains expanded before policy check |
| LLM-issued credential exfiltration via `cat ~/.ssh/id_rsa` | Kernel deny-read on `koda/db` only; egress proxy filters traffic | Reading other creds is intentionally allowed (CLI tools need them); see accepted risks |
| LLM-issued unauthorized writes (modify `~/.ssh/authorized_keys`, drop a `.bashrc` payload) | Kernel deny-write on credential dirs + project carve-outs | `koda/db` is fully read+write denied |
| LLM-issued egress to attacker domains | App-layer egress proxy (HTTP/HTTPS/SOCKS5) | TLS termination optional via corporate CA |
| LLM-issued sandbox escape via `git config core.fsmonitor` | `fs.allow_git_config = false` by default | Prevents registering a malicious hook → RCE |
| Runaway / DoS commands (`yes \| head -c 1G`, fork bombs) | Wall-time ceiling + output cap in shell tool | Per-trust resource limits in policy |
| Sub-agent privilege escalation | `SandboxPolicy::compose`: child can only narrow parent | Anti-escalation tests pinned per field |
| Worker process compromise | Crash-isolated `koda-fs-worker` per slot | Compromised worker can't take down the koda host |
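The key step in the first row, resolving symlinks and `..` before the policy check, can be sketched as below. `is_within_root` is a hypothetical helper, not koda's `path_defense` API: it canonicalizes both paths with `Path::canonicalize` and compares whole components. A path that does not exist yet fails to canonicalize and comes back as `Err`, so a caller that treats errors as denial fails closed; handling new files would mean canonicalizing the parent directory instead.

```rust
use std::io;
use std::path::Path;

/// Hypothetical helper (not koda's path_defense API): true only if `candidate`,
/// after resolving every symlink and `..`, still lies under `root`.
pub fn is_within_root(root: &Path, candidate: &Path) -> io::Result<bool> {
    // canonicalize() follows the whole symlink chain and removes `.`/`..`,
    // so the check sees the real target, not the spelling the LLM chose.
    let root = root.canonicalize()?;
    let resolved = candidate.canonicalize()?;
    // Path::starts_with compares whole components: `/ws-evil` is not under `/ws`.
    Ok(resolved.starts_with(&root))
}
```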
What we explicitly do not defend against (out of scope)
These are accepted risks with documented rationale — not bugs. Changes need an issue + design discussion before tightening, because each one exists to keep legitimate workflows working.
| Out-of-scope | Why we don't enforce | Mitigated by |
|---|---|---|
| Reading credential files (`~/.ssh/id_rsa`, `~/.aws/credentials`) | CLI tools (`gh`, `aws`, `kubectl`, `ssh`) authenticate by reading their own config. Blocking reads breaks normal use. | Egress proxy (Phase 3) prevents exfiltration over the network. Blocking reads without blocking egress is security theater (see #855). |
| Local filesystem readout of arbitrary `~/Documents` | The agent operates on the user's project; tight read scoping would break grep across symlinked dotfiles, monorepos, and pnpm stores. | Trust mode + user gate (Plan / Safe). Reads stay local until the egress proxy decides. |
| CPU spikes | Wall-time + output caps catch most foot-guns; CPU rlimits add platform-specific gnarliness. | Wall-time ceiling (MAX_TIMEOUT_SECS = 300s) bounds blast radius. PR-6 will add CPU rlimits. |
| Side-channel attacks (timing, cache) | Out of scope for an LLM agent sandbox. Threat model is anti-malice, not anti-research. | n/a |
| Privileged-user attacks | The user runs koda; they could just `rm -rf` themselves. The sandbox prevents the LLM from doing it, not the user. | Trust model: user is trusted, LLM is not. |
| Kernel bugs in Seatbelt / bwrap | Trusting the kernel's own sandbox primitives is unavoidable. | Defense in depth via app-layer checks; report platform vulns upstream. |
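A minimal sketch of the wall-time ceiling, assuming a hypothetical `effective_timeout` that takes an optional caller-requested timeout and a policy default. Only `MAX_TIMEOUT_SECS = 300` comes from the table; the precedence (request over policy default) is an assumption. What matters is that the ceiling is applied last, so neither source can exceed it.

```rust
use std::time::Duration;

/// Hard ceiling from the table above (300 s).
pub const MAX_TIMEOUT_SECS: u64 = 300;

/// Hypothetical: a caller-requested timeout wins over the policy default,
/// but whichever is chosen is clamped to the ceiling last.
pub fn effective_timeout(requested_secs: Option<u64>, policy_default_secs: u64) -> Duration {
    let chosen = requested_secs.unwrap_or(policy_default_secs);
    Duration::from_secs(chosen.min(MAX_TIMEOUT_SECS))
}
```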
Trust assumptions
- The user is trusted. They invoked koda intentionally. The sandbox protects against LLM-issued actions, not user-issued ones.
- The kernel is trusted. We rely on Seatbelt / bwrap correctness for FS isolation.
- The LLM is not trusted. Treat every tool call as potentially adversarial — the agent might be jailbroken, prompt-injected, or just hallucinating dangerous commands.
- The koda host process is trusted. It runs outside the sandbox and orchestrates worker processes that run inside.
- Worker processes are crash-isolation boundaries. A compromised worker dies; the koda host respawns and continues.
Trust modes
The user's chosen trust mode is the only runtime input that affects sandbox
behavior. Every dial in SandboxPolicy is derived from it at compile time via
koda_core::sandbox::policy_for_agent. See Configuration philosophy.
| Mode | Approval | Wall time | Deny-search depth | Use case |
|---|---|---|---|---|
| Plan | All mutations rejected | 60s | 3 (perf-tuned) | Read-only exploration, code review |
| Safe | Each mutation user-approved | 60s | 5 (balanced) | Default for general dev work |
| Auto | Auto-approved | 60s | 10 (max paranoia) | CI / unattended runs — kernel sandbox required |
Security argument for the depth scaling: more permissive trust mode → less human gating → more paranoid sandbox checking. Auto mode runs without a user in the loop, so the sandbox compensates with the deepest deny-rule traversal.
This invariant is mechanically pinned by the test
policy_for_agent_depth_is_strictly_monotone_with_permissiveness in
koda-core/src/sandbox.rs.
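The mode-to-dial mapping is a compile-time table. In the hypothetical sketch below, the numbers are copied from the table above; everything else (`Dials`, `dials_for`, the `require_kernel_sandbox` flag) is invented for illustration, and the real `policy_for_agent` returns a full `SandboxPolicy`.

```rust
/// Trust modes, ordered from least to most permissive.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TrustMode {
    Plan,
    Safe,
    Auto,
}

/// Hypothetical slice of SandboxPolicy: only the dials from the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Dials {
    pub wall_time_secs: u64,
    pub deny_search_depth: u32,
    pub require_kernel_sandbox: bool,
}

/// Compile-time table keyed by trust mode; numbers copied from the table above.
pub const fn dials_for(mode: TrustMode) -> Dials {
    match mode {
        TrustMode::Plan => Dials { wall_time_secs: 60, deny_search_depth: 3, require_kernel_sandbox: false },
        TrustMode::Safe => Dials { wall_time_secs: 60, deny_search_depth: 5, require_kernel_sandbox: false },
        TrustMode::Auto => Dials { wall_time_secs: 60, deny_search_depth: 10, require_kernel_sandbox: true },
    }
}

/// The invariant: a more permissive mode gets a strictly deeper deny-rule search.
pub fn depth_is_strictly_monotone() -> bool {
    let order = [TrustMode::Plan, TrustMode::Safe, TrustMode::Auto];
    order
        .windows(2)
        .all(|w| dials_for(w[0]).deny_search_depth < dials_for(w[1]).deny_search_depth)
}

// Evaluated at compile time: the table needs no runtime config to resolve.
pub const AUTO: Dials = dials_for(TrustMode::Auto);
```

`depth_is_strictly_monotone` expresses the invariant the pinned test names; the real test's body may check it differently.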
Architecture at a glance
```text
┌──────────────────────────────────────────────────┐
│ koda host process (TRUSTED, runs outside sbox)   │
│                                                  │
│  • LLM provider calls                            │
│  • Tool dispatch                                 │
│  • SandboxPolicy construction (policy_for_agent) │
│  • SandboxPool slot allocation                   │
└────┬───────────────────────┬─────────────────────┘
     │ spawns                │ spawns
     │ (Seatbelt/bwrap)      │ (Unix socket IPC)
     ▼                       ▼
┌──────────────────────┐ ┌─────────────────────────────────┐
│ User command         │ │ koda-fs-worker (crash-isolated) │
│ (LLM-controlled)     │ │                                 │
│                      │ │  • Validates Write/Edit ops     │
│ Kernel-enforced:     │ │    against SandboxPolicy        │
│  • FS deny rules     │ │  • Receives policy via env var  │
│  • Network egress    │ │    KODA_FS_WORKER_POLICY (IPC)  │
│  • Process limits    │ │  • One worker per sandbox slot  │
└────┬─────────────────┘ └─────────────────────────────────┘
     │
     ▼ outbound traffic
┌──────────────────────────────────────┐
│ Built-in egress proxy (Phase 3)      │
│  • Domain allow/deny lists           │
│  • Optional MITM via corporate CA    │
│  • SOCKS5 fallback for non-HTTP apps │
└──────────────────────────────────────┘
```
Dependency direction
```text
koda-cli → koda-core → koda-sandbox
                       │
                       └─ no upward dependency
```
koda-sandbox knows nothing about `Persistence`, `Provider`, or
`ToolRegistry`. Pure infrastructure: testable with
`assert_eq!(transform(cmd, policy), expected)`.
Per-platform enforcement
| Platform | Mechanism | Module | Enforcement scope |
|---|---|---|---|
| macOS | sandbox-exec(1) (Seatbelt) | seatbelt.rs | FS read/write, network egress, process limits |
| Linux | bwrap(1) (bubblewrap) | bwrap.rs + stage2.rs | FS via mount namespaces, network via netns + bridge |
| Other Unix | None | n/a | Refuses to enter Auto mode (failIfUnavailable) |
| Windows | None today | n/a | Out of scope for current threat model |
macOS Seatbelt details
- Profile generated dynamically per-policy; see `seatbelt::build_profile_string`. `policy_overlay_rules` appends user policy after the hardcoded baseline so the baseline (credential paths, koda DB) acts as a floor.
- Optional MITM proxy chaining via `build_proxied_profile_string` for corporate Zscaler / PKI environments.
- `weaker_macos_isolation` flag (default `false`) opts into Apple `trustd` callbacks for Go-binary TLS verification, at the cost of network sandboxing strength.
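A hypothetical sketch of dynamic profile generation, to show the baseline-then-overlay shape described above. `MacPolicy`, `build_profile`, and the example paths are invented; the rules use standard SBPL forms (`version`, `deny default`, `allow file-read*`, `subpath`, `mach-lookup`), but the exact rule set and the `trustd` service name are assumptions. The sketch emits rules in the order the README describes and makes no claim about how SBPL resolves overlapping rules.

```rust
/// Hypothetical policy slice; the real input is a full SandboxPolicy.
pub struct MacPolicy<'a> {
    pub koda_db: &'a str,
    pub writable_roots: &'a [&'a str],
    pub weaker_macos_isolation: bool,
}

/// Render a path as an SBPL string literal, escaping `\` and `"`.
fn sbpl_str(path: &str) -> String {
    format!("\"{}\"", path.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Baseline first, per-policy overlay appended after it.
pub fn build_profile(p: &MacPolicy) -> String {
    let mut out = String::from("(version 1)\n(deny default)\n");
    // Baseline: allow exec/fork and reads, but fully deny the koda DB.
    out.push_str("(allow process-exec)\n(allow process-fork)\n(allow file-read*)\n");
    let db = sbpl_str(p.koda_db);
    out.push_str(&format!("(deny file-read* (subpath {db}))\n"));
    out.push_str(&format!("(deny file-write* (subpath {db}))\n"));
    // Overlay: per-policy writable roots.
    for root in p.writable_roots {
        out.push_str(&format!("(allow file-write* (subpath {}))\n", sbpl_str(root)));
    }
    // Opt-in escape hatch; the service name here is an assumption.
    if p.weaker_macos_isolation {
        out.push_str("(allow mach-lookup (global-name \"com.apple.trustd.agent\"))\n");
    }
    out
}
```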
Linux bwrap details
- Stage-2 helper (`koda-sandbox-stage2`) sets up the in-netns TCP↔UDS bridge inside the unshare'd namespace, then `execvp`s the user's command.
- Mount namespace gives a true read-only `/` with carved-out writable roots.
- Network namespace + virtual bridge gives kernel-enforced egress: even binaries that ignore `HTTPS_PROXY` env vars cannot escape via direct TCP.
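Because the policy-to-argv step is pure, it can be pinned with `assert_eq!`. A hypothetical sketch of such a step for bubblewrap: `bwrap_argv` and `LinuxPolicy` are illustrative names, while `--ro-bind`, `--bind`, `--dev`, `--proc`, `--unshare-net` and `--die-with-parent` are real bwrap options. It leaves out the stage-2 helper; on its own, `--unshare-net` gives the command an empty network namespace, and in koda the stage-2 bridge then supplies the only route out, to the egress proxy.

```rust
/// Hypothetical policy slice for illustration.
pub struct LinuxPolicy<'a> {
    pub writable_roots: &'a [&'a str],
    pub allow_network: bool,
}

/// Pure transform: (command, policy) -> bwrap argv. No I/O, so assert_eq! can pin it.
pub fn bwrap_argv(cmd: &[&str], p: &LinuxPolicy) -> Vec<String> {
    // Read-only view of the whole filesystem, plus fresh /dev and /proc.
    let mut args: Vec<String> = ["bwrap", "--ro-bind", "/", "/", "--dev", "/dev", "--proc", "/proc"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Carve-outs: re-bind each writable root read-write over the read-only root.
    for &root in p.writable_roots {
        args.extend(["--bind", root, root].map(String::from));
    }
    // New, empty network namespace when direct egress is not allowed.
    if !p.allow_network {
        args.push("--unshare-net".to_string());
    }
    // Tear down the sandboxed tree if the parent dies.
    args.push("--die-with-parent".to_string());
    // Everything after `--` is the user's command, passed through verbatim.
    args.push("--".to_string());
    args.extend(cmd.iter().map(|s| s.to_string()));
    args
}
```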
Auto mode is fail-closed
When the user picks Auto, koda refuses to start if the kernel sandbox
isn't available on the platform (Codex's failIfUnavailable pattern).
Rationale: Auto means "no human in the loop" — running without kernel
enforcement under those conditions would be reckless.
Defense in depth
The sandbox is not a single fence — it's four concentric ones, each correcting for failure modes the others can't see.
| Layer | Catches | Module |
|---|---|---|
| 1. Trust mode + approval gate | Operator-policy violations before they reach the sandbox | koda-core::trust |
| 2. Application-layer path defense | Symlink escapes, `..` chains, dangerous-system-path heuristics | path_defense.rs |
| 3. SandboxPolicy compose chain | Sub-agent privilege escalation (child can only narrow parent) | policy.rs::compose |
| 4. Kernel sandbox | Direct syscalls that bypass app-layer checks | seatbelt.rs / bwrap.rs |
A bypass needs to defeat all four layers, not any one. This is by design — each layer's threat model is documented in its module-level rustdoc.
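Layer 3 can be illustrated with a hypothetical two-field policy. In this sketch a child's write root survives only if it sits inside one of the parent's write roots, and deny lists are unioned so a child cannot un-deny anything. The real `compose` covers every field and may be stricter (the pinned test's name suggests child write grants are dropped outright). `Path::starts_with` is lexical, so inputs are assumed to be canonicalized by path defense first.

```rust
use std::path::PathBuf;

/// Hypothetical two-field stand-in for SandboxPolicy.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Policy {
    pub allow_write: Vec<PathBuf>,
    pub deny_read: Vec<PathBuf>,
}

/// Child can only narrow the parent, never widen it.
pub fn compose(parent: &Policy, child: &Policy) -> Policy {
    // Keep a child write root only if some parent write root contains it.
    let allow_write: Vec<PathBuf> = child
        .allow_write
        .iter()
        .filter(|c| parent.allow_write.iter().any(|p| c.starts_with(p)))
        .cloned()
        .collect();
    // Union of denies: the parent's denies always survive.
    let mut deny_read = parent.deny_read.clone();
    for d in &child.deny_read {
        if !deny_read.contains(d) {
            deny_read.push(d.clone());
        }
    }
    Policy { allow_write, deny_read }
}
```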
Escape hatches
These are intentional bypass mechanisms. They exist for specific documented use cases. Each one has a security cost; using one is a deliberate acknowledgement that the user accepts that cost.
--no-sandbox flag
Disables the kernel sandbox entirely. Used for:
- Debugging the sandbox itself when developing `koda-sandbox`.
- Platform-unsupported environments (CI containers without `bwrap`).
- Reproducing customer-reported bugs where the user can't run with the sandbox.
Cost: No FS / network / exec enforcement. Only the trust-mode approval gate stands between the LLM and your filesystem. Plan mode still rejects mutations and Safe mode still asks before each one, but Auto allows whatever the LLM asks for.
fs.allow_git_config = true
Permits writes to git config. Off by default because git config core.fsmonitor
can register a hook that runs arbitrary code on the next git status → RCE.
Cost: Trades sandbox containment for `git config` ergonomics. Only opt in for workflows that genuinely need to modify git config (rare).
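A hypothetical worker-side guard for this flag. It refuses writes to a repository's `.git/config` and to `~/.gitconfig` unless `allow_git_config` is set. The actual list of protected paths isn't documented here (git also reads `$XDG_CONFIG_HOME/git/config`, for example), and the path is assumed to have been normalized by path defense beforehand.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical check: may the worker write `path` under this policy?
pub fn git_config_write_allowed(path: &Path, allow_git_config: bool) -> bool {
    if allow_git_config {
        return true;
    }
    // Global config: ~/.gitconfig can also set core.fsmonitor.
    if path.file_name().map_or(false, |n| n == ".gitconfig") {
        return false;
    }
    // Repo config: any adjacent `.git` / `config` component pair.
    let names: Vec<&OsStr> = path.components().map(|c| c.as_os_str()).collect();
    !names.windows(2).any(|w| w[0] == ".git" && w[1] == "config")
}
```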
net.weaker_macos_isolation = true
macOS-only. Permits Apple trustd mach lookups so Go binaries can verify TLS.
Cost: Widens the network sandbox surface. Opt in only if Go binaries fail TLS verification without it.
MITM CA bundle (mitm.ca_bundle)
Configures the egress proxy to terminate TLS using a corporate CA (Zscaler, internal PKI). Used for environments where the corporate proxy is mandatory and inspects HTTPS.
Cost: All HTTPS traffic from sandboxed processes is decrypted and re-encrypted by koda. The proxy still applies domain allow/deny rules — MITM is enforcement, not an escape — but it materially expands what koda's process can see.
failIfUnavailable = false (implicit, for Plan/Safe modes)
Plan and Safe modes start without a kernel sandbox if the platform doesn't support one. Auto mode requires kernel enforcement.
Cost: Plan/Safe rely entirely on app-layer path defense + the approval gate when no kernel sandbox is available. Trust mode is the safety net.
Configuration philosophy
Koda is config-free at runtime.
Every behavioral dial in SandboxPolicy is derived at compile time from the
trust mode (Plan / Safe / Auto) via koda_core::sandbox::policy_for_agent.
There is no JSON config file, no CLI override, no env-var dial that the user
can twiddle.
Why. Config knobs are foot-guns:
- They turn into compatibility liabilities the moment they ship.
- They force users to make security decisions they're not equipped to make.
- They drift from documented defaults, breaking the threat model silently.
If a behavioral change is justified, it goes into policy_for_agent (a
compile-time table keyed off trust mode), not into a config file.
One legitimate exception: IPC
SandboxPolicy keeps its serde::Deserialize impl because the
crash-isolated koda-fs-worker process needs to receive the policy across
the process boundary (via KODA_FS_WORKER_POLICY env var as JSON). This is
IPC, not config — the host built the policy in-memory via
policy_for_agent; the JSON is just the wire format.
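On the host side, handing the policy to the worker is ordinary process plumbing. A sketch using `std::process::Command`, where `worker_command` and the JSON payload are hypothetical; in koda the JSON would come from serializing the `SandboxPolicy` built by `policy_for_agent`. The function only builds the command, so the environment can be inspected with `Command::get_envs` without spawning anything.

```rust
use std::process::Command;

/// Env var name from the README.
pub const POLICY_ENV: &str = "KODA_FS_WORKER_POLICY";

/// Hypothetical: build (but don't spawn) the worker command, passing the
/// host-built policy as JSON. The JSON is wire format, not user config.
pub fn worker_command(worker_bin: &str, policy_json: &str) -> Command {
    let mut cmd = Command::new(worker_bin);
    cmd.env(POLICY_ENV, policy_json);
    cmd
}
```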
To prevent this IPC channel from morphing into a config loader by accretion,
the test
sandbox_policy_deserialize_only_used_in_fs_worker_binary walks the
workspace and fails the build if any file other than the worker binary
deserializes a SandboxPolicy. A future contributor adding a config loader
gets a red CI build with a teaching moment baked in.
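The guard test's core idea can be sketched as a filter over source files. This hypothetical version takes in-memory `(path, source)` pairs instead of walking the workspace, and its textual patterns are assumptions about how a deserialization would be spelled; a real check would read files via `std::fs::read_dir` and may match differently.

```rust
/// Hypothetical heuristic: list every file, other than the allowed worker
/// binary, whose source appears to deserialize a SandboxPolicy.
pub fn files_deserializing_policy<'a>(files: &[(&'a str, &str)], allowed: &str) -> Vec<&'a str> {
    // Assumed spellings of a deserialization call.
    const NEEDLES: [&str; 3] = [
        "from_str::<SandboxPolicy>",
        "from_slice::<SandboxPolicy>",
        ": SandboxPolicy = serde_json::from",
    ];
    files
        .iter()
        .filter(|(path, _)| *path != allowed)
        .filter(|(_, src)| NEEDLES.iter().any(|n| src.contains(*n)))
        .map(|(path, _)| *path)
        .collect()
}
```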
What's enforced today
Honest accounting of which fields actually do something vs. are declared-but-not-yet-enforced. This list is meant to drift downward over time as enforcement code lands.
| Field | Declared | Enforced |
|---|---|---|
| `fs.deny_read` | ✅ | ✅ kernel (Seatbelt + bwrap) + worker |
| `fs.allow_read_within_deny` | ✅ | ✅ kernel + worker |
| `fs.allow_write` | ✅ | ✅ kernel + worker |
| `fs.deny_write_within_allow` | ✅ | ✅ kernel + worker |
| `fs.allow_git_config` | ✅ | ✅ worker |
| `fs.mandatory_deny_search_depth` | ✅ | ✅ worker: clamped via `effective_chain_depth` (floor = 8, ceiling = 40) |
| `net.allowed_domains` / `net.denied_domains` | ✅ | ✅ egress proxy |
| `net.allow_local_binding` | ✅ | ✅ kernel |
| `net.mitm` | ✅ | ✅ egress proxy |
| `net.weaker_macos_isolation` | ✅ | ✅ Seatbelt profile |
| `limits.wall_time_secs` | ✅ | ✅ shell tool dispatch |
| `limits.cpu_time_secs` | ✅ | ✅ `setrlimit(RLIMIT_CPU)`: `SIGXCPU` at cap |
| `limits.max_rss_bytes` | ✅ | ✅ `setrlimit(RLIMIT_AS)`: `ENOMEM` at cap (Linux `RLIMIT_RSS` is a no-op; `RLIMIT_AS` caps virtual memory) |
| `limits.max_open_fds` | ✅ | ✅ `setrlimit(RLIMIT_NOFILE)`: `EMFILE` at cap |
| `limits.max_output_bytes` | ✅ | ✅ shell tool dispatch: truncates `messages.full_content` at the policy-supplied byte cap (default 2 MB when `None`) |
| `trust` (`TrustPreference`) | ✅ | ✅ approval gate (`koda-core::trust`) |
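Two of the numeric rules above, sketched with hypothetical helper names: the depth clamp uses the documented floor (8) and ceiling (40), and the output cap truncates on a UTF-8 character boundary. The 2 MB default is taken as `2 * 1024 * 1024` bytes, an assumption, since the table doesn't say which megabyte it means.

```rust
/// Floor and ceiling from the table above.
pub const DEPTH_FLOOR: u32 = 8;
pub const DEPTH_CEILING: u32 = 40;

/// Out-of-range depths snap to the nearest bound.
pub fn effective_chain_depth(requested: u32) -> u32 {
    requested.clamp(DEPTH_FLOOR, DEPTH_CEILING)
}

/// Assumed reading of "2 MB": 2 MiB.
pub const DEFAULT_MAX_OUTPUT_BYTES: usize = 2 * 1024 * 1024;

/// Truncate output to at most `cap` bytes without splitting a UTF-8 character.
pub fn cap_output(full: &str, cap: Option<usize>) -> &str {
    let cap = cap.unwrap_or(DEFAULT_MAX_OUTPUT_BYTES);
    if full.len() <= cap {
        return full;
    }
    // Walk back to the nearest char boundary; index 0 always is one.
    let mut end = cap;
    while !full.is_char_boundary(end) {
        end -= 1;
    }
    &full[..end]
}
```

If the per-mode depths from the trust-mode table (3, 5, 10) pass through a clamp like this one unchanged, they resolve to 8, 8 and 10.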
Any field marked as declared-only (✅ under Declared, nothing under Enforced) is
intentionally inert: setting it has no runtime effect. Such fields stay in
SandboxPolicy so the type contract remains stable while enforcement lands incrementally.
Security boundary tests
Each load-bearing claim in this README is pinned by a test. If you change the corresponding behavior, expect to update the test by name — that's the point.
| Claim | Test |
|---|---|
| Auto mode refuses to run without kernel sandbox | koda-core::sandbox::tests::build_errors_in_auto_mode_when_sandbox_unavailable |
| Sub-agent can never widen parent's policy | koda-sandbox::policy::tests::compose_drops_child_allow_writes_so_child_cannot_widen |
| Trust mode is strictly monotonic with paranoia | koda-core::sandbox::tests::policy_for_agent_depth_is_strictly_monotone_with_permissiveness |
| Wall-time ceiling clamps even policy-supplied values | koda-core::tools::shell::tests::timeout_max_ceiling_clamps_policy_too |
| Symlink escape is denied at the worker | koda-sandbox::worker::tests::symlink_escape_is_denied |
| koda/db is fully read-write denied | koda-sandbox::seatbelt::tests::strict_profile_fully_denies_koda_db |
| Only the worker binary deserializes SandboxPolicy | koda-sandbox::policy::tests::sandbox_policy_deserialize_only_used_in_fs_worker_binary |
| Egress proxy clamps Auto mode network | koda-sandbox::seatbelt::tests::proxied_profile_omits_open_network_allow |
Adding a new defended threat? Add a test next to the relevant module and update this table. Removing a defended threat? Update the threat model table first, then remove the test.
Phase status
This crate is the kernel-enforcement implementation of issue #934. Phase 5 (current) wraps sub-agent policy composition + the per-trust derivation. PR-6 (next) wires declared-only resource limits to actual rlimit / SBPL enforcement.
For the full design rationale, threat-model evolution, and per-phase acceptance criteria, see the issue.