//! Linux namespace isolation backend for [`super::SandboxTier`].
//!
//! Translates the abstract tier into [`unshare(2)`] flags for unprivileged
//! user namespaces.
//!
//! # Tier mapping
//!
//! - [`SandboxTier::None`] — no `unshare`; the always-on parent hardening still applies.
//! - [`SandboxTier::Hardened`] — `CLONE_NEWPID | CLONE_NEWIPC` inside `CLONE_NEWUSER`.
//! - [`SandboxTier::Lockdown`] — Hardened plus `CLONE_NEWNS | CLONE_NEWNET`.
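//!
//! A minimal sketch of the mapping above as a single flag word. `Tier` and
//! `unshare_flags` are hypothetical stand-ins (not this module's API), and the
//! `CLONE_*` values are inlined from `<linux/sched.h>` so the example needs no
//! `libc` dependency:
//!
//! ```rust
//! // Namespace flag values as defined in <linux/sched.h>.
//! const CLONE_NEWNS: i32 = 0x0002_0000;
//! const CLONE_NEWIPC: i32 = 0x0800_0000;
//! const CLONE_NEWUSER: i32 = 0x1000_0000;
//! const CLONE_NEWPID: i32 = 0x2000_0000;
//! const CLONE_NEWNET: i32 = 0x4000_0000;
//!
//! // Hypothetical stand-in for `SandboxTier`.
//! #[derive(Clone, Copy)]
//! enum Tier {
//!     None,
//!     Hardened,
//!     Lockdown,
//! }
//!
//! // Hypothetical helper: the flag word for one tier; 0 means "skip `unshare`".
//! fn unshare_flags(tier: Tier) -> i32 {
//!     // Both isolating tiers start from a user namespace plus PID and IPC.
//!     let hardened = CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWIPC;
//!     match tier {
//!         Tier::None => 0,
//!         Tier::Hardened => hardened,
//!         Tier::Lockdown => hardened | CLONE_NEWNS | CLONE_NEWNET,
//!     }
//! }
//! ```
//!
//! With these values `unshare_flags(Tier::Hardened)` is `0x3800_0000` and
//! `unshare_flags(Tier::Lockdown)` is `0x7802_0000`.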
//!
//! # Mount namespace and the helper
//!
//! `unshare(CLONE_NEWNS)` creates an isolated mount namespace, but populating
//! it with private mounts (e.g. tmpfs over `/tmp`) requires `mount(2)`, which
//! is not async-signal-safe and therefore unsafe to call from the `pre_exec`
//! closure. So `apply_sandbox` only creates the namespace here; the
//! private-mount setup is performed by [`crate::execution::sandbox_helper`],
//! which the supervisor `execve`s into for [`SandboxTier::Lockdown`] runs.
//! The helper runs inside the new namespace (out of `pre_exec` context), does
//! the mount work, then `execve`s the target through its inherited fd so
//! TOCTOU pinning is preserved across the helper indirection.
//!
//! # Requirements
//!
//! - Linux 3.8+ for unprivileged user namespaces.
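//! - Unprivileged user namespaces enabled. Debian-derived kernels expose
//!   `/proc/sys/kernel/unprivileged_userns_clone` (must be `1`); on kernels
//!   without that file, `/proc/sys/user/max_user_namespaces` must be non-zero.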
//! - For full network isolation: kernel 5.9+ recommended.
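//!
//! One way the sysctl requirement could be probed ahead of time is sketched
//! below. `parse_sysctl` and `userns_probably_available` are hypothetical
//! names, and the check of `/proc/sys/user/max_user_namespaces` is an added
//! assumption rather than something this module is known to do. The probe is
//! only a hint: other policy (seccomp, LSM rules) can still make `unshare`
//! fail, so the call itself remains the authoritative test.
//!
//! ```rust
//! use std::fs;
//!
//! // Hypothetical helper: parse a sysctl file body such as "1\n".
//! fn parse_sysctl(raw: &str) -> Option<u64> {
//!     raw.trim().parse().ok()
//! }
//!
//! // Hypothetical probe. A missing or unreadable file is treated as
//! // "this knob imposes no restriction".
//! fn userns_probably_available() -> bool {
//!     let read = |path: &str| fs::read_to_string(path).ok().and_then(|s| parse_sysctl(&s));
//!     // Debian-style toggle: 0 disables unprivileged user namespaces.
//!     let toggle_ok = read("/proc/sys/kernel/unprivileged_userns_clone").map_or(true, |v| v != 0);
//!     // Per-user namespace limit: 0 disables user namespaces entirely.
//!     let limit_ok = read("/proc/sys/user/max_user_namespaces").map_or(true, |v| v > 0);
//!     toggle_ok && limit_ok
//! }
//! ```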

use super::SandboxTier;

/// Apply sandbox isolation matching the requested tier.
///
/// Called between `fork()` and `exec()` via `Command::pre_exec`. Only
/// async-signal-safe syscalls are used (just `unshare(2)`).
///
/// # Tier coverage
///
/// - [`SandboxTier::None`] — no-op.
/// - [`SandboxTier::Hardened`] — PID + IPC namespaces (and the wrapping user namespace).
/// - [`SandboxTier::Lockdown`] — PID + IPC + mount + network namespaces (plus user
///   namespace). The mount namespace is created here but populated with private
///   mounts by [`crate::execution::sandbox_helper`] post-`pre_exec`.
///
/// # Errors
///
/// [`std::io::ErrorKind::PermissionDenied`] if `unshare` fails (e.g.
/// unprivileged user namespaces disabled by sysctl).
///
/// # Safety
///
/// Must only be called from a `pre_exec` closure.
/// Check whether unprivileged user namespaces are available.
///
/// Returns `true` if the kernel supports creating user namespaces
/// without root. Prerequisite for any non-`None` tier on Linux.