Sandlock
Lightweight process sandbox for Linux. Confines untrusted code using Landlock (filesystem + network + IPC), seccomp-bpf (syscall filtering), and seccomp user notification (resource limits, IP enforcement, /proc virtualization). No root, no cgroups, no containers.
sandlock run -w /tmp -r /usr -r /lib -m 512M -- python3 untrusted.py
Why Sandlock?
Containers and VMs are powerful but heavy. Sandlock targets the gap: strict confinement without image builds or root privileges. A built-in copy-on-write (COW) filesystem protects your working directory automatically.
| Feature | Sandlock | Container | MicroVM (Firecracker) |
|---|---|---|---|
| Root required | No | Yes* | Yes (KVM) |
| Image build | No | Yes | Yes |
| Startup time | ~5 ms | ~200 ms | ~100 ms |
| Kernel | Shared | Shared | Separate guest |
| Filesystem isolation | Landlock + seccomp COW | Overlay | Block-level |
| Network isolation | Landlock + seccomp notif | Network namespace | TAP device |
| HTTP-level ACL | Method + host + path rules | N/A | N/A |
| Syscall filtering | seccomp-bpf | seccomp | N/A |
| Resource limits | seccomp notif + SIGSTOP | cgroup v2 | VM config |
* Rootless containers exist but require user namespace support and /etc/subuid configuration.
Architecture
Sandlock is implemented in Rust for performance and safety:
- sandlock-core: Rust library (Landlock, seccomp, supervisor, COW, pipeline)
- sandlock-cli: Rust CLI binary (sandlock run ...)
- sandlock-ffi: C ABI shared library (libsandlock_ffi.so)
- Python SDK: ctypes bindings to the FFI library
┌─────────────┐
│ Python SDK │ ctypes FFI
│ (sandlock) │──────────────┐
└─────────────┘ │
▼
┌──────────────┐ ┌──────────────────────────────┐
│ sandlock CLI │───>│ libsandlock_ffi.so │
└──────────────┘ └──────────────┬───────────────┘
│
┌──────────────▼───────────────┐
│ sandlock-core │
│ Landlock · seccomp · COW · │
│ pipeline · policy_fn · vDSO │
└──────────────────────────────┘
Requirements
- Linux 6.12+ (Landlock ABI v6), Rust 1.70+ (to build)
- Python 3.8+ (optional, for Python SDK)
- No root, no cgroups
| Feature | Minimum kernel |
|---|---|
| seccomp user notification | 5.6 |
| Landlock filesystem rules | 5.13 |
| Landlock TCP port rules | 6.7 (ABI v4) |
| Landlock IPC scoping | 6.12 (ABI v6) |
Protections can be selectively waived per-policy when needed — see
docs/sandbox-reference.md#protection-opt-out.
Install
From source
# Build the Rust binary and shared library
# Install Python SDK (auto-builds Rust FFI library)
&&
CLI only
Quick Start
CLI
# Basic confinement
# Interactive shell
# Resource limits + timeout
# Outbound allowlist — restrict to one host on one port
# Multiple ports for one host, plus a separate any-IP port
# Wildcard port (optional): a bare `host` (or `host:*`) permits every port
# IP, CIDR range, or IPv6 literal as the target (matched by containment,
# no DNS); same grammar as --net-deny
# Unrestricted outbound: `*` opens any host and any TCP port (`:*` / `*:*`
# are equivalent). For full egress add a UDP wildcard, `udp://*`.
# UDP — scheme prefix gates the protocol and scopes the destination
# (e.g. DNS to 1.1.1.1, plus TCP HTTPS to anywhere)
# Ping — kernel ping socket (SOCK_DGRAM) gated by net.ipv4.ping_group_range
# Denylist: default-allow networking, block specific IPs/CIDRs/ports
# (inverse of --net-allow; mutually exclusive with it). Port is optional.
# HTTP-level ACL (method + host + path rules via transparent proxy)
# HTTP rules with concrete hosts auto-extend --net-allow with host:80,443
# HTTPS MITM, zero-config: sandlock generates an ephemeral CA and splices it
# into the trust bundle(s) you name. No openssl, no manual install.
# Node and other runtimes with a compiled-in CA list: export the cert and
# wire the runtime's own env var.
# HTTPS MITM with your own CA (still supported)
# Server listening on ports (Landlock --net-allow-bind, separate from --net-allow;
# accepts comma-separated ports and lo-hi ranges, repeatable)
# Clean environment
# Deterministic execution (frozen time + seeded randomness)
# Port virtualization (multiple sandboxes can bind the same port)
# Port virtualization with named sandboxes (enables network discovery)
# List all running sandboxes
# Kill a running sandbox by name
# Chroot with per-sandbox mount (no kernel bind mount needed)
# COW filesystem (writes captured, committed on success)
# Dry-run (show what files would change, then discard)
# Use a saved profile
# No-supervisor mode (Landlock + deny-only seccomp, no supervisor process)
# Nested sandboxing: confine sandlock's own supervisor
Python API
=
# Run a command (with optional timeout in seconds)
=
assert
assert b in
# HTTP ACL: only allow specific API calls
=
=
# Chroot with per-sandbox mount (Docker-style -v, no root needed)
=
=
# Port virtualization: query port mappings while sandbox is running
=
# sb.ports() returns {virtual_port: real_port} while running
# Confine the current process (Landlock filesystem only, irreversible)
# Dry-run: see what files would change, then discard
=
=
# A=added, M=modified, D=deleted
Pipeline
Chain sandboxed stages with the | operator — each stage has its own
independent sandbox config. Data flows through kernel pipes.
=
=
# Reader can access data, processor cannot
=
assert b in
XOA pattern (eXecute Only Agents): planner generates code, executor runs it with data access but no network:
=
=
=
Dynamic Policy (policy_fn)
Inspect syscall events at runtime and adjust permissions on the fly.
Events carry syscall name, category, PID, network destination (for
connect/sendto/bind), and argv (for execve). The callback
returns a verdict to allow, deny, or audit.
# Block download tools by argv
return True # deny
# Deny connections to a specific IP
return
# Lock down once the program has finished starting up
# block all network
# dynamic fs deny
# Audit every file access (allow but flag)
return
return 0 # allow
=
=
Verdicts: 0/False = allow, True/-1 = deny (EPERM),
positive int = deny with errno, "audit"/-2 = allow + flag.
Event fields: syscall, category (file/network/process/memory),
pid, parent_pid, host, port, argv, denied.
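The verdict encoding above can be illustrated with a small normalizer. This is a minimal sketch, not Sandlock's implementation: the function name normalize_verdict and the (action, errno, audited) tuple are hypothetical, and rejecting any other return value is an assumption.

```python
import errno


def normalize_verdict(verdict):
    """Map a policy_fn return value to (action, errno, audited).

    Hypothetical helper: mirrors the documented encoding only.
    """
    if verdict == "audit":
        return ("allow", 0, True)
    # bool must be tested before int, because True == 1 in Python.
    if isinstance(verdict, bool):
        if verdict:
            return ("deny", errno.EPERM, False)
        return ("allow", 0, False)
    if isinstance(verdict, int):
        if verdict == 0:
            return ("allow", 0, False)
        if verdict == -1:
            return ("deny", errno.EPERM, False)
        if verdict == -2:
            return ("allow", 0, True)
        if verdict > 0:
            # A positive int is the errno to return to the child.
            return ("deny", verdict, False)
    # Assumption: anything else is treated as a programming error.
    raise ValueError(f"unsupported verdict: {verdict!r}")
```

Testing bool before int matters because True would otherwise be read as errno 1; that happens to be EPERM on Linux, so both readings agree here.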
TOCTOU note. Per seccomp_unotify(2), the kernel re-reads user-memory pointers after Continue. Sandlock handles this in two places:
- Path strings are not exposed on events. Path-based access control belongs in static Landlock rules (fs_readable/fs_writable/fs_denied), which are kernel-enforced and TOCTOU-immune. Use ctx.deny_path() for runtime additions.
- event.argv is exposed and TOCTOU-safe. Before exposing argv to policy_fn or returning Continue for an execve, the supervisor freezes every task in ProcessIndex, including peer processes that may alias argv through shared memory. With policy_fn active, fork-like syscalls are traced for one ptrace creation event, so children are registered in ProcessIndex before they can run user code. If the freeze or creation tracking cannot be established (e.g., YAMA blocks ptrace), the syscall is denied with EPERM; the safety invariant is never silently relaxed.
Context methods:
- ctx.restrict_network(ips) / ctx.grant_network(ips): network control
- ctx.restrict_max_memory(bytes) / ctx.restrict_max_processes(n): resource limits
- ctx.deny_path(path) / ctx.allow_path(path): dynamic filesystem restriction
- ctx.restrict_pid_network(pid, ips): per-PID network override
Held syscalls (child blocked until callback returns): execve,
connect, sendto, bind, openat.
Rust API
use ;
use ByteSize;
use Verdict;
// Basic run
let mut sandbox = builder
.fs_read.fs_read
.fs_write
.max_memory
.name
.build?;
let result = sandbox.run.await?;
assert!;
// HTTP ACL: restrict API access at the HTTP level
let mut agent = builder
.fs_read.fs_read.fs_read
.http_allow
.http_deny
.name
.build?;
let result = agent.run.await?;
// Confine the current process (Landlock filesystem only, irreversible)
let confinement = builder
.fs_read.fs_read
.fs_write
.build;
confine?;
// Pipeline
let producer = builder
.fs_read.fs_read.fs_read
.build?;
let consumer = producer.clone;
let result = .run.await?;
// Dynamic policy
let mut dynamic = builder
.fs_read.fs_read
.policy_fn
.build?;
let result = dynamic.run.await?;
Profiles
Save reusable sandbox profiles as TOML files in
~/.config/sandlock/profiles/. Profiles use a sectioned schema; top-level
flat keys such as fs_readable = [...] are rejected. Pass a sandbox instance
name with --name when you need a stable virtual hostname.
# ~/.config/sandlock/profiles/build.toml
[]
= "make"
= ["-j4"]
= true
= { = "gcc", = "C.UTF-8" }
[]
= ["/usr", "/lib", "/lib64", "/bin", "/etc"]
= ["/tmp/work"]
[]
= "512M"
= 50
[]
= []
How It Works
Sandlock applies confinement in sequence after fork():
Parent Child
│ fork() │
│──────────────────────────────────>│
│ ├─ 1. setpgid(0,0)
│ ├─ 2. Optional: chdir(cwd)
│ ├─ 3. NO_NEW_PRIVS
│ ├─ 4. Landlock (fs + net + IPC)
│ ├─ 5. seccomp filter (deny + notif)
│ │ └─ send notif fd ──> Parent
│ receive notif fd ├─ 6. Wait for "ready" signal
│ start supervisor (tokio) ├─ 7. Close fds 3+
│ optional: vDSO patching └─ 8. exec(cmd)
│ optional: policy_fn thread
│ optional: CPU throttle task
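Steps 1, 3, and 8 of the sequence above can be reproduced with the Python standard library. This is a rough sketch only: the Landlock, seccomp, and fd-handling steps are omitted, the function name spawn_confined is hypothetical, and the 126/127 exit codes are an illustrative convention rather than Sandlock behavior.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # value from <linux/prctl.h>


def spawn_confined(argv):
    """Fork, apply setpgid + NO_NEW_PRIVS in the child, exec argv.

    Returns the child's exit code, or -signal if it was killed.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int

    pid = os.fork()
    if pid == 0:
        try:
            # Step 1: own process group, so the whole tree can be signalled.
            os.setpgid(0, 0)
            # Step 3: required before an unprivileged seccomp filter.
            if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
                os._exit(126)
            # Steps 4-7 (Landlock, seccomp, notif fd handoff) omitted here.
            # Step 8: replace the child image with the target command.
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec (or a step above) raised.
            os._exit(127)

    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The ordering matters: NO_NEW_PRIVS has to be set before the seccomp filter is installed, which is why it sits ahead of steps 4 and 5 in the diagram.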
Seccomp Supervisor
The async notification supervisor (tokio) handles intercepted syscalls:
| Syscall | Handler |
|---|---|
| clone/fork/vfork | Process count enforcement |
| mmap/munmap/brk/mremap | Memory limit tracking |
| connect/sendto/sendmsg | IP allowlist + on-behalf execution + HTTP ACL redirect |
| bind | On-behalf bind + port remapping |
| openat | /proc virtualization, COW interception |
| unlinkat/mkdirat/renameat2 | COW write interception |
| execve/execveat | policy_fn hold + vDSO re-patching |
| getrandom | Deterministic PRNG injection |
| clock_nanosleep/timer_settime | Timer adjustment for frozen time |
| getdents64 | PID filtering, COW directory merging |
| getsockname | Port remap translation |
Custom Handlers
Downstream Rust crates can append their own seccomp-notification
handlers to the supervisor chain alongside the builtins, registering
for any syscall they care about via the Handler trait and
Sandbox::run_with_handlers. The builtin chain runs first, so
user handlers cannot subvert confinement; the registration step also
rejects handlers on syscalls in the default blocklist or
extra_deny_syscalls. See
docs/extension-handlers.md for the
full API, ordering semantics, and state patterns.
COW Filesystem
Copy-on-write filesystem isolation via seccomp notification: when
workdir is set, sandlock intercepts filesystem syscalls and stages
writes in an upper directory; reads resolve upper-then-lower. No mount
namespace, no user namespace, no root. Committed on exit, aborted on
error.
Dry-run mode: --dry-run runs the command, inspects the COW layer
for changes (added/modified/deleted files), prints a summary, then
aborts — leaving the workdir completely untouched. Useful for previewing
what a command would do before committing.
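The upper-then-lower lookup and the dry-run summary can be modelled on plain directories. This is a minimal sketch under stated assumptions: resolve_read, stage_write, and summarize are hypothetical names, deletions are not modelled, and the real interception happens at the syscall level rather than through helper calls.

```python
import os


def resolve_read(upper, lower, rel):
    """Return the path a read of `rel` should use: upper first, then lower."""
    for root in (upper, lower):
        candidate = os.path.join(root, rel)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(rel)


def stage_write(upper, rel, data):
    """Write `data` into the upper layer, leaving the lower layer untouched."""
    path = os.path.join(upper, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def summarize(upper, lower):
    """List staged changes as (status, rel): 'A' added, 'M' modified."""
    changes = []
    for dirpath, _dirs, files in os.walk(upper):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), upper)
            exists_below = os.path.exists(os.path.join(lower, rel))
            changes.append(("M" if exists_below else "A", rel))
    return sorted(changes)
```

In this model a dry run is summarize() followed by discarding the upper directory, while a commit would copy the staged files over the lower directory.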
COW Fork & Map-Reduce
Initialize expensive state once, then fork COW clones that share memory.
Each clone uses raw fork(2) with shared copy-on-write pages. 1000
clones in ~530ms, ~1,900 forks/sec.
Each clone's stdout is captured via its own pipe. reduce() reads all
pipes and feeds combined output to a reducer's stdin — fully pipe-based
data flow with no temp files.
global ,
= # 2 GB, loaded once
=
=
# stdout → per-clone pipe
# Map: fork 4 clones with a separate sandbox config
=
=
# Reduce: pipe clone outputs to reducer stdin
=
=
# b"total\n"
let mut mapper = builder
.fs_read.fs_read.fs_read.fs_read
.fs_read
.name
.init_fn
.work_fn
.build?;
let mut clones = mapper.fork.await?;
let reducer = builder
.fs_read.fs_read.fs_read.fs_read
.name
.build?;
let result = reducer.reduce.await?;
Map and reduce run in separate sandboxes with independent configs —
the mapper has data access, the reducer doesn't. Each clone inherits
Landlock + seccomp confinement. CLONE_ID=0..N-1 is set automatically.
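The fork-and-pipe data flow described in this section can be sketched with raw os.fork and os.pipe. This illustrates the pattern only, with no sandboxing applied; fork_map and reduce_outputs are hypothetical names, and the sketch assumes each clone's output is small enough for a single os.write.

```python
import os


def fork_map(n, work):
    """Fork n clones; each runs work(clone_id) -> bytes into its own pipe."""
    readers = []
    for clone_id in range(n):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: shares the parent's memory pages copy-on-write.
            os.close(r)
            status = 1
            try:
                os.environ["CLONE_ID"] = str(clone_id)
                os.write(w, work(clone_id))
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end of this clone's pipe.
        os.close(w)
        readers.append((pid, r))

    outputs = []
    for pid, r in readers:
        with os.fdopen(r, "rb") as f:
            outputs.append(f.read())  # reads until the clone exits (EOF)
        os.waitpid(pid, 0)
    return outputs


def reduce_outputs(outputs, reducer):
    """Feed the concatenated clone outputs to a reducer callable."""
    return reducer(b"".join(outputs))
```

State built before the first fork (a loaded model, a parsed dataset) is visible to every clone without being copied, which is what makes the per-fork cost small.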
Network Model
Outbound traffic is gated by an endpoint list naming
protocol × destination. --net-allow (allowlist) and --net-deny
(denylist) share one grammar and are mutually exclusive:
<spec> repeatable; the port is optional (a bare target = all ports)
target host | <ip> | <cidr> | * (`*` or empty target = any IP)
forms target[:port[,port,...]] · :port · host:* · :* · *:*
[<ipv6|cidr>]:port (bracket IPv6 when a port follows)
scheme tcp:// (default) · udp:// (`udp://*` = any UDP) · icmp:// (no port)
--net-allow target may also be a hostname, resolved via DNS at start
--net-deny target must be a literal IP/CIDR (no hostnames; use --http-deny)
A comma groups ports within one spec (host:80,443); to pass multiple
rules, repeat the flag. IP and CIDR targets are matched by containment
with no DNS (an IP literal is a /32 or /128); only hostnames resolve.
Multiple rules are OR'd. A destination is permitted iff some rule matches the same protocol as the socket plus the destination IP and port (port is N/A for ICMP).
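The matching rule in the paragraph above (protocol, then IP containment, then port, OR'd across rules) can be written down with the standard ipaddress module. This is a minimal sketch, not Sandlock's matcher: the Rule tuple and the function names are hypothetical, and hostname targets are assumed to be already resolved to IPs.

```python
import ipaddress
from typing import FrozenSet, NamedTuple, Optional


class Rule(NamedTuple):
    proto: str                        # "tcp", "udp", or "icmp"
    net: Optional[str]                # IP or CIDR as text; None = any IP
    ports: Optional[FrozenSet[int]]   # None = all ports (always None for icmp)


def rule_matches(rule, proto, dest_ip, port=None):
    """True if one rule covers this protocol, destination IP, and port."""
    if rule.proto != proto:
        return False
    if rule.net is not None:
        # A bare IP parses as a /32 (IPv4) or /128 (IPv6) network.
        network = ipaddress.ip_network(rule.net, strict=False)
        addr = ipaddress.ip_address(dest_ip)
        if addr.version != network.version or addr not in network:
            return False
    if proto == "icmp" or rule.ports is None:
        return True  # port is not applicable, or every port is allowed
    return port in rule.ports


def permitted(rules, proto, dest_ip, port=None):
    """Rules are OR'd: one match is enough."""
    return any(rule_matches(r, proto, dest_ip, port) for r in rules)
```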
Protocol gating falls out of rule presence per scheme:
- No UDP rule → UDP socket creation is denied at the seccomp layer.
- No ICMP rule → kernel ping socket creation (SOCK_DGRAM + IPPROTO_ICMP) is denied at the seccomp layer.
- Raw ICMP (SOCK_RAW + IPPROTO_ICMP) is never exposed; packet crafting is out of scope. Workloads that need ping should rely on the host's net.ipv4.ping_group_range and use the dgram path above (--net-allow icmp://...).
- TCP is always permitted at the syscall level; destinations are governed by Landlock and/or the on-behalf path.
Defaults. With no --net-allow and no HTTP ACL flags, Landlock
denies every TCP connect(), UDP / ICMP / raw socket creation are
denied at the seccomp layer, and there is no on-behalf path active.
For unrestricted TCP egress, opt in explicitly with
--net-allow '*'; for any UDP, add --net-allow 'udp://*'.
Denylist (--net-deny). The inverse of the allowlist: networking is
default-allow and the listed targets are blocked. It uses the same
grammar as --net-allow above, the only difference being that targets
must be literal IPs/CIDRs (hostnames are rejected; use --http-deny for
domains). Examples:
--net-deny 10.0.0.0/8 # all ports on a CIDR (all protocols)
--net-deny 169.254.169.254:80 # one IP, one port
--net-deny 169.254.169.254:80,443 # comma-separated ports in one rule
--net-deny '*' # any IP, all ports (TCP)
--net-deny 'udp://192.168.0.0/16' # any UDP to a CIDR
Resolution. Only hostname targets touch DNS: they are resolved once
at sandbox start and pinned in a synthetic /etc/hosts (across all
protocols). IP and CIDR targets are matched by containment directly, so
they never resolve and never appear in /etc/hosts. The synthetic file
replaces the real one only when at least one rule has a concrete
hostname; rules made purely of IPs/CIDRs, :port, udp://*, or
icmp://* leave the real /etc/hosts and DNS visible.
Wildcards. Hostnames are matched literally: --net-allow *.example.com:443 is not supported; list each domain you need (or
use a CIDR/IP target for an address range). The * token is allowed as
the target (alias for empty: *:port ≡ :port) and as the port for
TCP/UDP rules (host:*, :*, *:*).
The port is optional: omitting it means all ports, so host ≡
host:* and * ≡ :* ≡ *:* (and udp://* ≡ udp://*:*). Mixing
* with concrete ports (host:80,*) is rejected. When any TCP rule
uses the all-ports wildcard, Landlock no
longer filters TCP connect at the kernel level (it cannot express
"every port" without enumerating 65535 rules); the on-behalf path
becomes the sole enforcer, and for :* it short-circuits to
allow-all.
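The equivalences above (host ≡ host:*, * ≡ :* ≡ *:*) are easiest to see by normalizing each spec to one canonical form. The parser below is a rough sketch of the documented grammar, not Sandlock's parser: parse_spec and its (scheme, target, ports) result are hypothetical, and treating an unbracketed string with several colons as a bare IPv6 target is an assumption.

```python
def parse_spec(spec):
    """Parse one endpoint spec into (scheme, target, ports).

    target is None for "any IP"; ports is None for "all ports".
    """
    scheme = "tcp"
    if "://" in spec:
        scheme, spec = spec.split("://", 1)
    if scheme not in ("tcp", "udp", "icmp"):
        raise ValueError(f"unknown scheme: {scheme}")

    if spec.startswith("["):
        # Bracketed IPv6 (or IPv6 CIDR), optionally followed by :port.
        end = spec.index("]")
        target, rest = spec[1:end], spec[end + 1:]
        ports = rest[1:] if rest.startswith(":") else ""
    elif spec.count(":") > 1:
        # Assumption: an unbracketed IPv6 literal carries no port.
        target, ports = spec, ""
    else:
        target, _, ports = spec.partition(":")

    if target in ("", "*"):
        target = None
    if scheme == "icmp":
        if ports:
            raise ValueError("icmp rules take no port")
        return (scheme, target, None)
    if ports in ("", "*"):
        return (scheme, target, None)
    items = ports.split(",")
    if "*" in items:
        raise ValueError("cannot mix * with concrete ports")
    return (scheme, target, frozenset(int(p) for p in items))
```

For example, parse_spec("example.com") and parse_spec("example.com:*") both return ("tcp", "example.com", None).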
Implementation. Two enforcement paths:
- Direct path: pure :port TCP policies (any IP, no concrete host/IP/CIDR) and no HTTP ACL. Landlock enforces the TCP port allowlist at the kernel level; no per-syscall overhead. UDP and ICMP are not covered by Landlock and always use the on-behalf path when allowed.
- On-behalf path: any host, IP, or CIDR target, any HTTP ACL rule, or any UDP / ICMP rule (the destination IP must be checked, which Landlock cannot do). Seccomp traps connect(), sendto(), sendmsg(), and sendmmsg(); the supervisor dups the child fd, queries getsockopt(SOL_SOCKET, SO_PROTOCOL) to learn whether the socket is TCP / UDP / ICMP, then checks the destination against that protocol's resolved allowlist before performing the syscall. The HTTP/HTTPS proxy redirect (when configured) happens here too.
HTTP / HTTPS interception. --http-allow / --http-deny route
matching ports through a transparent proxy. Each rule with a concrete
host auto-extends --net-allow with host:80 (and host:443 when
--http-ca is set) so the proxy's intercept ports are reachable;
wildcard hosts auto-add :80 / :443 (any IP). All auto-added
entries are TCP. HTTPS MITM is enabled two ways: pass --http-ca <cert>
and --http-key <key> to bring your own CA, or pass --http-inject-ca <bundle> to have sandlock generate an ephemeral CA (private key in
memory only) and splice its public cert into each named trust bundle at
open time, so the workload trusts the proxy with no manual install. For
runtimes with a compiled-in CA store such as Node, --http-ca-out <path> writes the public cert so you can point the runtime's own env
var at it (e.g. NODE_EXTRA_CA_CERTS). Without any of these, port 443
is not intercepted: --net-allow host:443 permits raw TLS to the host
with no content inspection.
Bind. --net-allow-bind <ports> is independent from --net-allow and
governs server-side bind() as a default-deny allowlist. Each value is a
comma-separated list of single ports or inclusive lo-hi ranges (e.g.
--net-allow-bind 8080,9000-9005), and the flag repeats. Landlock enforces
it (TCP only); --port-remap adds on-behalf virtualization for binding.
--net-deny-bind <ports> is the inverse: default-allow binding, deny the
listed TCP ports (same port syntax, mutually exclusive with
--net-allow-bind). Because Landlock is allowlist-only, a deny-bind relaxes
the Landlock BIND_TCP hook and enforces the denylist on the on-behalf
seccomp bind() path instead.
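The port syntax shared by --net-allow-bind and --net-deny-bind expands to a plain set of ports. The sketch below is illustrative only: parse_port_list and bind_allowed are hypothetical names, and rejecting port 0 is an assumption the text does not state.

```python
def parse_port_list(value):
    """Expand "8080,9000-9005" into a set of port numbers."""
    ports = set()
    for item in value.split(","):
        lo, sep, hi = item.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        # Assumption: valid ports are 1..65535 and ranges are ascending.
        if not 0 < first <= last <= 65535:
            raise ValueError(f"bad port or range: {item!r}")
        ports.update(range(first, last + 1))  # lo-hi is inclusive
    return ports


def bind_allowed(port, allow=None, deny=None):
    """Apply exactly one of an allowlist or a denylist to a bind() port."""
    if (allow is None) == (deny is None):
        raise ValueError("pass exactly one of allow or deny")
    if deny is not None:
        return port not in deny   # default-allow, listed ports blocked
    return port in allow          # default-deny, listed ports permitted
```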
AF_UNIX sockets are governed by Landlock's
LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET, independent from --net-allow.
Port Virtualization
Each sandbox gets a full virtual port space. Multiple sandboxes can bind
the same port without conflicts. The supervisor performs bind() on behalf
of the child via pidfd_getfd (TOCTOU-safe). When a port conflicts, a
different real port is allocated transparently. /proc/net/tcp is filtered
to only show the sandbox's own ports.
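The remapping behavior can be modelled as a per-sandbox table from virtual to real ports. This is a toy model under stated assumptions: the PortTable class, the shared host_in_use set, and the 32768 starting point for fallback ports are all hypothetical, and no real sockets are bound.

```python
import errno


class PortTable:
    """Per-sandbox virtual port space backed by a shared set of real ports."""

    def __init__(self, host_in_use, fallback_start=32768):
        self.host_in_use = host_in_use  # real ports taken on the host
        self.mapping = {}               # virtual port -> real port
        self._next = fallback_start

    def bind(self, virtual_port):
        """Reserve a real port for virtual_port and return it."""
        if virtual_port in self.mapping:
            raise OSError(errno.EADDRINUSE, "port already bound in sandbox")
        real = virtual_port
        while real in self.host_in_use:
            # Conflict on the host: pick the next fallback candidate.
            if self._next > 65535:
                raise OSError(errno.EADDRINUSE, "no free real port")
            real = self._next
            self._next += 1
        self.host_in_use.add(real)
        self.mapping[virtual_port] = real
        return real

    def virtual_port_of(self, real_port):
        """Translate a real port back, as a getsockname handler would."""
        for virtual, real in self.mapping.items():
            if real == real_port:
                return virtual
        raise KeyError(real_port)
```

Two tables sharing one host_in_use set can both bind 8080: the first keeps the real port 8080, and the second is handed a different real port while its workload still sees 8080.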
When --port-remap is enabled, the sandbox registers its state in a
shared registry (/dev/shm). Use sandlock list to see all running
sandboxes and sandlock kill to stop them:
$ sandlock list
NAME PID PORTS
api.local 12345 8080
web.local 12346 8080 -> 35299
$ sandlock kill web.local
Killed sandbox 'web.local' (PID 12346)
This enables external reverse proxies (nginx, envoy) to route traffic by name to the correct real port.
Performance
Benchmarked on a typical Linux workstation:
| Workload | Bare metal | Sandlock | Docker | Sandlock overhead |
|---|---|---|---|---|
| /bin/echo startup | 2 ms | 7 ms | 307 ms | 5 ms (44x faster than Docker) |
| Redis SET (100K ops) | 82K rps | 80K rps | 52K rps | 97.1% of bare metal |
| Redis GET (100K ops) | 79K rps | 77K rps | 53K rps | 97.1% of bare metal |
| Redis p99 latency | 0.5 ms | 0.6 ms | 1.5 ms | ~2.5x lower than Docker |
| COW fork ×1000 | — | 530 ms | — | 530μs/fork, ~1,900 forks/sec |
Testing
# Rust tests
# Python tests
&& &&
Sandbox Reference
The full Sandbox configuration reference — every field, default,
and grouping — lives in docs/sandbox-reference.md.