Seccomp-BPF syscall filtering.
Seccomp-BPF filters syscalls using Berkeley Packet Filter (BPF) programs. It provides a second layer of defense after Landlock: even if a path is accessible, dangerous syscalls are blocked.
§Filter Structure
The BPF filter runs on every syscall:
- Verify architecture is x86_64 (kill otherwise)
- Load syscall number from seccomp_data
- Block clone3 entirely (cannot inspect flags in struct)
- For clone, inspect flags and block namespace creation
- For socket, inspect domain and block dangerous types (AF_NETLINK, SOCK_RAW)
- Compare other syscalls against whitelist
- Allow if match found, kill process otherwise
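The steps above can be sketched as a classic BPF program. This is a minimal illustration, not the module's actual output: `SockFilter`, `sketch_filter`, and `evaluate` are hypothetical names, the clone and socket argument checks are left out for brevity, and `SECCOMP_RET_KILL_PROCESS` is assumed as the kill action. The numeric constants are the Linux x86_64 values.

```rust
/// One classic BPF instruction; layout mirrors the kernel's `struct sock_filter`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SockFilter {
    pub code: u16,
    pub jt: u8,
    pub jf: u8,
    pub k: u32,
}

// Opcodes (values from <linux/bpf_common.h>).
pub const LD_W_ABS: u16 = 0x20; // BPF_LD | BPF_W | BPF_ABS
pub const JEQ_K: u16 = 0x15; // BPF_JMP | BPF_JEQ | BPF_K
pub const RET_K: u16 = 0x06; // BPF_RET | BPF_K

// Seccomp return actions and the x86_64 audit architecture value.
pub const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
pub const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;
pub const AUDIT_ARCH_X86_64: u32 = 0xC000_003E;

// Byte offsets into `struct seccomp_data`: `nr` at 0, `arch` at 4.
pub const OFF_NR: u32 = 0;
pub const OFF_ARCH: u32 = 4;

pub const SYS_CLONE3: u32 = 435;

fn ins(code: u16, jt: u8, jf: u8, k: u32) -> SockFilter {
    SockFilter { code, jt, jf, k }
}

/// Hypothetical builder: arch check, clone3 block, allowlist, default kill.
pub fn sketch_filter(allowed: &[u32]) -> Vec<SockFilter> {
    let kill = ins(RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS);
    let mut prog = vec![
        ins(LD_W_ABS, 0, 0, OFF_ARCH),
        ins(JEQ_K, 1, 0, AUDIT_ARCH_X86_64), // match: skip the kill below
        kill,
        ins(LD_W_ABS, 0, 0, OFF_NR),
        ins(JEQ_K, 0, 1, SYS_CLONE3), // match: fall through to the kill
        kill,
    ];
    for &nr in allowed {
        prog.push(ins(JEQ_K, 0, 1, nr)); // match: fall through to allow
        prog.push(ins(RET_K, 0, 0, SECCOMP_RET_ALLOW));
    }
    prog.push(kill); // nothing matched
    prog
}

/// Tiny evaluator for the three opcodes used above (illustration only).
pub fn evaluate(prog: &[SockFilter], nr: u32, arch: u32) -> u32 {
    let mut acc = 0u32;
    let mut pc = 0usize;
    while pc < prog.len() {
        let i = prog[pc];
        pc += 1;
        if i.code == LD_W_ABS {
            acc = if i.k == OFF_ARCH { arch } else { nr };
        } else if i.code == JEQ_K {
            pc += if acc == i.k { i.jt as usize } else { i.jf as usize };
        } else {
            return i.k; // RET_K
        }
    }
    SECCOMP_RET_KILL_PROCESS
}
```

Each check is a compare-and-return pair, so every jump offset is 0 or 1 and the program stays easy to audit. A real filter would likely use computed offsets or a different layout.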
§Clone Flag Filtering
The clone syscall is allowed, but calls that set any of the following namespace flags are blocked:
- CLONE_NEWUSER - User namespace (kernel attack surface)
- CLONE_NEWNET - Network namespace (nf_tables access)
- CLONE_NEWNS - Mount namespace
- CLONE_NEWPID - PID namespace
- CLONE_NEWIPC - IPC namespace
- CLONE_NEWUTS - UTS namespace
- CLONE_NEWCGROUP - Cgroup namespace
The clone3 syscall is blocked entirely because its flags are passed via
a userspace struct pointer that BPF cannot dereference.
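The flag check amounts to a bitmask test on the clone flags argument. The sketch below shows that test in plain Rust. `NAMESPACE_FLAGS` and `clone_flags_allowed` are hypothetical names, not items exported by this module, and the constants are the Linux values from `<linux/sched.h>`. In the BPF program itself this would presumably be a `BPF_JSET` test on the low 32 bits of the first syscall argument.

```rust
// Namespace-creating clone flags (values from <linux/sched.h>).
pub const CLONE_NEWNS: u64 = 0x0002_0000;
pub const CLONE_NEWCGROUP: u64 = 0x0200_0000;
pub const CLONE_NEWUTS: u64 = 0x0400_0000;
pub const CLONE_NEWIPC: u64 = 0x0800_0000;
pub const CLONE_NEWUSER: u64 = 0x1000_0000;
pub const CLONE_NEWPID: u64 = 0x2000_0000;
pub const CLONE_NEWNET: u64 = 0x4000_0000;

/// Union of every flag the filter is described as rejecting (hypothetical name).
pub const NAMESPACE_FLAGS: u64 = CLONE_NEWNS
    | CLONE_NEWCGROUP
    | CLONE_NEWUTS
    | CLONE_NEWIPC
    | CLONE_NEWUSER
    | CLONE_NEWPID
    | CLONE_NEWNET;

/// Returns true when `flags` requests no new namespace.
pub fn clone_flags_allowed(flags: u64) -> bool {
    (flags & NAMESPACE_FLAGS) == 0
}
```

Under this sketch an ordinary thread creation such as `CLONE_VM | CLONE_THREAD` (0x100 | 0x10000) passes, while any call that includes `CLONE_NEWUSER` is rejected regardless of its other flags.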
§Socket Filtering
The socket syscall is filtered to block:
- AF_NETLINK (16) - Access to kernel netlink interfaces (nf_tables, CVE-2024-1086)
- SOCK_RAW (3) - Raw packet access (can craft arbitrary packets)
Allowed socket domains:
- AF_UNIX (1) - Local IPC (Python multiprocessing, etc.)
- AF_INET / AF_INET6 (2, 10) - Network (Landlock controls actual access)
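The socket rules can be modelled as a predicate over the first two `socket` arguments. This is a sketch under two assumptions that the text above does not confirm: that the type argument is masked with `SOCK_TYPE_MASK` so flags such as `SOCK_CLOEXEC` do not hide a raw socket, and that domains outside the listed three are rejected. `socket_allowed` is a hypothetical name.

```rust
// Address families and socket types (Linux values).
pub const AF_UNIX: u32 = 1;
pub const AF_INET: u32 = 2;
pub const AF_INET6: u32 = 10;
pub const AF_NETLINK: u32 = 16;
pub const SOCK_RAW: u32 = 3;

/// Low bits of the type argument; the rest carry SOCK_NONBLOCK / SOCK_CLOEXEC.
pub const SOCK_TYPE_MASK: u32 = 0xf;

/// Returns true if a `socket(domain, sock_type, ..)` call would pass this sketch.
pub fn socket_allowed(domain: u32, sock_type: u32) -> bool {
    let base_type = sock_type & SOCK_TYPE_MASK;
    // Explicit denials named in the documentation.
    if domain == AF_NETLINK || base_type == SOCK_RAW {
        return false;
    }
    // Assumption: only the listed domains are accepted.
    matches!(domain, AF_UNIX | AF_INET | AF_INET6)
}
```

For example, `socket_allowed(AF_INET, 1 | 0x80000)` (a `SOCK_STREAM` socket with `SOCK_CLOEXEC`) passes, while `socket_allowed(AF_INET, SOCK_RAW | 0x800)` does not.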
§Removed Dangerous Syscalls
- memfd_create + execveat - Enables fileless execution (bypasses Landlock)
- setresuid / setresgid - No reason to change UID in sandbox
- setsid / setpgid - Session manipulation, unnecessary
- ioctl - Too powerful without argument filtering (TODO: whitelist specific codes)
§Security Notes
- Filter is permanent - cannot be removed once applied
- Requires PR_SET_NO_NEW_PRIVS first
- Blocked syscall = immediate process termination (SIGSYS)
- kill / tgkill are safe because we use PID namespace (CLONE_NEWPID)
- prctl is allowed, but PR_SET_SECCOMP cannot loosen the filter already applied (an additional filter can only restrict further)
Structs§
Constants§
- DEFAULT_WHITELIST - Syscalls allowed in the sandbox.
Functions§
- build_whitelist_filter - Builds a BPF filter with clone and socket argument filtering.
- seccomp_available - Returns true if seccomp is available.
- seccomp_set_mode_filter⚠ - Applies a seccomp-BPF filter to the current thread.