Module seccomp

Seccomp-BPF syscall filtering.

Seccomp-BPF filters syscalls using Berkeley Packet Filter (BPF) programs. This provides a second layer of defense after Landlock: even if a path is accessible, dangerous syscalls are still blocked.

§Filter Structure

The BPF filter runs on every syscall:

  1. Verify architecture is x86_64 (kill otherwise)
  2. Load syscall number from seccomp_data
  3. Block clone3 entirely (cannot inspect flags in struct)
  4. For clone, inspect flags and block namespace creation
  5. For socket, inspect domain and type and block dangerous values (domain AF_NETLINK, type SOCK_RAW)
  6. Compare other syscalls against whitelist
  7. Allow if match found, kill process otherwise
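
The steps above can be sketched as a small classic-BPF program. This is a minimal illustration, not this module's actual builder: `Insn`, `build_sketch_filter`, and `run` are hypothetical names, the clone3, clone, and socket argument checks (steps 3 to 5) are omitted, and `run` is a toy interpreter that exists only so the control flow can be exercised without installing the filter. The opcode, offset, and return-value constants follow the Linux UAPI headers.

```rust
/// One classic-BPF instruction, laid out like the kernel's `struct sock_filter`.
/// (Hypothetical stand-in; not necessarily identical to this module's `SockFilter`.)
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Insn {
    pub code: u16,
    pub jt: u8,
    pub jf: u8,
    pub k: u32,
}

// Opcodes: BPF_LD|BPF_W|BPF_ABS, BPF_JMP|BPF_JEQ|BPF_K, BPF_RET|BPF_K.
pub const LD_W_ABS: u16 = 0x20;
pub const JEQ_K: u16 = 0x15;
pub const RET_K: u16 = 0x06;

// Field offsets inside `struct seccomp_data`.
pub const OFF_NR: u32 = 0;
pub const OFF_ARCH: u32 = 4;

pub const AUDIT_ARCH_X86_64: u32 = 0xC000_003E;
pub const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
pub const SECCOMP_RET_ALLOW: u32 = 0x7FFF_0000;

fn insn(code: u16, jt: u8, jf: u8, k: u32) -> Insn {
    Insn { code, jt, jf, k }
}

/// Builds: arch check, load syscall number, one (compare, allow) pair per
/// allowed syscall, and a final kill.
pub fn build_sketch_filter(allowed: &[u32]) -> Vec<Insn> {
    let mut prog = vec![
        insn(LD_W_ABS, 0, 0, OFF_ARCH),
        // On x86_64 skip the kill below, otherwise fall through to it.
        insn(JEQ_K, 1, 0, AUDIT_ARCH_X86_64),
        insn(RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        insn(LD_W_ABS, 0, 0, OFF_NR),
    ];
    for &nr in allowed {
        // Match: fall through to the allow. No match: skip over it.
        prog.push(insn(JEQ_K, 0, 1, nr));
        prog.push(insn(RET_K, 0, 0, SECCOMP_RET_ALLOW));
    }
    prog.push(insn(RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS));
    prog
}

/// Toy interpreter for the three opcodes used above.
pub fn run(prog: &[Insn], arch: u32, nr: u32) -> u32 {
    let mut acc = 0u32;
    let mut pc = 0usize;
    loop {
        let i = prog[pc];
        pc += 1;
        match i.code {
            LD_W_ABS => acc = if i.k == OFF_ARCH { arch } else { nr },
            JEQ_K => pc += if acc == i.k { i.jt as usize } else { i.jf as usize },
            RET_K => return i.k,
            other => panic!("opcode {other:#x} is not part of this sketch"),
        }
    }
}
```

For example, with `build_sketch_filter(&[0, 1, 231])` (read, write, and exit_group on x86_64), `run` returns `SECCOMP_RET_ALLOW` for syscall 1 and `SECCOMP_RET_KILL_PROCESS` for syscall 435 (clone3) or for any non-x86_64 architecture value.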

§Clone Flag Filtering

The clone syscall is allowed, but any call that sets one of the following namespace flags is blocked:

  • CLONE_NEWUSER - User namespace (kernel attack surface)
  • CLONE_NEWNET - Network namespace (nf_tables access)
  • CLONE_NEWNS - Mount namespace
  • CLONE_NEWPID - PID namespace
  • CLONE_NEWIPC - IPC namespace
  • CLONE_NEWUTS - UTS namespace
  • CLONE_NEWCGROUP - Cgroup namespace

The clone3 syscall is blocked entirely because its flags are passed via a userspace struct pointer that BPF cannot dereference.
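
The flag check amounts to a single bitmask test on the first clone argument. The sketch below shows that test in plain Rust rather than BPF; `NAMESPACE_FLAGS` and `clone_flags_allowed` are hypothetical names, and the flag values are the ones defined in the Linux headers.

```rust
pub const CLONE_NEWNS: u64 = 0x0002_0000;
pub const CLONE_NEWCGROUP: u64 = 0x0200_0000;
pub const CLONE_NEWUTS: u64 = 0x0400_0000;
pub const CLONE_NEWIPC: u64 = 0x0800_0000;
pub const CLONE_NEWUSER: u64 = 0x1000_0000;
pub const CLONE_NEWPID: u64 = 0x2000_0000;
pub const CLONE_NEWNET: u64 = 0x4000_0000;

/// Union of every namespace-creating flag listed above.
pub const NAMESPACE_FLAGS: u64 = CLONE_NEWNS
    | CLONE_NEWCGROUP
    | CLONE_NEWUTS
    | CLONE_NEWIPC
    | CLONE_NEWUSER
    | CLONE_NEWPID
    | CLONE_NEWNET;

/// Returns true when the clone flags request no new namespace.
pub fn clone_flags_allowed(flags: u64) -> bool {
    flags & NAMESPACE_FLAGS == 0
}
```

All seven flags fit in the low 32 bits of the argument, so a BPF filter can test them with a single bitwise-and on the low word of the first argument.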

§Socket Filtering

The socket syscall is filtered to block:

  • AF_NETLINK (16) - Access to kernel netlink interfaces (nf_tables, CVE-2024-1086)
  • SOCK_RAW (3) - Raw packet access (can craft arbitrary packets)

Allowed socket domains:

  • AF_UNIX (1) - Local IPC (Python multiprocessing, etc.)
  • AF_INET/AF_INET6 (2, 10) - Network (Landlock controls actual access)
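
The socket rules can be expressed as a predicate over the first two socket arguments. This is a sketch under two assumptions that the text above does not confirm: that the filter is a blocklist (any domain other than AF_NETLINK passes), and that flag bits such as SOCK_CLOEXEC are masked off before the type is compared. `socket_allowed` and `SOCK_TYPE_MASK` are hypothetical names.

```rust
pub const AF_UNIX: u32 = 1;
pub const AF_INET: u32 = 2;
pub const AF_INET6: u32 = 10;
pub const AF_NETLINK: u32 = 16;

pub const SOCK_STREAM: u32 = 1;
pub const SOCK_DGRAM: u32 = 2;
pub const SOCK_RAW: u32 = 3;
pub const SOCK_NONBLOCK: u32 = 0o4000;
pub const SOCK_CLOEXEC: u32 = 0o2000000;

/// Assumption: the low four bits carry the socket type; the rest are flags.
const SOCK_TYPE_MASK: u32 = 0xF;

/// Returns true when socket(domain, ty, ..) would pass the sketched rules.
pub fn socket_allowed(domain: u32, ty: u32) -> bool {
    domain != AF_NETLINK && (ty & SOCK_TYPE_MASK) != SOCK_RAW
}
```

Without the mask, a caller could pass `SOCK_RAW | SOCK_CLOEXEC` and slip past a plain equality check against 3, which is why the sketch strips the flag bits first.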

§Removed Dangerous Syscalls

  • memfd_create + execveat - Enables fileless execution (bypass Landlock)
  • setresuid/setresgid - No reason to change UID in sandbox
  • setsid/setpgid - Session manipulation, unnecessary
  • ioctl - Too powerful without argument filtering (TODO: whitelist specific codes)

§Security Notes

  • Filter is permanent - cannot be removed once applied
  • Requires PR_SET_NO_NEW_PRIVS first
  • Blocked syscall = immediate process termination (SIGSYS)
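  • kill/tgkill are safe because the sandbox itself runs in its own PID namespace (CLONE_NEWPID), so signals cannot reach processes outside it
  • prctl is allowed, but PR_SET_SECCOMP cannot remove or weaken a filter that is already applied (further filters can only add restrictions)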
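
The PR_SET_NO_NEW_PRIVS prerequisite can be met with a direct prctl call before the filter is installed. The following is a Linux-only sketch that binds to the C library's `prctl` by hand; `set_no_new_privs` and `no_new_privs_is_set` are hypothetical helpers, not functions of this module. The option numbers 38 and 39 come from the Linux headers.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

const PR_SET_NO_NEW_PRIVS: c_int = 38;
const PR_GET_NO_NEW_PRIVS: c_int = 39;

extern "C" {
    // Variadic prototype from <sys/prctl.h>; the extra arguments are unsigned long.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Sets the no_new_privs attribute on the calling thread. It cannot be unset.
pub fn set_no_new_privs() -> io::Result<()> {
    let (one, zero): (c_ulong, c_ulong) = (1, 0);
    // SAFETY: plain integer arguments; the call touches no memory we own.
    let rc = unsafe { prctl(PR_SET_NO_NEW_PRIVS, one, zero, zero, zero) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Reports whether no_new_privs is set on the calling thread.
pub fn no_new_privs_is_set() -> io::Result<bool> {
    let zero: c_ulong = 0;
    // SAFETY: as above, integer-only arguments.
    let rc = unsafe { prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero) };
    match rc {
        0 => Ok(false),
        1 => Ok(true),
        _ => Err(io::Error::last_os_error()),
    }
}
```

Calling `set_no_new_privs()` first lets an unprivileged process install a filter; without it, the kernel rejects the filter unless the caller has CAP_SYS_ADMIN in its user namespace.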

Structs§

SockFilter
SockFprog

Constants§

DEFAULT_WHITELIST
Syscalls allowed in the sandbox.

Functions§

build_whitelist_filter
Builds a BPF filter with clone and socket argument filtering.
seccomp_available
Returns true if seccomp is available.
seccomp_set_mode_filter
Applies a seccomp-BPF filter to the current thread.