Nucleus
Extremely lightweight Docker alternative for agents and production services
Nucleus is a minimalist container runtime for Linux. It provides isolated execution environments using Linux kernel primitives without the overhead of traditional container runtimes. Nucleus supports two operating modes:
- Agent mode (default) — ephemeral, fast-startup sandboxes for AI agent workloads
- Production mode — strict isolation for long-running, network-bound NixOS services with declarative configuration, egress policy enforcement, health checks, and systemd integration
Why Nucleus?
- Zero-overhead isolation – Direct use of cgroups, namespaces, pivot_root, capabilities, seccomp, and Landlock
- Memory-backed filesystems – Container disk mapped to tmpfs, pre-populated with agent context
- gVisor integration – Optional application kernel for enhanced security, including networked service mode
- Production service support – Declarative NixOS module, egress policies, health checks, secrets mounting, sd_notify, and journald integration
- Minimal rootfs – Replace host bind mounts with a purpose-built Nix store closure for production services
- External security policies – Per-service seccomp profiles (JSON), capability policies (TOML), and Landlock rules (TOML) with SHA-256 pinning
- Seccomp profile generation – Trace mode records syscalls, then nucleus seccomp generate creates a minimal allowlist profile
- Multi-container topologies – Compose-equivalent TOML format with dependency DAG, reconciliation, and NixOS systemd integration
- Integrity & audit controls – Structured audit log, context hashing, rootfs attestation, seccomp deny logging, mount flag verification, and kernel lockdown assertions
- Structured telemetry – Optional OpenTelemetry export for container lifecycle tracing
- Linux-native – Runs on standard Linux and NixOS
Architecture
Nucleus leverages Linux kernel isolation primitives:
- Namespaces – PID, mount, network, UTS, IPC, user, cgroup, and optional time isolation
- cgroups v2 – Resource limits (CPU, memory, PIDs, I/O)
- pivot_root – Filesystem isolation (chroot fallback available in agent mode only)
- Capabilities – All capabilities dropped by default, or configured via TOML policy file (irreversible)
- seccomp – Syscall whitelist filtering with per-service JSON profiles and trace-based generation (irreversible)
- Landlock – Path-based filesystem access control via hardcoded defaults or TOML policy file (Linux 5.13+)
- gVisor – Optional application kernel (runsc) with None/Sandbox/Host network modes
- PID 1 init – Mini-init supervisor in production mode for zombie reaping and signal forwarding
- In-memory secrets – Dedicated tmpfs at /run/secrets with volatile zeroing of source buffers
- Mount audit – Post-setup verification of mount flags in production mode
Container filesystem is backed by tmpfs and either populated with context files (agent mode) or mounted from a pre-built Nix rootfs closure (production mode).
Platform Support
- Linux (kernel 6.x+) on x86_64
- NixOS (first-class NixOS module support)
- Not supported: macOS, Windows, BSDs, 32-bit Linux
Installation
Or via Nix:
Usage
Agent Mode (default)
# Run agent in isolated container with pre-populated context
# Specify resource limits
# Name your container
# Use gVisor for enhanced isolation
# Rootless mode
# Optional networking
# Context streaming (bind mount for instant access)
# Integrity and audit hardening
# Environment variables
# Pass sensitive values via --secret (mounted in-memory at /run/secrets)
Production Mode
Production mode enforces strict security invariants:
- Forbids --allow-degraded-security, --allow-chroot-fallback, and --allow-host-network
- Requires explicit --memory limit
- Requires successful cgroup creation (no fallback to running without limits)
- Egress policy failures are fatal (no silent degradation)
- Bridge DNS must be configured explicitly (no public resolver defaults)
# Run a long-running service with production hardening
# gVisor with network access (sandbox network stack)
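The invariants listed above can be read as a validation pass that runs before any container setup. The following is a minimal sketch of that idea, not Nucleus's actual code: the RunOptions struct, its field names, and the production_violations function are hypothetical, and only the flag names come from this README.

```rust
/// Hypothetical subset of run options. Field names are illustrative and
/// are not taken from the Nucleus source.
#[derive(Default)]
struct RunOptions {
    allow_degraded_security: bool,
    allow_chroot_fallback: bool,
    allow_host_network: bool,
    memory_limit: Option<String>,
    bridge_network: bool,
    dns: Vec<String>,
}

/// Collect every violated production invariant, so the operator sees all
/// problems at once instead of fixing them one by one.
fn production_violations(opts: &RunOptions) -> Vec<&'static str> {
    let mut violations = Vec::new();
    if opts.allow_degraded_security {
        violations.push("--allow-degraded-security is forbidden");
    }
    if opts.allow_chroot_fallback {
        violations.push("--allow-chroot-fallback is forbidden");
    }
    if opts.allow_host_network {
        violations.push("--allow-host-network is forbidden");
    }
    if opts.memory_limit.is_none() {
        violations.push("--memory is required");
    }
    // Production mode has no public resolver default for bridge networking.
    if opts.bridge_network && opts.dns.is_empty() {
        violations.push("bridge DNS must be configured explicitly");
    }
    violations
}
```

An empty result means the options pass these checks. Cgroup creation and egress policy failures are runtime conditions and are not modelled here.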
Security Policy Files
Nix defines the service (what runs). Separate files define security policy (what the process is allowed to do at the kernel level). This separation keeps security config auditable, tool-compatible, and on its own change cadence.
# Run with external security policies
Seccomp profile (JSON — OCI-native format, tooling emits it directly):
Capability policy (TOML):
# config/my-service.caps.toml
[]
= [] # empty = drop all
[]
= []
Landlock policy (TOML):
# config/my-service.landlock.toml
= 3
[[]]
= "/bin"
= ["read", "execute"]
[[]]
= "/etc/myservice"
= ["read"]
[[]]
= "/run/secrets"
= ["read"]
[[]]
= "/tmp"
= ["read", "write", "create", "remove"]
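To illustrate how a path-based allowlist like the one above behaves, here is a small sketch that models the four rules and answers "is this access permitted on this path?". It is a simplified model under stated assumptions: the Access enum, Rule struct, and function names are hypothetical, and real Landlock enforcement happens in the kernel on file hierarchies rather than by comparing path strings.

```rust
use std::path::{Path, PathBuf};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Access {
    Read,
    Write,
    Execute,
    Create,
    Remove,
}

/// One rule: the listed accesses are granted on `path` and everything beneath it.
struct Rule {
    path: PathBuf,
    access: Vec<Access>,
}

/// The four rules from the policy above, expressed in this model.
fn example_rules() -> Vec<Rule> {
    use Access::*;
    vec![
        Rule { path: PathBuf::from("/bin"), access: vec![Read, Execute] },
        Rule { path: PathBuf::from("/etc/myservice"), access: vec![Read] },
        Rule { path: PathBuf::from("/run/secrets"), access: vec![Read] },
        Rule { path: PathBuf::from("/tmp"), access: vec![Read, Write, Create, Remove] },
    ]
}

/// Deny by default: an access is allowed only if some rule covering an
/// ancestor of `target` grants it. `Path::starts_with` compares whole
/// components, so "/etc/myservice-other" is not beneath "/etc/myservice".
fn is_allowed(rules: &[Rule], target: &Path, wanted: Access) -> bool {
    rules
        .iter()
        .any(|rule| target.starts_with(&rule.path) && rule.access.contains(&wanted))
}
```

With these rules, executing /bin/sh and writing under /tmp are allowed, while writing to /etc/myservice or reading anything outside the four listed trees is denied.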
Seccomp Profile Generation
Profiles shouldn't be hand-written from scratch. Use trace mode to record actual syscall usage, then generate a minimal profile:
# 1. Run in trace mode — all syscalls allowed but logged
# 2. Generate minimal profile from trace
# 3. Review and tighten (remove anything surprising)
# 4. Commit — Nix pins the SHA-256 hash
# 5. Run in enforce mode
Trace mode requires root or CAP_SYSLOG (reads /dev/kmsg). It is rejected in production mode — it is a development tool only.
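The core of step 2 is turning a list of observed syscalls into a deny-by-default allowlist. The sketch below shows that reduction under simplifying assumptions: it takes syscall names that have already been parsed out of the trace (the NDJSON schema is not described here, so parsing is omitted), and generate_profile is a hypothetical name, not the function in seccomp_generate.rs. The output uses the OCI seccomp field names defaultAction, syscalls, names, and action.

```rust
use std::collections::BTreeSet;

/// Build a minimal OCI-style seccomp profile: deny everything by default
/// and allow only the syscalls that were observed in the trace.
fn generate_profile(traced: &[&str]) -> String {
    // A BTreeSet removes duplicates and gives a stable, sorted order,
    // which keeps the generated file diff-friendly and hash-stable.
    let names: BTreeSet<&str> = traced.iter().copied().collect();
    let quoted: Vec<String> = names.iter().map(|n| format!("\"{}\"", n)).collect();
    // Syscall names contain only [a-z0-9_], so no JSON escaping is needed here.
    format!(
        "{{\"defaultAction\":\"SCMP_ACT_ERRNO\",\"syscalls\":[{{\"names\":[{}],\"action\":\"SCMP_ACT_ALLOW\"}}]}}",
        quoted.join(",")
    )
}
```

Sorting matters because the profile is pinned by SHA-256: the same set of syscalls should always produce the same bytes regardless of the order in which they were traced.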
Multi-Container Topologies
Nucleus includes a Compose-equivalent for managing multi-container stacks using TOML configuration with dependency ordering.
# topology.toml
= "myapp"
[]
= "10.42.0.0/24"
[]
= "/nix/store/...-postgres"
= ["postgres", "-D", "/var/lib/postgresql/data"]
= "2G"
= 2.0
= ["internal"]
= "pg_isready -U myapp"
[]
= "/nix/store/...-web"
= ["/bin/web-server"]
= "512M"
= ["internal"]
= ["8443:8443"]
= ["10.42.0.0/24"]
[[]]
= "postgres"
= "healthy"
# Validate topology and show dependency order
# Bring up all services in dependency order
# Show service status
# Tear down in reverse dependency order
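Dependency ordering of this kind is commonly computed with a topological sort. The following sketch uses Kahn's algorithm over a map from each service to the services it depends on. It illustrates the technique only and is not the code in dag.rs; the start_order function and its signature are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Return services in an order where every dependency starts before its
/// dependents, or an error if the dependencies contain a cycle.
fn start_order<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, String> {
    // For each service, the set of dependencies that have not started yet.
    let mut unmet: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (svc, ds) in deps {
        unmet.entry(*svc).or_default().extend(ds.iter().copied());
        for d in ds {
            // Make sure services that appear only as dependencies are tracked.
            unmet.entry(*d).or_default();
        }
    }

    // Services with no unmet dependencies can start immediately.
    let mut ready: VecDeque<&'a str> = unmet
        .iter()
        .filter(|(_, ds)| ds.is_empty())
        .map(|(svc, _)| *svc)
        .collect();

    let mut order = Vec::new();
    while let Some(svc) = ready.pop_front() {
        order.push(svc);
        unmet.remove(svc);
        // Starting `svc` may unblock services that were waiting on it.
        for (other, ds) in unmet.iter_mut() {
            if ds.remove(svc) && ds.is_empty() {
                ready.push_back(*other);
            }
        }
    }

    if unmet.is_empty() {
        Ok(order)
    } else {
        let stuck: Vec<_> = unmet.keys().collect();
        Err(format!("dependency cycle among: {:?}", stuck))
    }
}
```

For the topology above, web depends on postgres, so postgres is started first. Teardown is the same list reversed. This sketch ignores the health condition on the dependency, which would gate when a started service counts as satisfied.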
Container Management
# List running containers
# List all containers (including stopped)
# Show resource usage statistics
# Stop a container (SIGTERM, then SIGKILL after timeout)
# Kill a container with a specific signal
# Remove a stopped container
# Attach to a running container
# Checkpoint a running container (requires root, CRIU)
# Restore from checkpoint
NixOS Module
Nucleus provides a declarative NixOS module for running containers as systemd services. Each container is managed as a nucleus-<name>.service unit with journald logging, sd_notify readiness, and automatic restart.
Flake Setup
{
inputs.nucleus.url = "github:0kenx/nucleus";
outputs = { self, nixpkgs, nucleus, ... }: {
nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
modules = [
nucleus.nixosModules.default
./configuration.nix
];
};
};
}
Service Configuration
{ config, pkgs, nucleus, ... }:
let
# Build a minimal rootfs containing only the packages your service needs.
# This replaces host bind mounts with a locked-down Nix closure.
proxyRootfs = nucleus.lib.mkRootfs {
inherit pkgs;
packages = [ my-proxy-pkg pkgs.cacert pkgs.curl ];
};
in
{
services.nucleus = {
enable = true;
package = nucleus.packages.x86_64-linux.default;
containers.sigid-proxy = {
enable = true;
command = [ "/bin/sigid-proxy" "--config" "/etc/sigid/proxy.toml" ];
rootfs = proxyRootfs;
# Resource limits (required in production mode)
memory = "1G";
cpus = 2.0;
pids = 256;
# Security policy files (separate from Nix, auditable by security engineers)
seccompProfile = {
path = ./config/sigid-proxy.seccomp.json;
sha256 = "abc123..."; # Nix verifies at build time
};
capsPolicy = ./config/sigid-proxy.caps.toml;
landlockPolicy = ./config/sigid-proxy.landlock.toml;
# Optional hardening toggles
verifyRootfsAttestation = true;
seccompLogDenied = true;
requireKernelLockdown = "integrity";
# Networking
network = "bridge";
dns = [ "10.0.0.1" ]; # internal resolver — no public DNS default
portForwards = [ "8080:8080" "8443:8443" ];
# Egress policy — audited outbound access
egressAllow = [ "10.0.0.0/8" ];
egressTcpPorts = [ 443 8443 ];
# Health checking
healthCheck = "curl -sf http://localhost:8080/health";
healthInterval = 30;
healthRetries = 3;
healthStartPeriod = 10;
# Secrets (mounted read-only)
secrets = [
{ source = config.age.secrets.proxy-tls.path; dest = "/etc/tls/cert.pem"; }
];
# Environment
environment = {
RUST_LOG = "info";
CONFIG_PATH = "/etc/sigid/proxy.toml";
};
# systemd integration
sdNotify = true; # Type=notify, passes NOTIFY_SOCKET into container
};
};
}
Topology Services
Topologies can also be managed as systemd services:
{
services.nucleus = {
enable = true;
package = nucleus.packages.x86_64-linux.default;
topologies.myapp = {
enable = true;
configFile = ./topology.toml;
};
};
}
This creates a nucleus-topology-myapp.service (Type=oneshot, RemainAfterExit) that runs nucleus compose up on start and nucleus compose down on stop.
What the Module Generates
For each enabled container, the module creates a systemd service:
- Unit: nucleus-<name>.service, ordered after network-online.target
- Type: notify (when sdNotify = true) or simple
- Restart: on-failure with 5s backoff
- Logging: stdout/stderr captured to journald with SyslogIdentifier=nucleus-<name>
- Command: nucleus run --service-mode production ... with all configured options
- Hardening: ProtectSystem=strict, ProtectHome=true at the systemd level (defense-in-depth)
Building a Rootfs
Use nucleus.lib.mkRootfs to build a minimal, reproducible root filesystem:
nucleus.lib.mkRootfs {
inherit pkgs;
name = "my-service-rootfs"; # optional, defaults to "nucleus-rootfs"
packages = [
my-service-package
pkgs.cacert # TLS certificates
pkgs.curl # for health checks
pkgs.busybox # minimal coreutils
];
}
This produces a Nix store path containing /bin, /lib, /etc, etc. from the specified packages. It is mounted read-only inside the container, replacing the host bind mounts used in agent mode.
mkRootfs also emits a .nucleus-rootfs-sha256 manifest at the root of the closure. Use --verify-rootfs-attestation or verifyRootfsAttestation = true; to require that manifest to match the mounted rootfs at startup.
Security Notes
Do not pass secrets via -e / --env. Environment variables are visible in /proc/<pid>/environ to any process that can read it (mitigated by hidepid=2 in production mode, but not in agent mode). Use --secret instead — secrets are mounted on an in-memory tmpfs at /run/secrets with volatile source buffer zeroing.
Agent mode is not hardened. By design, agent mode applies several security mechanisms on a best-effort basis: seccomp and Landlock failures are warn-and-continue (with --allow-degraded-security), chroot fallback is available (with --allow-chroot-fallback), bridge DNS defaults to public resolvers (8.8.8.8), and cgroup creation failures are non-fatal. Operators requiring strict isolation should use production mode, which makes all of these fatal.
Production Mode vs Agent Mode
| Feature | Agent Mode | Production Mode |
|---|---|---|
| Service mode | --service-mode agent (default) | --service-mode production |
| Degraded security | Allowed with flag | Forbidden |
| Chroot fallback | Allowed with flag | Forbidden |
| Host networking | Allowed with flag | Forbidden |
| Cgroup limits | Best-effort | Required (fatal on failure) |
| Bridge DNS | Defaults to 8.8.8.8/8.8.4.4 | Must be configured explicitly |
| Rootfs | Host bind mounts (/bin, /usr, /lib, /nix) | Pre-built Nix closure (--rootfs) |
| Egress policy | Optional | Deny-all default (fatal on apply failure) |
| Memory limit | Optional | Required |
| PID 1 init | Direct exec | Mini-init with zombie reaping + signal forwarding |
| Secrets | Bind mount | In-memory tmpfs with volatile zeroing |
| /proc | Mounted normally | hidepid=2 (hides other processes) |
| Mount audit | Skipped | Post-setup flag verification (fatal) |
| Seccomp trace mode | Allowed | Forbidden |
| Landlock ABI | Best-effort | V3 minimum required |
| Health checks | Optional | Optional |
| sd_notify | Optional | Optional |
| Security policies | Optional | Optional (recommended) |
Egress Policy
When --egress-allow is specified, Nucleus applies iptables OUTPUT chain rules inside the container's network namespace:
- Allow loopback traffic
- Allow established/related connections
- Allow DNS to configured resolvers
- Allow traffic to permitted CIDRs (optionally restricted to specific ports)
- Log denied packets (rate-limited, nucleus-egress-denied: prefix)
- Drop everything else
# Allow outbound to internal network on HTTPS only
# Deny-all egress (only DNS to configured resolvers is allowed)
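The rule list above is evaluated top to bottom, and the first matching rule decides the packet's fate. The sketch below models that first-match behaviour in plain Rust so the ordering is easy to reason about. It is not how Nucleus applies the rules (that is done with iptables), and several details are assumptions made for illustration: the Policy and Packet types, matching loopback by destination address instead of by interface, DNS on port 53, and treating an empty port list as "any port".

```rust
use std::net::Ipv4Addr;

/// An IPv4 network such as 10.0.0.0/8.
struct Cidr {
    base: Ipv4Addr,
    prefix: u8,
}

impl Cidr {
    fn contains(&self, ip: Ipv4Addr) -> bool {
        // A /0 prefix matches everything; shifting a u32 by 32 would overflow.
        let bits = u32::from(self.prefix.min(32));
        let mask = if bits == 0 { 0 } else { u32::MAX << (32 - bits) };
        (u32::from(ip) & mask) == (u32::from(self.base) & mask)
    }
}

struct Policy {
    resolvers: Vec<Ipv4Addr>,
    allow: Vec<Cidr>,
    tcp_ports: Vec<u16>, // empty means "no port restriction" in this model
}

struct Packet {
    dst: Ipv4Addr,
    dst_port: u16,
    tcp: bool,
    established: bool,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Accept,
    LogAndDrop,
}

/// First match wins, in the same order as the rule list above.
fn evaluate(policy: &Policy, p: &Packet) -> Verdict {
    if p.dst.is_loopback() || p.established {
        return Verdict::Accept;
    }
    if p.dst_port == 53 && policy.resolvers.contains(&p.dst) {
        return Verdict::Accept;
    }
    let port_ok =
        policy.tcp_ports.is_empty() || (p.tcp && policy.tcp_ports.contains(&p.dst_port));
    if port_ok && policy.allow.iter().any(|c| c.contains(p.dst)) {
        return Verdict::Accept;
    }
    Verdict::LogAndDrop
}
```

With 10.0.0.0/8 allowed on TCP ports 443 and 8443 and a resolver at 10.0.0.1, a connection to 10.1.2.3:443 is accepted, while 10.1.2.3:80 and DNS queries to an unconfigured resolver such as 8.8.8.8 fall through to the final log-and-drop rule.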
gVisor Network Modes
When using gVisor (--runtime gvisor), the network mode is automatically selected:
| Container --network | gVisor --network flag | Description |
|---|---|---|
| none | none | Fully isolated (default for agents) |
| bridge | sandbox | gVisor user-space network stack |
| host | host | Shared host network namespace |
The sandbox mode gives gVisor-isolated services full network access through gVisor's user-space TCP/IP stack, without exposing the host kernel's network code.
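The mapping in the table is small enough to express as a single exhaustive match. This is an illustrative sketch only: the ContainerNetwork enum and runsc_network_flag function are hypothetical names, and the returned strings are the values from the table above.

```rust
/// Container-level network mode, as selected with --network.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContainerNetwork {
    None,
    Bridge,
    Host,
}

/// Translate the container network mode into the value passed to
/// gVisor's --network flag.
fn runsc_network_flag(mode: ContainerNetwork) -> &'static str {
    match mode {
        ContainerNetwork::None => "none",
        // Bridge traffic goes through gVisor's user-space network stack.
        ContainerNetwork::Bridge => "sandbox",
        ContainerNetwork::Host => "host",
    }
}
```

Using an exhaustive match means the compiler flags this function if a new network mode is ever added without deciding how gVisor should handle it.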
Additional Hardening Flags
- --seccomp-profile <path> loads a custom per-service seccomp profile (OCI JSON format).
- --seccomp-profile-sha256 <hex> verifies the profile's SHA-256 hash before loading.
- --seccomp-mode trace|enforce switches between trace (record all syscalls) and enforce (default).
- --seccomp-log <path> writes NDJSON syscall trace when in trace mode.
- --caps-policy <path> loads a TOML capability policy (replaces default drop-all).
- --caps-policy-sha256 <hex> verifies the capability policy hash.
- --landlock-policy <path> loads a TOML Landlock filesystem policy (replaces default rules).
- --landlock-policy-sha256 <hex> verifies the Landlock policy hash.
- --verify-context-integrity hashes the source context tree before launch and verifies the populated /context tree matches.
- --verify-rootfs-attestation requires a .nucleus-rootfs-sha256 manifest and verifies the mounted rootfs against it.
- --seccomp-log-denied requests kernel logging for denied seccomp decisions when the host supports SECCOMP_FILTER_FLAG_LOG.
- --require-kernel-lockdown integrity|confidentiality refuses startup unless /sys/kernel/security/lockdown satisfies the requested mode.
- --gvisor-platform systrap|kvm|ptrace selects the runsc backend explicitly.
- --time-namespace enables Linux time namespaces for native containers.
- --disable-cgroup-namespace turns off cgroup namespace isolation when a workload needs the host cgroup view.
If NUCLEUS_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT is set, Nucleus exports lifecycle spans over OTLP in addition to normal local logging.
Development
This project uses Nix flakes for reproducible builds:
# Enter development shell
# Build
# Run tests
# Run with Apalache installed (for TLA+ trace replay)
# Build release binary
# Clippy
Project Structure
nucleus/
├── src/
│ ├── container/ # Container orchestration, lifecycle, state, config
│ ├── isolation/ # Namespace management, user mapping, attach
│ ├── resources/ # cgroup v2 resource control, stats
│ ├── filesystem/ # tmpfs, rootfs mounting, context population, secrets, attestation
│ ├── security/ # Capabilities, seccomp, Landlock, gVisor, OCI, policy files
│ │ ├── caps_policy.rs # TOML capability policy loader
│ │ ├── landlock_policy.rs # TOML Landlock policy loader
│ │ ├── seccomp_trace.rs # Seccomp trace mode (syscall recording)
│ │ ├── seccomp_generate.rs # Profile generator from traces
│ │ └── policy.rs # Shared policy infrastructure (SHA-256, TOML/JSON loaders)
│ ├── network/ # Networking (none/host/bridge), egress policy
│ ├── topology/ # Multi-container topology (Compose equivalent)
│ │ ├── config.rs # TOML topology config (services, networks, volumes)
│ │ ├── dag.rs # Dependency DAG with topological sort
│ │ ├── reconcile.rs # Diff running vs desired state, apply changes
│ │ └── dns.rs # Per-topology /etc/hosts DNS
│ ├── checkpoint/ # CRIU checkpoint/restore
│ ├── audit.rs # Structured audit log (JSON events)
│ └── error.rs # Error types
├── nix/
│ └── module.nix # NixOS module (containers + topologies)
├── config/ # Security policy files (per-service)
│ ├── *.seccomp.json # Seccomp syscall allowlists (OCI format)
│ ├── *.caps.toml # Capability bounding set policies
│ └── *.landlock.toml # Landlock filesystem access rules
├── tests/
│ ├── model_based_* # Property-based tests from TLA+ specs
│ └── tla_* # tla-connect driver tests
├── formal/tla/ # TLA+ formal specifications
├── intent/ # Intent high-level specs
└── flake.nix # Nix flake (packages, modules, lib.mkRootfs)
Testing
Nucleus uses spec-driven development with comprehensive testing:
- Unit tests: Individual component functionality
- Model-based tests: Property-based tests verifying TLA+ specifications
- tla-connect tests: TLA+ to Rust state machine mapping
- Integration tests: Complete container lifecycle
All state machines are formally verified using TLA+ and the Apalache model checker.
System-Level TLA+ Model
A composed system model verifies cross-subsystem ordering, authorization, and end-to-end progress:
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.