# solti-exec
Task execution backends for the solti task system.
Provides concrete `Runner` implementations that turn `TaskSpec` into running OS processes.
Currently ships a single backend, `SubprocessRunner`, with optional Linux sandboxing (rlimits, cgroup v2, capabilities).
## Architecture
```text
TaskSpec { kind: Subprocess(..) }
│
▼ RunnerRouter::pick()
SubprocessRunner
│
├──► build_task_config(spec, ctx)
│ ├──► resolve SubprocessMode → (command, args [, script_tempfile])
│ │ (Script mode: body → NamedTempFile 0600 → path as argv[0])
│ ├──► merge_env(task_env, runner_env)
│ └──► SubprocessTaskConfig { run_id, command, args, env, cwd }
│
├──► prepare backend (cgroup dirs, if configured)
│
└──► run_subprocess(ctx, cancel)
├──► build Command + apply pre_exec hooks
│ (process_group(0), kill_on_drop(true))
├──► spawn + pipe stdout/stderr
├──► select! biased { child.wait(), cancel → killpg }
├──► record metrics
└──► cleanup cgroup (if any)
```
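The `merge_env(task_env, runner_env)` step in the diagram can be sketched with a `BTreeMap`. This is a minimal illustration, not the crate's code: the signature and the slice-of-pairs input are assumptions, and the override order (runner over task) follows the rule stated under Notes.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the crate's `merge_env`: entries inserted later
/// replace earlier ones, so runner-level values win over task-level values.
fn merge_env(
    task_env: &[(String, String)],
    runner_env: &[(String, String)],
) -> BTreeMap<String, String> {
    let mut merged = BTreeMap::new();
    // Task env goes in first, runner env second; `insert` overwrites on a key collision.
    for (key, value) in task_env.iter().chain(runner_env.iter()) {
        merged.insert(key.clone(), value.clone());
    }
    merged
}
```

With `A=task` in the task env and `A=runner` in the runner env, the merged map holds `A=runner`. Keys present on only one side pass through unchanged.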
## Subprocess lifecycle
```text
build_task ──► prepare_backend ──► spawn ──► log_stream (stdout/stderr)
(pgid, ├──► tracing::info/warn
kill_on_drop) └──► OutputSink (if registry wired)
│
├──► child.wait() → evaluate exit
├──► cancel.cancelled() → SIGKILL to -pgid
│ → wait() to reap
▼
metrics + cleanup
```
`biased` select prefers `child.wait()` over `cancel.cancelled()` — a process that has already exited cleanly is never misreported as cancelled, even if the cancel token fired in the same microsecond.
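The priority rule can be modelled without an async runtime. The sketch below is a synchronous stand-in for the `biased` select, not the runner's actual code: `Outcome` and `resolve` are hypothetical names, and the only point is the order in which the two conditions are checked.

```rust
/// Hypothetical outcome type for this sketch; the crate's real type may differ.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Exited(i32),
    Cancelled,
    StillRunning,
}

/// Check the exit status first and the cancel flag second, mirroring the
/// branch order of a `biased` select.
fn resolve(exit_code: Option<i32>, cancel_requested: bool) -> Outcome {
    match (exit_code, cancel_requested) {
        // An exit that is already observable wins, even if cancel also fired.
        (Some(code), _) => Outcome::Exited(code),
        (None, true) => Outcome::Cancelled,
        (None, false) => Outcome::StillRunning,
    }
}
```

Without `biased`, `tokio::select!` picks the first branch to poll at random, so a simultaneous exit and cancel could resolve either way.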
## Key types
| Type | Purpose |
|------|---------|
| `SubprocessRunner` | `Runner` impl for `TaskKind::Subprocess` |
| `SubprocessBackendConfig` | Builder for rlimits + cgroups + security + logger settings |
| `SubprocessTaskConfig` | Fully resolved per-task config (command, args, env, cwd) |
| `LogConfig` | Stdout/stderr logging: truncation length, log levels |
| `RlimitConfig` | POSIX rlimits (nofile, fsize, core, nproc, as) |
| `CgroupLimits` | cgroup v2: CPU quota/period, memory, PIDs |
| `CpuMax` | CPU quota + period for `cpu.max` |
| `SecurityConfig` | Capability drop + `no_new_privs` |
| `LinuxCapability` | Capability enum with kernel `cap_value` constants |
| `ExecError` | Configuration and spawn-time errors |
## Backend config
```text
SubprocessBackendConfig::new()
.with_rlimits(RlimitConfig { max_open_files: Some(1024), .. })
.with_cgroups(CgroupLimits { cpu: Some(CpuMax { quota: Some(50_000), period: 100_000 }), memory: Some(128 MB), pids: Some(32) })
.with_security(SecurityConfig { keep_capabilities: vec![NetBindService], no_new_privs: true })
.with_logger(LogConfig { max_line_length: 4096, stdout_info: true, stderr_warn: true })
```
All settings are optional — without a backend config the subprocess inherits parent process settings.
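For the CPU limit, cgroup v2 expects `cpu.max` to contain `<quota> <period>` in microseconds, with the literal `max` in place of the quota when there is no limit. The sketch below shows how a `CpuMax`-shaped value could be rendered into that line. The struct is a local stand-in with assumed field types, and `cpu_max_line` is a hypothetical helper, not part of the crate's API.

```rust
/// Local stand-in mirroring the `CpuMax` shape shown above (field types assumed).
struct CpuMax {
    quota: Option<u64>,
    period: u64,
}

/// Render the value written to cgroup v2 `cpu.max`: "<quota> <period>",
/// with the literal `max` meaning "no quota".
fn cpu_max_line(cpu: &CpuMax) -> String {
    match cpu.quota {
        Some(quota) => format!("{} {}", quota, cpu.period),
        None => format!("max {}", cpu.period),
    }
}
```

A quota of `50_000` with a period of `100_000` renders as `50000 100000`, which allows the group half of one CPU per period.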
## Sandboxing (pre_exec hooks)
```text
fork()
┌───────────────────────────────────────────────────────────────────┐
│ child process (before execve) │
│ │
│ 1. rlimits: getrlimit → clamp → setrlimit │
│ 2. cgroup: open /sys/fs/cgroup/{name}/cgroup.procs → write PID │
│ 3. security: capget → mask → capset → no_new_privs │
│ │
│ execve(command, args) │
└───────────────────────────────────────────────────────────────────┘
```
All pre_exec hooks are **async-signal-safe**: zero heap allocation, only raw libc syscalls, `Copy`-only captures.
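The `mask` step of the security hook comes down to bit arithmetic on the capability sets, which is why it fits the no-allocation constraint. The sketch below shows one way to compute it, assuming the version 3 capability ABI, where each set is split into two 32-bit words. `keep_mask` and `masked_words` are hypothetical helpers; the capability numbers are the ones defined in `<linux/capability.h>`.

```rust
/// Capability numbers as defined in `<linux/capability.h>`.
const CAP_CHOWN: u32 = 0;
const CAP_NET_BIND_SERVICE: u32 = 10;
const CAP_SYS_ADMIN: u32 = 21;

/// Build a 64-bit keep mask from a slice of capability numbers.
/// Only stack values are touched, so nothing is allocated.
fn keep_mask(keep: &[u32]) -> u64 {
    let mut mask = 0u64;
    for &cap in keep {
        // Ignore numbers that do not fit the 64-bit capability space.
        if cap < 64 {
            mask |= 1u64 << cap;
        }
    }
    mask
}

/// Intersect the current set with the keep mask and split the result into
/// two 32-bit words, low word first.
fn masked_words(current: u64, keep: u64) -> [u32; 2] {
    let kept = current & keep;
    [kept as u32, (kept >> 32) as u32]
}
```

Because the result is an intersection, a capability in the keep list is retained only if the process already holds it. The mask can drop capabilities but never grant them.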
## Registration
```text
register_subprocess_runner(&mut router, "default")
├──► SubprocessRunner::new("default")
├──► label "runner-name" = "default"
└──► router.register_with_labels()
register_subprocess_runner_with_backend(&mut router, "secure", backend)
├──► validate backend config
├──► SubprocessRunner::with_config("secure", backend)
├──► label "runner-name" = "secure"
└──► router.register_with_labels()
```
Duplicate names are rejected via `router.contains_label()` → `ExecError::DuplicateRunner`.
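The duplicate check can be pictured with a miniature registry. Everything in this sketch is hypothetical (`MiniRouter`, `RegisterError`, the `HashSet` storage). It only illustrates checking the name before registering, not how `RunnerRouter` stores labels.

```rust
use std::collections::HashSet;

/// Hypothetical error type standing in for `ExecError::DuplicateRunner`.
#[derive(Debug, PartialEq, Eq)]
enum RegisterError {
    DuplicateRunner(String),
}

/// Hypothetical miniature router that only tracks "runner-name" label values.
#[derive(Default)]
struct MiniRouter {
    runner_names: HashSet<String>,
}

impl MiniRouter {
    /// Register a runner name, rejecting a name that is already taken.
    fn register(&mut self, name: &str) -> Result<(), RegisterError> {
        // `insert` returns false when the value was already present.
        if !self.runner_names.insert(name.to_string()) {
            return Err(RegisterError::DuplicateRunner(name.to_string()));
        }
        Ok(())
    }
}
```

Registering `"default"` twice on the same router succeeds the first time and returns `DuplicateRunner("default")` the second time, leaving the first registration in place.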
## Error model
```text
Variant When
────── ────
DuplicateRunner runner with this name already registered
InvalidRunnerConfig backend config validation failure
InvalidSpec task spec validation failure (empty command, etc.)
Internal unexpected internal error
Io OS-level I/O error
```
## Feature flags
| Feature | Enables |
|---------|---------|
| `subprocess` | `subprocess` module, `libc`, `base64`, `tokio/process` |
## Notes
- `SubprocessRunner` implements the `Runner` trait from `solti-runner`.
- Mode resolution: `Command` → direct exec; `Script` → decode base64 body → write to a `NamedTempFile` (mode 0600) → exec interpreter with the path. The tempfile is kept alive for the task's lifetime via `Arc` and unlinked on drop.
- Script body is capped at `solti_model::MAX_SCRIPT_BODY_BYTES` (2 MiB, decoded) by the model; the tempfile transport avoids Linux's per-arg `MAX_ARG_STRLEN` (128 KiB) limit that `-c <inline>` would hit.
- Cancel uses **process-group kill** on Unix: `Command::process_group(0)` sets pgid = child pid, then cancel sends `SIGKILL` to `-pgid` so forked helpers (`sleep 1000 &`) die together with the parent. `kill_on_drop(true)` covers the drop-without-wait path.
- Environment merge: runner env overrides task env (last-writer-wins via `BTreeMap`). Parent env is currently inherited — no automatic `env_clear()`.
- Cgroup lifecycle is two-phase: `prepare` (mkdir + write limits in parent) → `attach` (join PID in child via pre_exec).
- Cgroup names are auto-generated: `{runner}-{slot}-{seq:x}-{timestamp:x}`.
- Line truncation uses `Cow::Borrowed` for the common case (zero-alloc hot path).
- `log_stream` writes to two destinations: every line goes to `tracing` (existing path) and, if the supervisor wired an `OutputRegistry` into `BuildContext`, also to a `solti_runner::OutputSink` (live-tail subscribers). When no registry is attached, the sink push costs almost nothing (one `Arc` clone, one `broadcast::send` that returns `Err` and is ignored).
- `LinuxCapability` values match `<linux/capability.h>` from Linux 6.x.
- On non-Linux platforms, all sandboxing is a no-op that emits a `tracing::warn`.
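The `Cow::Borrowed` note on line truncation can be sketched as follows. This is an illustration under assumptions, not the crate's implementation: `truncate_line` is a hypothetical name, the limit is counted in bytes, and the `...` marker appended to cut lines is invented for the example.

```rust
use std::borrow::Cow;

/// Truncate `line` to at most `max_len` bytes of original text without
/// splitting a UTF-8 character. Lines that already fit are returned
/// borrowed, so the common case allocates nothing.
fn truncate_line(line: &str, max_len: usize) -> Cow<'_, str> {
    if line.len() <= max_len {
        return Cow::Borrowed(line);
    }
    // Walk back to the nearest char boundary at or below `max_len`.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_len;
    while !line.is_char_boundary(end) {
        end -= 1;
    }
    // Assumed marker: only the truncated path builds a new String.
    Cow::Owned(format!("{}...", &line[..end]))
}
```

Only lines longer than the limit pay for an allocation. The boundary walk matters for multi-byte input, since slicing a `&str` in the middle of a character panics.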