solti-model
Domain model for the solti task execution system.
Defines core resource types: Task, TaskSpec, TaskStatus, ObjectMeta, and all supporting domain primitives (phases, policies, selectors, identity newtypes).
Architecture
```text
┌──────────────────────────────────────────────────────────┐
│                           Task                           │
│                                                          │
│  ObjectMeta            TaskSpec             TaskStatus   │
│  ├─ id: TaskId         ├─ slot: Slot        ├─ phase     │
│  ├─ resource_version   ├─ kind: TaskKind    ├─ attempt   │
│  ├─ created_at         ├─ timeout           ├─ exit_code │
│  └─ updated_at         ├─ restart           └─ error     │
│                        ├─ backoff                        │
│                        ├─ admission                      │
│                        ├─ runner_selector                │
│                        └─ labels                         │
└──────────────────────────────────────────────────────────┘
```
Resource model
| Section | Type | Responsibility |
|---|---|---|
| `metadata` | `ObjectMeta` | Identity, `resource_version`, timestamps |
| `spec` | `TaskSpec` | Desired state (private fields; build via `TaskSpec::builder`) |
| `status` | `TaskStatus` | Observed state: phase, attempt count, exit code, error |
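A minimal sketch of how the three sections compose into one aggregate. All types here are simplified, hypothetical stand-ins: plain `String` instead of `TaskId`/`Slot`, a three-variant `Phase`, a hypothetical `Task::new_pending` constructor, and an assumed starting `resource_version` of 1. The real `TaskSpec` has private fields and is built through `TaskSpec::builder`.

```rust
/// Simplified, hypothetical stand-ins for the real solti-model types.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectMeta {
    pub id: String,            // TaskId in the real crate
    pub resource_version: u64, // bumped on every spec or status change
    pub created_at_ms: u64,
    pub updated_at_ms: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TaskSpec {
    pub slot: String,    // Slot in the real crate
    pub timeout_ms: u64, // per-attempt timeout
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Pending,
    Running,
    Succeeded,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TaskStatus {
    pub phase: Phase,
    pub attempt: u32,
    pub exit_code: Option<i32>,
    pub error: Option<String>,
}

/// Desired state (spec) and observed state (status) live side by side.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub metadata: ObjectMeta,
    pub spec: TaskSpec,
    pub status: TaskStatus,
}

impl Task {
    /// Hypothetical constructor: a freshly submitted task starts Pending,
    /// with no attempts and no exit code yet.
    pub fn new_pending(id: &str, spec: TaskSpec, now_ms: u64) -> Self {
        Task {
            metadata: ObjectMeta {
                id: id.to_string(),
                resource_version: 1,
                created_at_ms: now_ms,
                updated_at_ms: now_ms,
            },
            spec,
            status: TaskStatus { phase: Phase::Pending, attempt: 0, exit_code: None, error: None },
        }
    }
}
```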
Task lifecycle
```text
Pending ──► Running ──► Succeeded
               │
               ├──► Failed ──► (restart) ──► Running
               ├──► Timeout
               ├──► Canceled
               └──► Exhausted (max retries reached)
```
Terminal phases: Succeeded, Failed, Timeout, Canceled, Exhausted.
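The transitions in the diagram can be encoded as a small state check. This is a sketch, not the crate's `TaskPhase`: `is_terminal` and `can_transition_to` are hypothetical method names. The diagram draws a restart edge out of `Failed` even though `Failed` is listed as terminal; the sketch encodes both exactly as written.

```rust
/// Hypothetical mirror of the seven lifecycle phases listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskPhase {
    Pending,
    Running,
    Succeeded,
    Failed,
    Timeout,
    Canceled,
    Exhausted,
}

impl TaskPhase {
    /// Terminal phases, as listed in this README.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TaskPhase::Succeeded
                | TaskPhase::Failed
                | TaskPhase::Timeout
                | TaskPhase::Canceled
                | TaskPhase::Exhausted
        )
    }

    /// Edges drawn in the lifecycle diagram. `Failed -> Running` is the
    /// restart edge; whether it is taken depends on the restart policy.
    pub fn can_transition_to(self, next: TaskPhase) -> bool {
        use TaskPhase::*;
        matches!(
            (self, next),
            (Pending, Running)
                | (Running, Succeeded)
                | (Running, Failed)
                | (Running, Timeout)
                | (Running, Canceled)
                | (Running, Exhausted)
                | (Failed, Running)
        )
    }
}
```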
Task kinds
| Variant | Backend | Routable |
|---|---|---|
| `Subprocess` | OS process (`command`, `args`) | yes |
| `Container` | OCI container image | yes |
| `Wasm` | WASI module (`.wasm`) | yes |
| `Embedded` | In-process `TaskRef` | no |
Embedded tasks are submitted directly via SupervisorApi::submit_with_task.
Routable variants go through RunnerRouter::pick().
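A sketch of the routing split described above. The payload fields on each variant and the `is_routable`, `Dispatch`, and `dispatch_for` names are hypothetical; only the routable/non-routable classification comes from the table.

```rust
/// Hypothetical, simplified TaskKind; payload fields are placeholders.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskKind {
    Subprocess { command: String, args: Vec<String> },
    Container { image: String },
    Wasm { module: String },
    Embedded, // in the real crate this carries an in-process TaskRef
}

impl TaskKind {
    /// Only externally executed backends can be routed to a runner.
    pub fn is_routable(&self) -> bool {
        !matches!(self, TaskKind::Embedded)
    }
}

/// Where a submission goes.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    Router,       // RunnerRouter::pick()
    DirectSubmit, // SupervisorApi::submit_with_task
}

pub fn dispatch_for(kind: &TaskKind) -> Dispatch {
    if kind.is_routable() {
        Dispatch::Router
    } else {
        Dispatch::DirectSubmit
    }
}
```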
`Subprocess` has two execution strategies (see `SubprocessMode`):

- `Command`: `execve(command, args)` directly.
- `Script`: interpreter + script body. The body is UTF-8 text, base64-encoded, and capped at `MAX_SCRIPT_BODY_BYTES` (2 MiB after decode). Interpreters: `Bash`, `Python`, `Node`, `Custom { command, flag }`.
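The 2 MiB cap applies after decoding, but for standard padded base64 the decoded size can be computed from the encoded length alone, so an oversized body can be rejected before anything is allocated. The helpers below are hypothetical and assume the standard alphabet with `=` padding; a full validator would still need to decode and check UTF-8, which this sketch does not do.

```rust
/// 2 MiB cap on the *decoded* script body, matching the documented value.
pub const MAX_SCRIPT_BODY_BYTES: usize = 2 * 1024 * 1024;

/// Decoded size of standard, padded base64 text, computed without decoding.
/// Returns None if the length or padding is malformed. (Hypothetical helper.)
pub fn base64_decoded_len(encoded: &str) -> Option<usize> {
    let bytes = encoded.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let padding = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    if padding > 2 {
        return None;
    }
    // Every 4 encoded chars carry 3 bytes; each '=' removes one byte.
    Some(bytes.len() / 4 * 3 - padding)
}

/// Reject bodies whose decoded size exceeds the cap.
pub fn check_script_body(encoded: &str) -> Result<usize, String> {
    let n = base64_decoded_len(encoded).ok_or_else(|| "malformed base64 length".to_string())?;
    if n > MAX_SCRIPT_BODY_BYTES {
        return Err(format!("script body is {} bytes, limit is {}", n, MAX_SCRIPT_BODY_BYTES));
    }
    Ok(n)
}
```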
Policies
| Type | Controls |
|---|---|
| `RestartPolicy` | When to restart: `Never`, `OnFailure`, `Always` |
| `BackoffPolicy` | Delay between retries: initial delay (`first_ms`), cap (`max_ms`), `factor`, `jitter` |
| `JitterPolicy` | Jitter strategy: `None`, `Full`, `Equal`, `Decorrelated` |
| `AdmissionPolicy` | Duplicate handling: `DropIfRunning`, `Replace`, `Queue` |
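A sketch of how the `BackoffPolicy` fields might combine into a delay. The jitter formulas are an assumption borrowed from the widely cited AWS "Exponential Backoff and Jitter" definitions, not taken from solti's source. Attempts are assumed to be 1-based, and the uniform random sample `u` is passed in so the sketch stays dependency-free and deterministic.

```rust
/// Hypothetical mirror of the four jitter strategies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JitterPolicy {
    None,
    Full,
    Equal,
    Decorrelated,
}

#[derive(Debug, Clone, Copy)]
pub struct BackoffPolicy {
    pub jitter: JitterPolicy,
    pub first_ms: u64,
    pub max_ms: u64,
    pub factor: f64,
}

impl BackoffPolicy {
    /// Un-jittered exponential delay for 1-based `attempt`, capped at max_ms.
    pub fn base_delay_ms(&self, attempt: u32) -> u64 {
        let exp = self.factor.powi(attempt.saturating_sub(1) as i32);
        let raw = self.first_ms as f64 * exp;
        raw.min(self.max_ms as f64) as u64
    }

    /// Delay after applying jitter. `u` is a uniform sample in [0, 1);
    /// `prev_ms` is the previous delay (used by Decorrelated only).
    pub fn delay_ms(&self, attempt: u32, prev_ms: u64, u: f64) -> u64 {
        let d = self.base_delay_ms(attempt) as f64;
        let out = match self.jitter {
            JitterPolicy::None => d,
            JitterPolicy::Full => u * d,                          // [0, d)
            JitterPolicy::Equal => d / 2.0 + u * (d / 2.0),       // [d/2, d)
            JitterPolicy::Decorrelated => {
                // Random in [first, 3 * prev), capped at max.
                let lo = self.first_ms as f64;
                let hi = prev_ms.max(self.first_ms) as f64 * 3.0;
                (lo + u * (hi - lo)).min(self.max_ms as f64)
            }
        };
        out as u64
    }
}
```

With `first_ms: 1_000`, `max_ms: 30_000`, `factor: 2.0`, the un-jittered delays are 1 s, 2 s, 4 s, 8 s, 16 s, then capped at 30 s.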
Runner selector
```text
                TaskSpec.runner_selector
┌──────────────────────────────────────────────────────┐
│ match_labels:      { "zone": "eu" }                  │
│ match_expressions: [ {key:"gpu", op:Exists} ]        │
└──────────────────────────┬───────────────────────────┘
                           │  ALL requirements ANDed
                           ▼
                 RunnerRouter::pick()
┌──────────────────────────────────────────────────────┐
│ Runner A  labels: {"zone":"us","gpu":"a100"}    ✗    │
│ Runner B  labels: {"zone":"eu","gpu":"h100"}    ✓    │
│ Runner C  labels: {"zone":"eu"}                 ✗    │
└──────────────────────────────────────────────────────┘
```
Operators: In, NotIn, Exists, DoesNotExist.
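A sketch of the matching rule in the diagram, assuming Kubernetes label-selector semantics (including that `NotIn` is satisfied when the key is absent). `Requirement`, its `values` field, and both `matches` methods are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of SelectorOperator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SelectorOperator {
    In,
    NotIn,
    Exists,
    DoesNotExist,
}

#[derive(Debug, Clone)]
pub struct Requirement {
    pub key: String,
    pub op: SelectorOperator,
    pub values: Vec<String>, // consulted by In / NotIn only
}

#[derive(Debug, Clone, Default)]
pub struct RunnerSelector {
    pub match_labels: BTreeMap<String, String>,
    pub match_expressions: Vec<Requirement>,
}

impl Requirement {
    fn matches(&self, labels: &BTreeMap<String, String>) -> bool {
        let value = labels.get(&self.key);
        match self.op {
            SelectorOperator::Exists => value.is_some(),
            SelectorOperator::DoesNotExist => value.is_none(),
            SelectorOperator::In => value.map_or(false, |v| self.values.contains(v)),
            // Kubernetes semantics: a missing key satisfies NotIn.
            SelectorOperator::NotIn => value.map_or(true, |v| !self.values.contains(v)),
        }
    }
}

impl RunnerSelector {
    /// Every match_labels pair and every expression must hold (logical AND).
    pub fn matches(&self, labels: &BTreeMap<String, String>) -> bool {
        self.match_labels.iter().all(|(k, v)| labels.get(k) == Some(v))
            && self.match_expressions.iter().all(|r| r.matches(labels))
    }
}
```

With the selector from the diagram (`zone=eu` plus `gpu Exists`), only Runner B matches: A fails `match_labels`, and C has no `gpu` key.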
Key types
| Type | Description |
|---|---|
| `Task` | K8s-style aggregate: metadata + spec + status |
| `TaskSpec` | Desired state (private fields, build via builder) |
| `TaskSpecBuilder` | Validated builder for `TaskSpec` |
| `TaskStatus` | Observed state: phase, attempt, exit code, error |
| `ObjectMeta` | Identity, versioning, timestamps |
| `TaskRun` | Per-attempt execution record with start/finish times |
| `TaskPhase` | Lifecycle phase enum (7 variants) |
| `TaskKind` | Execution backend: `Subprocess`, `Wasm`, `Container`, `Embedded` |
| `Slot` | Logical execution lane (newtype over `Arc<str>`) |
| `TaskId` | Unique task identifier (newtype over `Arc<str>`) |
| `AgentId` | Agent identifier (newtype over `Arc<str>`) |
| `Timeout` | Per-attempt timeout in milliseconds |
| `Labels` | Key-value metadata for routing and filtering |
| `TaskEnv` | Ordered environment variables for task execution |
| `RunnerEnv` | Ordered environment variables for runner injection |
| `Flag` | Boolean toggle with `enabled()`/`disabled()` constructors |
| `RunnerSelector` | Label selector for runner routing |
| `TaskQuery` | Builder for filtered, paginated task listing |
| `TaskPage` | Paginated query result |
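The three identity types share one shape. The macro below is a hypothetical re-creation of what an `arc_str_newtype!`-style macro could generate; the crate's real macro, its derives, and its methods may differ, and the format validation described under Identity rules is omitted here. Cloning is a reference-count bump, and distinct types stop a `TaskId` from being passed where a `Slot` is expected.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical macro: one definition stamps out identical newtypes,
/// so Slot, TaskId and AgentId cannot drift apart.
macro_rules! arc_str_newtype {
    ($name:ident) => {
        #[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
        pub struct $name(Arc<str>);

        impl $name {
            pub fn new(s: &str) -> Self {
                $name(Arc::from(s))
            }

            pub fn as_str(&self) -> &str {
                &self.0
            }
        }

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str(&self.0)
            }
        }
    };
}

arc_str_newtype!(Slot);
arc_str_newtype!(TaskId);
arc_str_newtype!(AgentId);
```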
Size limits
Exposed as pub const so downstream layers (API, CP, UI) share one source of truth.
| Constant | Value | Enforced by |
|---|---|---|
| `MAX_SCRIPT_BODY_BYTES` | 2 MiB | `SubprocessMode::Script::validate` |
| `SLOT_MAX_LEN` | 64 | `Slot::validate_format` |
| `TASK_ID_MAX_LEN` | 256 | `TaskId::validate_format` |
| `AGENT_ID_MAX_LEN` | 128 | `AgentId::validate_format` |
Identity rules
`Slot`, `TaskId`, and `AgentId` allow only `[A-Za-z0-9._-]` and reject `.` and `..`.
No whitespace, no path separators, no non-ASCII: these values reach cgroup paths, tempfile names, `execve` argv, and log fields, where any other character can misbehave.
Validation runs at `TaskSpec::validate` / submit time.
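A sketch of a shared format check for the rules above, using the documented length limits. `validate_identity` and its `String` error are hypothetical; the crate exposes per-type `validate_format` methods instead. Rejecting the empty string is an assumption consistent with the "empty slot" case in the error model.

```rust
pub const SLOT_MAX_LEN: usize = 64;
pub const TASK_ID_MAX_LEN: usize = 256;
pub const AGENT_ID_MAX_LEN: usize = 128;

/// Hypothetical shared check behind Slot/TaskId/AgentId format validation.
pub fn validate_identity(kind: &str, value: &str, max_len: usize) -> Result<(), String> {
    if value.is_empty() {
        return Err(format!("{} must not be empty", kind));
    }
    if value.len() > max_len {
        return Err(format!("{} exceeds {} bytes", kind, max_len));
    }
    // "." and ".." would resolve to the current/parent directory in a path.
    if value == "." || value == ".." {
        return Err(format!("{} must not be '.' or '..'", kind));
    }
    // ASCII allowlist: rules out whitespace, '/', '\\' and all non-ASCII at once.
    if let Some(bad) = value
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '.' | '_' | '-')))
    {
        return Err(format!("{} contains invalid character {:?}", kind, bad));
    }
    Ok(())
}
```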
Versioning
ObjectMeta.resource_version is a monotonic counter bumped on every change
(spec or status) for optimistic concurrency control.
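A sketch of the optimistic concurrency check that `resource_version` enables: a writer must present the version it last read, and a stale version is rejected rather than silently overwritten. `Store`, `Stored`, `UpdateError`, and the starting version of 1 are hypothetical; this is not the crate's storage API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    NotFound,
    Conflict { expected: u64, actual: u64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Stored {
    pub resource_version: u64,
    pub body: String, // stands in for spec + status
}

/// Hypothetical in-memory store keyed by task id.
#[derive(Default)]
pub struct Store {
    items: HashMap<String, Stored>,
}

impl Store {
    pub fn insert(&mut self, id: &str, body: &str) -> u64 {
        self.items.insert(id.to_string(), Stored { resource_version: 1, body: body.to_string() });
        1
    }

    /// Apply `body` only if the caller saw the current version; bump on success.
    pub fn update(&mut self, id: &str, expected: u64, body: &str) -> Result<u64, UpdateError> {
        let item = self.items.get_mut(id).ok_or(UpdateError::NotFound)?;
        if item.resource_version != expected {
            return Err(UpdateError::Conflict { expected, actual: item.resource_version });
        }
        item.resource_version += 1;
        item.body = body.to_string();
        Ok(item.resource_version)
    }
}
```

A writer that gets `Conflict` is expected to re-read the task and retry against the fresh version.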
Construction
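```rust
use solti_model::{BackoffPolicy, JitterPolicy, RestartPolicy, TaskKind, TaskSpec};

fn build_spec(kind: TaskKind) -> Result<TaskSpec, Box<dyn std::error::Error>> {
    let spec = TaskSpec::builder("my-slot", kind, 5_000u64) // slot, backend, per-attempt timeout (ms)
        .restart(RestartPolicy::OnFailure)
        .backoff(BackoffPolicy { jitter: JitterPolicy::Equal, first_ms: 1_000, max_ms: 30_000, factor: 2.0 })
        .build()?;
    spec.validate()?; // submit-boundary validation (rejects Embedded)
    Ok(spec)
}
```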
Error model
| Variant | When |
|---|---|
| `UnknownAdmission` | Unknown admission policy string |
| `UnknownRestart` | Unknown restart policy string |
| `UnknownJitter` | Unknown jitter policy string |
| `UnknownTaskPhase` | Unknown task phase string |
| `Invalid` | Structural validation failure (empty slot, bad backoff, etc.) |
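A sketch of how a `FromStr` impl might surface the `Unknown*` variants. `ModelError` is a hypothetical name for the error type, and the accepted spellings (variant names, matched case-insensitively) are an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical error enum mirroring the variants in the table above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ModelError {
    UnknownAdmission(String),
    UnknownRestart(String),
    UnknownJitter(String),
    UnknownTaskPhase(String),
    Invalid(String),
}

impl fmt::Display for ModelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ModelError::UnknownAdmission(s) => write!(f, "unknown admission policy: {s:?}"),
            ModelError::UnknownRestart(s) => write!(f, "unknown restart policy: {s:?}"),
            ModelError::UnknownJitter(s) => write!(f, "unknown jitter policy: {s:?}"),
            ModelError::UnknownTaskPhase(s) => write!(f, "unknown task phase: {s:?}"),
            ModelError::Invalid(s) => write!(f, "invalid: {s}"),
        }
    }
}

impl std::error::Error for ModelError {}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    Never,
    OnFailure,
    Always,
}

impl FromStr for RestartPolicy {
    type Err = ModelError;

    // Accepted spellings are an assumption: variant names, case-insensitive.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "never" => Ok(RestartPolicy::Never),
            "onfailure" => Ok(RestartPolicy::OnFailure),
            "always" => Ok(RestartPolicy::Always),
            _ => Err(ModelError::UnknownRestart(s.to_string())),
        }
    }
}
```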
Notes
- `TaskSpec` fields are private: use `TaskSpec::builder()` for construction and `serde` for deserialization.
- Deserialization goes through `#[serde(try_from = "TaskSpecRaw")]`, which validates on parse.
- `BackoffPolicy` is also validated on deserialize via its own `try_from` raw type; a zero `first_ms`, an inverted range (`max_ms` below `first_ms`), or a non-finite or `< 1.0` `factor` is rejected at parse time.
- Identity newtypes (`Slot`, `TaskId`, `AgentId`) wrap `Arc<str>` via `arc_str_newtype!`; environment newtypes (`TaskEnv`, `RunnerEnv`) wrap `Vec<KeyValue>` via `env_newtype!`. Both macros keep parallel types in lockstep.
- `BackoffPolicy` implements `Eq`/`Hash` via `f64::to_bits()` for the `factor` field.
- `TaskPhase`, `RestartPolicy`, `AdmissionPolicy`, and `JitterPolicy` all implement `FromStr` for CLI/config parsing.
- `Labels` is backed by `BTreeMap<String, String>` for deterministic iteration order.
- Most types derive `Serialize`/`Deserialize` with `camelCase` field renaming. The one exception is `SelectorOperator`, which serializes as PascalCase (`In`, `NotIn`, `Exists`, `DoesNotExist`) to match the Kubernetes `LabelSelectorOperator` convention.
- `TaskKind`, `TaskPhase`, `RestartPolicy`, `AdmissionPolicy`, `JitterPolicy`, and `SelectorOperator` are `#[non_exhaustive]`, so adding new variants is a non-breaking change.
- Pagination constants for list endpoints: `DEFAULT_LIMIT = 100`, `MAX_LIMIT = 1000` (re-exported as `pub const` so downstream API layers share one source of truth).
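A sketch of two of the `BackoffPolicy` notes above: validation through a raw type, and `Eq`/`Hash` via `f64::to_bits()`. `BackoffPolicyRaw` and the `String` error are hypothetical, and the `jitter` field is omitted for brevity. The real crate wires the conversion through `#[serde(try_from = ...)]`, which is left out here to keep the sketch free of dependencies. Bitwise comparison is sound because validation already excludes NaN and values below 1.0 (so `-0.0` never appears).

```rust
use std::convert::TryFrom;
use std::hash::{Hash, Hasher};

/// Unvalidated shape, as it would arrive from JSON (hypothetical name).
#[derive(Debug, Clone, Copy)]
pub struct BackoffPolicyRaw {
    pub first_ms: u64,
    pub max_ms: u64,
    pub factor: f64,
}

/// Validated policy; the only way in is TryFrom<BackoffPolicyRaw>.
#[derive(Debug, Clone, Copy)]
pub struct BackoffPolicy {
    first_ms: u64,
    max_ms: u64,
    factor: f64,
}

impl TryFrom<BackoffPolicyRaw> for BackoffPolicy {
    type Error = String;

    fn try_from(raw: BackoffPolicyRaw) -> Result<Self, Self::Error> {
        if raw.first_ms == 0 {
            return Err("first_ms must be > 0".into());
        }
        if raw.max_ms < raw.first_ms {
            return Err("max_ms must be >= first_ms".into());
        }
        if !raw.factor.is_finite() || raw.factor < 1.0 {
            return Err("factor must be finite and >= 1.0".into());
        }
        Ok(BackoffPolicy { first_ms: raw.first_ms, max_ms: raw.max_ms, factor: raw.factor })
    }
}

// f64 is not Eq/Hash, so compare and hash the factor's bit pattern instead.
impl PartialEq for BackoffPolicy {
    fn eq(&self, other: &Self) -> bool {
        self.first_ms == other.first_ms
            && self.max_ms == other.max_ms
            && self.factor.to_bits() == other.factor.to_bits()
    }
}

impl Eq for BackoffPolicy {}

impl Hash for BackoffPolicy {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.first_ms.hash(state);
        self.max_ms.hash(state);
        self.factor.to_bits().hash(state);
    }
}
```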