# Service Configuration Defaults
When creating a new service in zinit using the `service.set` RPC endpoint, fields that are not explicitly specified will use the default values documented below.
## Service Definition Defaults
The `[service]` section defines the core service properties.
### Required Fields
These fields **must** be specified when creating a service:

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Unique service identifier |
| `exec` | string | Executable command to run (full path or relative to $PATH) |
### Optional Fields with Defaults
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `dir` | string | `null` | Working directory for the service. If not set, inherits parent's working directory |
| `oneshot` | boolean | `false` | If true, service runs once and exits. Supervisor does not keep it running |
| `env` | object | `{}` (empty) | Environment variables to pass to the service as key-value pairs |
| `status` | string | `"start"` | Desired state. Options: `"start"`, `"stop"`, `"ignore"` |
| `class` | string | `"user"` | Service classification. Options: `"user"`, `"system"` |
| `critical` | boolean | `false` | If true (PID1 only), service failure triggers emergency shell. Normal services ignore this |
### Status Field
Controls how the supervisor manages the service:
- **`start`** (default) - Supervisor ensures service is running. If it crashes, it will be restarted according to restart policy
- **`stop`** - Supervisor ensures service is stopped and keeps it stopped
- **`ignore`** - Supervisor doesn't manage this service's state. No auto-restart or auto-stop. Useful for manual control via CLI
### Class Field
Controls service protection and visibility:
- **`user`** (default) - Normal service. Affected by bulk operations (e.g., `zinit stop-all` will stop this service)
- **`system`** - Protected service. Bulk operations skip system services. Only stopped explicitly or if dependencies require it
---
## Dependency Defaults
The `[dependencies]` section declares relationships with other services.
All dependency fields are **optional** and default to empty lists if not specified:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `after` | array of strings | `[]` | Services that must start BEFORE this service starts |
| `requires` | array of strings | `[]` | Hard dependencies. If any fail, this service cannot start |
| `wants` | array of strings | `[]` | Soft dependencies. Missing dependencies are ignored |
| `conflicts` | array of strings | `[]` | Services that cannot run at the same time. If any are running, this cannot start |
### Dependency Semantics
- **`after`**: Purely ordering. Service A must start before service B, but B doesn't fail if A crashes
- **`requires`**: Hard dependency. If dependency fails, this service is blocked. If dependency is removed, this service is cascade-removed
- **`wants`**: Soft dependency. If dependency is missing or fails, this service starts anyway
- **`conflicts`**: Mutual exclusion. Services cannot run simultaneously. If conflicting service is running, this service is blocked
---
## Lifecycle Defaults
The `[lifecycle]` section controls restart behavior, timeouts, and signals.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `restart` | string | `"on_failure"` | When to restart. Options: `"always"`, `"on_failure"`, `"never"` |
| `restart_delay_ms` | integer | `1000` | Initial delay before first restart (milliseconds) |
| `restart_delay_max_ms` | integer | `300000` | Maximum delay cap for exponential backoff (5 minutes) |
| `max_restarts` | integer | `10` | Max restart attempts. 0 = unlimited |
| `stability_period_ms` | integer | `30000` | Service must run this long before backoff counter resets (30 seconds) |
| `start_timeout_ms` | integer | `30000` | Maximum time allowed for service startup before timeout (30 seconds) |
| `stop_timeout_ms` | integer | `10000` | Maximum time for graceful shutdown before SIGKILL (10 seconds) |
| `stop_signal` | string | `"SIGTERM"` | Signal sent during graceful shutdown |
### Restart Policy
- **`always`** - Service always restarts when it exits, regardless of exit code
- **`on_failure`** (default) - Service only restarts if it exits with non-zero exit code
- **`never`** - Service never restarts. Manual restart required via CLI
### Restart Backoff Algorithm
1. Service crashes → wait `restart_delay_ms`
2. If service crashes again → wait `restart_delay_ms * 2`
3. Double delay continues up to `restart_delay_max_ms`
4. After `stability_period_ms` of stable running → reset delay to `restart_delay_ms`
**Example**: With defaults (1000ms initial, 300000ms max):
- 1st restart: 1 second
- 2nd restart: 2 seconds
- 3rd restart: 4 seconds
- 4th restart: 8 seconds
- ... continues doubling ...
- Cap at 300 seconds (5 minutes)
- If the service runs stably for 30 seconds (`stability_period_ms`), the delay resets to 1 second
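The doubling-with-cap schedule above can be reproduced in a few lines. This is a minimal sketch of the arithmetic only, not zinit's implementation; the function name `restart_delays` is hypothetical, and the parameter names simply mirror the `[lifecycle]` fields.

```python
def restart_delays(attempts, restart_delay_ms=1000, restart_delay_max_ms=300000):
    """Return the wait (in ms) before each of the first `attempts`
    consecutive restarts, assuming the service never stays up long
    enough to reach the stability period."""
    delays = []
    delay = restart_delay_ms
    for _ in range(attempts):
        # Never wait longer than the configured cap.
        delays.append(min(delay, restart_delay_max_ms))
        # Double for the next attempt, still bounded by the cap.
        delay = min(delay * 2, restart_delay_max_ms)
    return delays


print(restart_delays(10))
# [1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000, 256000, 300000]
```

With the default values, the cap is first reached on the tenth consecutive restart.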
### Shutdown Flow
When supervisor stops a service:
1. Send `stop_signal` (default: SIGTERM) to process group
2. Wait up to `stop_timeout_ms` (default: 10 seconds)
3. If still running → send SIGKILL
4. Wait for process to exit
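The four steps can be sketched with the Python standard library on a POSIX system. This is an illustration under stated assumptions, not zinit's code: the helper name `stop_process` is hypothetical, and the child is assumed to have been started as the leader of its own process group (for example with `start_new_session=True`).

```python
import os
import signal
import subprocess


def stop_process(proc, stop_signal="SIGTERM", stop_timeout_ms=10000):
    """Stop a subprocess.Popen child that leads its own process group.

    Returns the child's exit status as reported by Popen.wait().
    """
    sig = signal.Signals[stop_signal]  # e.g. "SIGTERM" -> signal.SIGTERM
    try:
        # 1. Send the stop signal to the whole process group.
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        return proc.wait()  # group already gone
    try:
        # 2. Wait up to stop_timeout_ms for a graceful exit.
        return proc.wait(timeout=stop_timeout_ms / 1000)
    except subprocess.TimeoutExpired:
        try:
            # 3. Still running: escalate to SIGKILL.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        # 4. Wait for the process to exit and reap it.
        return proc.wait()
```

A caller would start the child with `subprocess.Popen([...], start_new_session=True)` and pass the resulting object to `stop_process`.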
---
## Health Check Defaults
The `[health]` section is **optional**. If omitted, no health checks are performed.
### Health Check Types
Health checks are tagged unions: the `type` field selects the variant (`http`, `tcp`, or `exec`), and the remaining fields configure it:
#### HTTP Health Check
```toml
[health]
type = "http"
target = "http://localhost:8080/health"
expect_status = 200
interval_ms = 10000
timeout_ms = 5000
retries = 3
start_period_ms = 0
```
#### TCP Health Check
```toml
[health]
type = "tcp"
target = "localhost:5432"
interval_ms = 10000
timeout_ms = 5000
retries = 3
start_period_ms = 0
```
#### Command Execution Health Check
```toml
[health]
type = "exec"
target = "/usr/bin/healthcheck.sh"
interval_ms = 10000
timeout_ms = 5000
retries = 3
start_period_ms = 0
```
### Health Check Common Fields
All health check types share these common fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `interval_ms` | integer | `10000` | How often to run the check (10 seconds) |
| `timeout_ms` | integer | `5000` | Maximum time for check to complete (5 seconds) |
| `retries` | integer | `3` | Consecutive failures before marking unhealthy |
| `start_period_ms` | integer | `0` | Grace period before first check (0 = check immediately) |
### HTTP-Specific Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `expect_status` | integer | `200` | Expected HTTP status code for health |
### Health Check Behavior
- **Grace period**: First check doesn't run until `start_period_ms` has elapsed
- **Healthy → Unhealthy**: Requires `retries` consecutive failures
- **Unhealthy effect**: Service marked unhealthy but supervisor doesn't auto-restart (informational only)
- **Missing field**: If a required field for the check type is missing, health check is ignored
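The grace period and the consecutive-failure rule can be modelled as a small state tracker. This is a sketch only: the class `HealthTracker` is hypothetical, and two behaviors are assumptions not stated above, namely that a service starts out healthy and that a single successful check restores the healthy state.

```python
class HealthTracker:
    """Tracks healthy/unhealthy state from a stream of check results."""

    def __init__(self, retries=3, start_period_ms=0):
        self.retries = retries
        self.start_period_ms = start_period_ms
        self.consecutive_failures = 0
        self.healthy = True  # assumption: healthy until proven otherwise

    def check_due(self, uptime_ms):
        """No check runs until the grace period has elapsed."""
        return uptime_ms >= self.start_period_ms

    def record(self, ok):
        """Record one check result and return the current health state."""
        if ok:
            # Assumption: one success clears the failure streak.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.retries:
                self.healthy = False
        return self.healthy


tracker = HealthTracker(retries=3, start_period_ms=5000)
print(tracker.check_due(1000))                        # False
print([tracker.record(False) for _ in range(3)])      # [True, True, False]
```

The state is informational: as described above, an unhealthy result does not by itself cause a restart.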
---
## Logging Defaults
The `[logging]` section controls in-memory buffering and optional log persistence.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `buffer_lines` | integer | `1000` | Number of recent log lines kept in memory |
| `file` | string | `null` | Optional file path to write logs to |
| `forward` | string | `null` | Optional destination to forward logs to (syslog, HTTP, etc.) |
### Logging Behavior
- **In-memory buffer**: Always kept, contains last N lines. Can be retrieved via `service.logs` RPC
- **File logging**: If specified, all output is also written to this file
- **Log forwarding**: If specified, logs are forwarded to external system (future feature)
---
## Metrics Collection
Service metrics are collected on-demand via the `service.stats` RPC endpoint.
### Available Metrics
| Field | Type | Availability | Description |
|-------|------|--------------|-------------|
| `pid` | u32 | Always | Process ID of running service (0 if not running) |
| `memory_bytes` | u64 | Only if running | Memory usage in bytes from `/proc` stats |
| `cpu_percent` | f32 | Only if running | CPU usage percentage (currently returns 0.0, reserved for future) |
**Note**: Metrics are collected on-demand when requested, not continuously tracked.
---
## Minimal Service Example
The absolute minimum required to create a service:
```toml
[service]
name = "my-app"
exec = "/usr/bin/my-app"
```
This uses **all defaults** for everything else:
- `dir`: null (inherit working directory)
- `oneshot`: false
- `env`: {} (no custom env vars)
- `status`: "start" (supervisor keeps it running)
- `class`: "user"
- `critical`: false
- **Lifecycle defaults**: on_failure restart, exponential backoff from 1 second up to 300 seconds, 30s startup timeout, 10s stop timeout, SIGTERM stop
- **No health checks**
- **Logging**: 1000 line buffer, no file persistence
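To see what the minimal definition expands to, the defaults from the tables in this document can be merged over a partial configuration. The sketch below is illustrative only: `with_defaults` and the two constant names are hypothetical, and the configuration is represented as a plain nested dict rather than zinit's own types.

```python
import copy

SECTION_DEFAULTS = {
    "service": {"dir": None, "oneshot": False, "env": {}, "status": "start",
                "class": "user", "critical": False},
    "dependencies": {"after": [], "requires": [], "wants": [], "conflicts": []},
    "lifecycle": {"restart": "on_failure", "restart_delay_ms": 1000,
                  "restart_delay_max_ms": 300000, "max_restarts": 10,
                  "stability_period_ms": 30000, "start_timeout_ms": 30000,
                  "stop_timeout_ms": 10000, "stop_signal": "SIGTERM"},
    "logging": {"buffer_lines": 1000, "file": None, "forward": None},
}
HEALTH_DEFAULTS = {"interval_ms": 10000, "timeout_ms": 5000,
                   "retries": 3, "start_period_ms": 0}


def with_defaults(config):
    """Return a copy of `config` with every documented default filled in."""
    service = config.get("service", {})
    missing = [f for f in ("name", "exec") if f not in service]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    merged = copy.deepcopy(SECTION_DEFAULTS)
    for section, defaults in merged.items():
        # Explicit values win over defaults.
        defaults.update(copy.deepcopy(config.get(section, {})))
    if "health" in config:
        # [health] stays absent unless the caller supplied it.
        health = dict(HEALTH_DEFAULTS)
        if config["health"].get("type") == "http":
            health["expect_status"] = 200
        health.update(config["health"])
        merged["health"] = health
    return merged


full = with_defaults({"service": {"name": "my-app", "exec": "/usr/bin/my-app"}})
print(full["lifecycle"]["restart"], full["logging"]["buffer_lines"])
# on_failure 1000
```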
---
## Full Service Example
A service with all fields explicitly specified:
```toml
[service]
name = "web-server"
exec = "/usr/bin/nginx -g 'daemon off;'"
dir = "/var/www"
oneshot = false
status = "start"
class = "user"
critical = false
[service.env]
PORT = "8080"
DEBUG = "false"
LOG_LEVEL = "info"
[dependencies]
after = ["network-ready", "filesystem"]
requires = ["kernel"]
wants = ["metrics", "syslog"]
conflicts = ["old-nginx", "apache"]
[lifecycle]
restart = "on_failure"
restart_delay_ms = 2000
restart_delay_max_ms = 300000
max_restarts = 10
stability_period_ms = 30000
start_timeout_ms = 60000
stop_timeout_ms = 30000
stop_signal = "SIGQUIT"
[health]
type = "http"
target = "http://localhost:8080/health"
expect_status = 200
interval_ms = 5000
timeout_ms = 2000
retries = 3
start_period_ms = 10000
[logging]
buffer_lines = 500
file = "/var/log/nginx.log"
```
---
## Service Creation Flow
When a service is created via `service.set` RPC endpoint:
### Validation Steps
1. **Parse configuration** - Convert JSON/TOML to ServiceConfig struct
2. **Validate fields** - Check required fields present, types valid
3. **Check executable** - Verify `exec` path exists and is executable
4. **Check dependencies** - Verify all referenced services exist (unless they're soft `wants`)
5. **Check for conflicts** - Ensure no conflicting services are currently running
### Creation Steps
1. If service with same name exists:
   - Stop the existing service (hard stop, no graceful shutdown)
   - Remove it from supervisor graph
2. Persist to disk:
   - Write service config as TOML file to disk
   - Location: Determined by `ZINIT_CONFIG_DIR` env var (default: `~/.config/zinit/services/`)
   - Filename: `{service_name}.toml`
3. Register in memory:
   - Add service to supervisor's in-memory service graph
   - Initialize service state based on `status` field
4. Auto-start if needed:
   - If `status` is `"start"` → immediately transition service to Starting state
   - If `status` is `"stop"` → keep service in Inactive state
   - If `status` is `"ignore"` → keep service in Inactive state
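Two of the rules above are simple enough to express directly: where the TOML file is written, and which state a new service starts in. The following is a sketch under the assumptions stated in this section; the function names `service_config_path` and `initial_state` are hypothetical and not part of zinit.

```python
import os
from pathlib import Path


def service_config_path(name, environ=None):
    """Resolve the on-disk location of a service definition.

    Uses ZINIT_CONFIG_DIR when set, otherwise ~/.config/zinit/services/.
    """
    environ = os.environ if environ is None else environ
    base = environ.get("ZINIT_CONFIG_DIR")
    directory = Path(base) if base else Path.home() / ".config" / "zinit" / "services"
    return directory / f"{name}.toml"


def initial_state(status):
    """Map the `status` field to the state a freshly created service enters."""
    if status == "start":
        return "Starting"
    if status in ("stop", "ignore"):
        return "Inactive"
    raise ValueError(f"unknown status: {status!r}")


print(service_config_path("web", {"ZINIT_CONFIG_DIR": "/etc/zinit"}))
# /etc/zinit/web.toml
print(initial_state("ignore"))
# Inactive
```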
### Persistence
- **All services are persisted to disk** when created via `service.set`
- Services survive supervisor restart (reload from disk on startup)
- Configuration files are human-readable TOML
### Error Handling
If any validation fails, the entire operation is aborted:
- Service is not created
- Nothing is written to disk
- Error is returned to client with description
---
## Service State Machine
Once created, services follow this state machine:
```
Inactive
↓
Starting → Running → Stopping → Exited
↓ ↓ ↓ ↓
Failed ← ← ← Blocked ← ← ← ← ← ← ← ↓
```
- **Inactive**: Service not currently running
- **Blocked**: Dependencies not met or conflicts present
- **Starting**: Transitioning to Running, within `start_timeout_ms`
- **Running**: Service is executing
- **Stopping**: Graceful shutdown in progress, within `stop_timeout_ms`
- **Exited**: Service exited with success (exit code 0)
- **Failed**: Service exited with failure (non-zero exit code)
---
## Validation Rules
Services are validated according to these rules:

| Case | Behavior |
|------|----------|
| Duplicate name | Rejected - service names must be unique |
| Missing required fields | Rejected - name and exec required |
| Nonexistent executable | Rejected - exec path must be valid and executable |
| Circular dependencies | Rejected - dependency graphs must be acyclic |
| Nonexistent dependencies | Partially allowed - `wants` can reference missing services, but `after` and `requires` cannot |
| Self-reference | Rejected - service cannot depend on itself |
| Unknown fields | Ignored - unknown TOML fields are skipped without error |
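The dependency-related rules (self-reference, missing hard dependencies, cycles) can be checked with the standard library's `graphlib` module (Python 3.9+). This is a sketch, not zinit's validator: `validate_dependencies` is a hypothetical name, and treating `wants` edges as part of the cycle check is an assumption.

```python
from graphlib import CycleError, TopologicalSorter


def _edges(deps):
    """All services a definition points at via after/requires/wants."""
    return set(deps.get("after", [])) | set(deps.get("requires", [])) | set(deps.get("wants", []))


def validate_dependencies(name, deps, existing):
    """Return a list of error strings for a new service's [dependencies].

    `existing` maps already-registered service names to their own
    dependency dicts.
    """
    errors = []
    hard = set(deps.get("after", [])) | set(deps.get("requires", []))
    refs = _edges(deps)

    if name in refs:
        errors.append(f"self-reference: {name} depends on itself")

    # `wants` may point at missing services; `after` and `requires` may not.
    for dep in sorted(hard - set(existing) - {name}):
        errors.append(f"nonexistent dependency: {dep}")

    # Build the full graph (node -> predecessors) and look for a cycle.
    graph = {svc: _edges(d) for svc, d in existing.items()}
    graph[name] = refs - {name}  # self-reference already reported above
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        errors.append("circular dependency detected")
    return errors


existing = {"a": {"requires": ["b"]}, "b": {}}
print(validate_dependencies("b", {"requires": ["a"]}, existing))
# ['circular dependency detected']
```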
---
## Practical Examples
### Simple Web Server (HTTP)
```toml
[service]
name = "web"
exec = "/usr/bin/python3 -m http.server 8000"
status = "start"
[health]
type = "http"
target = "http://localhost:8000/"
```
Uses all other defaults. Web server starts immediately, restarts on failure with exponential backoff.
### Database Service (TCP Check)
```toml
[service]
name = "postgres"
exec = "/usr/lib/postgresql/bin/postgres -D /var/lib/postgresql/data"
status = "start"
[lifecycle]
start_timeout_ms = 60000
stop_timeout_ms = 30000
[health]
type = "tcp"
target = "localhost:5432"
start_period_ms = 5000
```
Longer startup timeout, TCP health check with 5-second grace period.
### One-Shot Initialization Task
```toml
[service]
name = "setup-db"
exec = "/usr/bin/db-migrate.sh"
oneshot = true
status = "start"
[lifecycle]
restart = "never"
start_timeout_ms = 300000
```
Runs once, no auto-restart, 5-minute timeout for long-running migration.
### System Service (Protected)
```toml
[service]
name = "sshd"
exec = "/usr/sbin/sshd -D"
class = "system"
status = "start"
[lifecycle]
max_restarts = 0
restart = "always"
```
System service won't be stopped by bulk operations, always restarts if it crashes.