# Configuration
Most users can run gflow without configuration. Use a config file (TOML) and/or environment variables when you need to change where the daemon listens or to restrict which GPUs the scheduler may use.
## Config File
Default location:
```
~/.config/gflow/gflow.toml
```
Generate one interactively:
```bash
gflowd init
```
Minimal example:
```toml
[daemon]
host = "localhost"
port = 59000
# gpus = [0, 2]
# gpu_allocation_strategy = "sequential" # or "random"
```
All CLIs accept `--config <path>` to use a different file:
```bash
gflowd --config <path> up
ginfo --config <path>
gbatch --config <path> --gpus 1 python train.py
```
## Daemon Settings
### Host and Port
```toml
[daemon]
host = "localhost"
port = 59000
```
- Default: `localhost:59000`
- Set `host = "0.0.0.0"` to listen on all interfaces only if you understand the security implications.
<a id="gpu-selection"></a>
### GPU Selection
Restrict which physical GPUs the scheduler is allowed to allocate.
Config file:
```toml
[daemon]
gpus = [0, 2]
```
### GPU Allocation Strategy
Control how gflow picks GPU indices when multiple GPUs are available.
Config file:
```toml
[daemon]
gpu_allocation_strategy = "sequential" # default
# gpu_allocation_strategy = "random"
```
- `sequential`: deterministic; prefers lower GPU indices first.
- `random`: randomizes the GPU selection order on each scheduling cycle.
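As a rough sketch (illustrative only, not gflow's internals), the two strategies differ only in how the list of free GPU indices is ordered before allocation:

```python
import random

def order_candidates(free_gpus, strategy, rng=None):
    """Order free GPU indices according to the allocation strategy.

    free_gpus: iterable of available GPU indices.
    strategy:  "sequential" (lowest indices first) or "random".
    """
    gpus = sorted(free_gpus)
    if strategy == "random":
        (rng or random).shuffle(gpus)
    return gpus

# "sequential" is deterministic: lower indices come first.
print(order_candidates({3, 0, 2}, "sequential"))  # [0, 2, 3]
```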
Daemon CLI flag for the allocation strategy (overrides config):
```bash
gflowd up --gpu-allocation-strategy random
gflowd restart --gpu-allocation-strategy sequential
```
Daemon CLI flag for GPU selection (overrides config):
```bash
gflowd up --gpus 0,2
gflowd restart --gpus 0-3
```
Runtime control (affects new allocations only):
```bash
gctl set-gpus 0,2
gctl set-gpus all
gctl show-gpus
```
Supported specs: `0`, `0,2,4`, `0-3`, `0-1,3,5-6`.
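A spec like `0-1,3,5-6` expands to a set of GPU indices. A minimal parser sketch (the function name is illustrative, not gflow's implementation):

```python
def parse_gpu_spec(spec):
    """Expand a GPU spec such as "0-1,3,5-6" into a sorted list of indices."""
    indices = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            indices.update(range(int(lo), int(hi) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)

print(parse_gpu_spec("0-1,3,5-6"))  # [0, 1, 3, 5, 6]
```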
Precedence for GPU selection (highest → lowest):
1. CLI flag (`gflowd up --gpus ...`)
2. Env var (`GFLOW_DAEMON_GPUS=...`)
3. Config file (`daemon.gpus = [...]`)
4. Default: all detected GPUs
Precedence for the allocation strategy (highest → lowest):
1. CLI flag (`gflowd up --gpu-allocation-strategy ...`)
2. Env var (`GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=...`)
3. Config file (`daemon.gpu_allocation_strategy = "..."`)
4. Default: `sequential`
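Both precedence chains follow the same "first set value wins" rule; a sketch (illustrative, not gflow's code):

```python
import os

def resolve(cli_value, env_var, config_value, default):
    """Return the first set value: CLI flag > env var > config file > default."""
    for value in (cli_value, os.environ.get(env_var), config_value):
        if value is not None:
            return value
    return default

# With no CLI flag and the env var unset, the config value wins:
os.environ.pop("GFLOW_DAEMON_GPUS", None)
print(resolve(None, "GFLOW_DAEMON_GPUS", "0,2", "all detected GPUs"))  # 0,2
```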
## Timezone
Configure the timezone used when displaying and parsing reservation times.
Config file:
```toml
timezone = "Asia/Shanghai"
```
Per-command override:
```bash
gctl reserve create --user alice --gpus 2 --start "2026-02-01 14:00" --duration "2h" --timezone "UTC"
```
Supported formats:
- IANA timezone names: `"Asia/Shanghai"`, `"America/Los_Angeles"`, `"UTC"`
- Time input: ISO8601 (`"2026-02-01T14:00:00Z"`) or simple format (`"2026-02-01 14:00"`)
Precedence (highest → lowest):
1. CLI flag (`--timezone`)
2. Config file (`timezone = "..."`)
3. Default: local system timezone
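To see how an IANA name combines with a simple-format time, here is a sketch using Python's standard `zoneinfo` module (illustrative; gflow has its own parser):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Interpret a simple-format time in a given IANA timezone.
local = datetime.strptime("2026-02-01 14:00", "%Y-%m-%d %H:%M")
local = local.replace(tzinfo=ZoneInfo("Asia/Shanghai"))

# Convert to UTC for comparison with ISO8601 input like "2026-02-01T14:00:00Z".
utc = local.astimezone(ZoneInfo("UTC"))
print(utc.isoformat())  # 2026-02-01T06:00:00+00:00
```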
## Project Tracking
Use project settings to standardize job ownership metadata across teams.
```toml
[projects]
known_projects = ["ml-research", "cv-team"]
require_project = false
```
- `known_projects`: allowed project codes. Empty means any non-empty code is allowed.
- `require_project`: when `true`, every submitted job must include a non-empty project.
- Project values are normalized (trimmed). Whitespace-only values are treated as unset.
- Project code length limit: 64 characters.
- If `require_project = true` and `known_projects` is non-empty, every job must include a project that appears in `known_projects`.
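The rules above amount to the following validation logic (a sketch; the function name is illustrative, not gflow's internals):

```python
def validate_project(value, known_projects, require_project):
    """Apply the project rules: trim, treat empty as unset, cap length at 64,
    and check membership when known_projects is non-empty."""
    project = (value or "").strip() or None
    if project is None:
        if require_project:
            raise ValueError("a project is required")
        return None
    if len(project) > 64:
        raise ValueError("project code exceeds 64 characters")
    if known_projects and project not in known_projects:
        raise ValueError(f"unknown project: {project}")
    return project

print(validate_project("  ml-research ", ["ml-research", "cv-team"], True))
```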
Related CLI usage:
```bash
gbatch --project ml-research python train.py
gqueue --project ml-research
gqueue --format JOBID,NAME,PROJECT,ST,TIME
```
## Notifications (Webhooks)
gflowd can send HTTP POST webhooks for job and system events; delivery is best-effort.
Enable and configure:
```toml
[notifications]
enabled = true
max_concurrent_deliveries = 16
[[notifications.webhooks]]
url = "https://api.example.com/gflow/events"
events = ["job_completed", "job_failed", "job_timeout"] # or ["*"]
filter_users = ["alice", "bob"] # optional
headers = { Authorization = "Bearer token123" } # optional
timeout_secs = 10
max_retries = 3
```
Supported event names:
- `job_submitted`
- `job_started`
- `job_completed`
- `job_failed`
- `job_cancelled`
- `job_timeout`
- `job_held`
- `job_released`
- `gpu_available` (only when a GPU becomes available)
- `reservation_created`
- `reservation_cancelled`
Payload shape (fields may be omitted depending on event):
```json
{
"event": "job_completed",
"timestamp": "2026-02-04T12:30:45Z",
"job": { "id": 42, "user": "alice", "state": "Finished" },
"scheduler": { "host": "gpu-server-01", "version": "0.4.11" }
}
```
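A minimal receiver for this payload, sketched with Python's standard library (your endpoint will likely use a proper web framework; the summary format here is illustrative):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class GflowWebhook(BaseHTTPRequestHandler):
    """Accepts gflow event payloads and prints a one-line summary."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        # Fields may be omitted depending on the event type.
        job = event.get("job", {})
        print(f"{event.get('event')}: job {job.get('id')} ({job.get('user')})")
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        # Silence the default per-request stderr logging.
        pass

# To run: HTTPServer(("127.0.0.1", 8080), GflowWebhook).serve_forever()
```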
Notes:
- `events = ["*"]` subscribes to all supported events.
- Use `filter_users` to restrict notifications by job submitter / reservation owner.
- `max_retries` uses exponential backoff (best-effort); deliveries may be skipped if the daemon is overloaded.
- Be careful with sensitive data: webhooks can include job metadata and usernames.
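gflow does not document its exact backoff schedule; a typical exponential backoff with `max_retries = 3` might look like the following sketch (the delay schedule and function names are assumptions, not gflow's implementation):

```python
def backoff_delays(max_retries, base_delay=1.0):
    """Illustrative schedule: base_delay * 2**attempt seconds between attempts."""
    return [base_delay * 2 ** attempt for attempt in range(max_retries)]

def deliver(send, payload, max_retries=3, sleep=lambda seconds: None):
    """Best-effort delivery: try once, then retry with exponential backoff."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            send(payload)
            return True
        except OSError:
            if attempt < max_retries:
                sleep(delays[attempt])
    return False

print(backoff_delays(3))  # [1.0, 2.0, 4.0]
```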
## Logging
- `gflowd`: use `-v/--verbose` (see `gflowd --help`).
- Client commands (`gbatch`, `gqueue`, `ginfo`, `gjob`, `gctl`): use `RUST_LOG` (e.g. `RUST_LOG=info`).
## Environment Variables
```bash
export GFLOW_DAEMON_HOST=localhost
export GFLOW_DAEMON_PORT=59000
export GFLOW_DAEMON_GPUS=0,2
export GFLOW_DAEMON_GPU_ALLOCATION_STRATEGY=random
```
## Files and State
gflow follows the XDG Base Directory spec:
```text
~/.config/gflow/gflow.toml
~/.local/share/gflow/state.msgpack (or state.json for legacy)
~/.local/share/gflow/logs/<job_id>.log
```
### State Persistence Format
Starting from version 0.4.11, gflowd uses **MessagePack** binary format for state persistence:
- **New installations**: State is saved to `state.msgpack` (binary format)
- **Automatic migration**: Existing `state.json` files are automatically migrated to `state.msgpack` on first load
- **Backward compatibility**: gflowd can still read old `state.json` files
### Recovery mode (state file issues)
If the state file cannot be deserialized or migrated (e.g. after upgrading/downgrading versions), `gflowd` enters **recovery mode**:
- `gflowd` continues running, but does not overwrite the state file.
- State changes are persisted to a single-snapshot journal file: `~/.local/share/gflow/state.journal.jsonl` (it is overwritten on each save).
- `/health` returns `200` with `status: "recovery"` and `mode: "journal"`.
- A backup copy is created next to the state file (e.g. `state.msgpack.backup.<timestamp>` or `state.msgpack.corrupt.<timestamp>`).
When the state file becomes readable again, `gflowd` loads the latest journal snapshot, rewrites the state file, and truncates the journal.
If the journal file is not writable, `gflowd` falls back to **read-only** mode and mutating APIs return `503`.
To recover, upgrade/downgrade to a version that can read/migrate your state, or restore from the backup file.
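A monitoring script can detect recovery mode by inspecting the `/health` response body; a sketch (only the `status` and `mode` fields shown here are documented above, other fields are not guaranteed):

```python
import json

# Example /health body while the daemon is in recovery mode.
body = '{"status": "recovery", "mode": "journal"}'
health = json.loads(body)

if health.get("status") == "recovery":
    print(f"gflowd is in recovery mode (persistence: {health.get('mode')})")
```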
## Troubleshooting
### Config file not found
```bash
ls -la ~/.config/gflow/gflow.toml
```
### Port already in use
Change the port:
```toml
[daemon]
port = 59001
```
## See Also
- [Installation](../getting-started/installation) - Initial setup
- [Quick Start](../getting-started/quick-start) - Basic usage
- [GPU Management](./gpu-management) - GPU allocation