<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/yukimemi/kanade/main/assets/logo-dark.svg">
  <img src="https://raw.githubusercontent.com/yukimemi/kanade/main/assets/logo.svg" alt="kanade — orchestrate fleets of Windows endpoints" width="540">
</picture>

[![CI](https://github.com/yukimemi/kanade/actions/workflows/ci.yml/badge.svg)](https://github.com/yukimemi/kanade/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/yukimemi/kanade/graph/badge.svg)](https://codecov.io/gh/yukimemi/kanade)
[![crates.io](https://img.shields.io/crates/v/kanade.svg)](https://crates.io/crates/kanade)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/yukimemi/kanade/blob/main/LICENSE)

> 奏 — *orchestrate*. A self-hosted Rust pub/sub backbone for managing
> thousands of Windows endpoints without Active Directory. NATS / JetStream
> carries inventory polling, fleet-wide rollouts, and ad-hoc emergency
> commands on a single channel.

**Status: 0.3.1 — Sprint 6 shipped.** Agent + backend (axum + SQLite
projector + JetStream KV watcher + cron scheduler) + admin CLI + an
embedded SPA dashboard + JWT-gated `/api/*` + agent self-update via the
JetStream Object Store. Full design lives in
[docs/SPEC.md](https://github.com/yukimemi/kanade/blob/main/docs/SPEC.md) (Japanese, ~1150 lines covering Part 1
overview and Part 2 detailed design).

## Why

The off-the-shelf endpoint managers (Intune, Tanium, Workspace ONE, …)
either require Active Directory, lock you into a vendor cloud, or both.
For shops that want AD-independent, on-prem, scriptable fleet control
the answer has historically been "build something on top of a message
broker" — which everyone reinvents from scratch.

`kanade` aims to be the reusable shape of that build:

- **NATS + JetStream as the only moving part.** Agents speak to the
  broker over outbound TLS; the broker fans out commands, fans in
  inventory and results. No AD, no client-pull-from-server, no opening
  inbound ports on user PCs.
- **Declarative job manifests in Git.** Review, history, rollback all
  come for free; the YAML schema (`jobs/*.yaml`) is the same input
  whether you `kanade deploy` ad-hoc or wire it onto a cron `kanade
  schedule`.
- **Three layers of stop-the-bleed.** Stream max-msgs-per-subject
  replaces stale rollouts in the broker; consumer-side version checks
  guard execution; `kanade kill <job_id>` terminates running children.
  The emergency-stop path is wired from MVP, not bolted on later (see
  [SPEC.md §2.6](https://github.com/yukimemi/kanade/blob/main/docs/SPEC.md)).
- **Phased build-out.** One server is enough for a few hundred
  endpoints; the same code scales to a 3-node NATS cluster + replicated
  backend + Postgres for several thousand.
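The stream-side "stop-the-bleed" layer relies on JetStream's per-subject retention; the broker enforces this natively, but the semantics of `max-msgs-per-subject=1` can be sketched with a toy model (class and payloads are illustrative, not kanade code):

```python
from collections import OrderedDict

class LastValueStream:
    """Toy model of a stream with max-msgs-per-subject=1: publishing a
    new message on a subject discards the stale one still in the broker."""

    def __init__(self):
        self._by_subject = OrderedDict()

    def publish(self, subject: str, payload: str) -> None:
        self._by_subject.pop(subject, None)   # drop the stale rollout
        self._by_subject[subject] = payload

    def pending(self) -> list:
        return list(self._by_subject.items())

stream = LastValueStream()
stream.publish("commands.all", "rollout v1.0.0")
stream.publish("commands.all", "rollout v1.0.1")
print(stream.pending())   # [('commands.all', 'rollout v1.0.1')]
```

The practical effect: an agent that was offline during a botched rollout never replays it, because the fixed rollout on the same subject has already displaced it in the stream.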

## Crates

| crate            | kind | role |
|------------------|------|------|
| `kanade-shared`  | lib  | wire types (`Command` / `ExecResult` / `Heartbeat` / `HwInventory`), NATS subject + KV helpers, YAML manifest schema, [teravars]-backed config loader |
| `kanade-agent`   | bin  | Windows-side resident daemon: subscribes to `commands.*`, runs child processes, publishes results + heartbeats + WMI inventory; watches the layered `agent_config` + `agent_groups` KV buckets and reacts live to cadence / membership / target_version changes |
| `kanade-backend` | bin  | axum HTTP server: `/health`, `/api/{agents,results,audit,deploy,schedules,config,…}`, embedded SPA at `/`. Auto-bootstraps every required JetStream resource at startup, runs durable projectors (INVENTORY/RESULTS/AUDIT → SQLite) and a `tokio-cron-scheduler` driven by the schedules KV |
| `kanade`         | bin  | operator-side admin CLI (`kubectl`-style single entry point); subcommands talk to NATS directly for `run`/`ping`/`kill`/`revoke`/`jetstream`/`agent`/`config` and to the backend over HTTP for `deploy`/`schedule` |

## Install

You'll need:

- Rust 1.85+ (the workspace pins `edition = "2024"`)
- A NATS server (Go binary, ~15 MB)

```powershell
# 1. NATS server
scoop install nats-server         # or: winget install nats-io.nats-server

# 2. The three kanade binaries — straight from crates.io.
cargo install kanade kanade-agent kanade-backend
```

`kanade`, `kanade-agent`, and `kanade-backend` are now on your PATH
(under `~/.cargo/bin/`).

You'll also want the sample configs (`agent.toml` / `backend.toml`) and
the example manifests (`jobs/*.yaml`). The fastest way is a shallow
clone of this repo:

```powershell
git clone --depth=1 https://github.com/yukimemi/kanade.git
cd kanade
```

(or `curl` the individual files from
`https://raw.githubusercontent.com/yukimemi/kanade/main/...` into your
own working dir if you'd rather not clone).

> **Build it yourself from source.** Skip the `cargo install` step,
> `git clone` the full repo, and run `cargo install --path
> crates/<crate>` once per crate (`cargo install` accepts a single
> `--path` per invocation, so repeat it for `kanade`, `kanade-agent`,
> and `kanade-backend`). That path matters if you're hacking on the
> crates.

## Quick start (5 terminals, ~2 minutes)

Run each step in its own PowerShell window so the daemons stay up. All
of them assume you've `cd`'d into the directory that holds `agent.toml` /
`backend.toml` / `jobs/`, which is the repo root if you cloned it.

### 1 — start NATS

```powershell
nats-server -js -p 4222
```

### 2 — provision JetStream (optional)

```powershell
kanade jetstream setup
```

Creates every stream (`INVENTORY` / `RESULTS` / `DEPLOY` / `EVENTS` /
`AUDIT`), KV bucket (`script_current` / `script_status` / `agents_state`
/ `agent_config` / `agent_groups` / `schedules`), and the
`agent_releases` Object Store. This step is **optional** as of v0.3.1:
`kanade-backend` auto-bootstraps the same set at startup, so a fresh
NATS server + `kanade-backend` is enough to get a working fleet. The
CLI command is still useful for re-running setup against a different
broker, or for inspecting what would be created (`kanade jetstream
status`).
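The auto-bootstrap is create-if-missing, which is what makes both paths safe to re-run. The resource names below are the ones listed above; the planning helper itself is an illustrative sketch, not kanade code:

```python
# Resource names from this README; the planner is a sketch of
# create-if-missing bootstrap, not the actual backend logic.
REQUIRED_STREAMS = ["INVENTORY", "RESULTS", "DEPLOY", "EVENTS", "AUDIT"]
REQUIRED_KV = ["script_current", "script_status", "agents_state",
               "agent_config", "agent_groups", "schedules"]
REQUIRED_OBJ = ["agent_releases"]

def plan_bootstrap(existing: set) -> list:
    """What a setup run (or the backend's startup bootstrap) would
    still have to create against a broker with `existing` resources."""
    wanted = REQUIRED_STREAMS + REQUIRED_KV + REQUIRED_OBJ
    return [name for name in wanted if name not in existing]

# Pre-existing resources are left alone, which is why re-running setup
# (or restarting the backend) never clobbers live data.
print(plan_bootstrap({"INVENTORY", "RESULTS"}))
```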

### 3 — start the backend

```powershell
$env:KANADE_AUTH_DISABLE = "1"   # JWT off for development
kanade-backend
```

Serves the dashboard at <http://127.0.0.1:8080> and the JSON API at
`/api/*`. SQLite is created at `./backend.db`. Both projectors and the
cron scheduler start in the background.

### 4 — start the agent

```powershell
kanade-agent
```

Loads `./agent.toml`, picks `$env:COMPUTERNAME` as `pc_id`, subscribes
to `commands.all` + `commands.pc.{pc_id}`, then spawns the
config_supervisor (watches `agent_config` + `agent_groups` KV) plus
the heartbeat / inventory / self-update / groups-manager loops. Group
membership and cadence settings are read from the KV buckets — see
`kanade agent groups` and `kanade config` to drive them.

### 5 — drive it

```powershell
# Round-trip a script via NATS, request/reply.
kanade run $env:COMPUTERNAME -- 'echo hello from kanade'

# Or via the backend's YAML deploy path (writes a row to deployments,
# emits an audit event, broadcasts the Command).
kanade deploy jobs/echo-test.yaml

# Heartbeat probe.
kanade ping $env:COMPUTERNAME

# Inspect via curl…
curl http://127.0.0.1:8080/api/agents
curl http://127.0.0.1:8080/api/results
curl http://127.0.0.1:8080/api/audit

# …or open the dashboard.
start http://127.0.0.1:8080
```

## CLI cheat sheet

```text
kanade run    <pc_id> -- <script>                # request/reply via NATS
kanade ping   <pc_id>                            # wait for one heartbeat
kanade kill   <job_id>                           # publish kill.{job_id}
kanade revoke <cmd_id>                           # script_status = REVOKED
kanade unrevoke <cmd_id>                         # → ACTIVE

kanade jetstream setup                           # create streams + KV + Object Store (optional; backend auto-bootstraps on startup)
kanade jetstream status                          # health snapshot

kanade deploy   <manifest.yaml> [--version <v>]  # POST /api/deploy
kanade schedule create <schedule.yaml>           # POST /api/schedules (cron + manifest)
kanade schedule list
kanade schedule delete <id>

kanade agent publish <binary> --version <v>      # upload to Object Store + flip global.target_version
kanade agent current                             # read agent_config.global.target_version

kanade agent groups list <pc_id>                 # current group memberships for one PC
kanade agent groups add  <pc_id> <group>         # add membership (idempotent)
kanade agent groups rm   <pc_id> <group>         # drop membership
kanade agent groups set  <pc_id> <group> ...     # replace whole list

kanade config get  [--group <name>|--pc <pc_id>] # ConfigScope at this scope (default: global)
kanade config set  <field>=<value> [...]         # set one field (target_version / inventory_* / heartbeat_*)
kanade config unset <field> [...]                # clear one field
kanade config clear [--group <name>|--pc <pc_id>] # delete the whole scope row
kanade config effective <pc_id>                  # resolved view for a PC (built-in -> global -> groups -> pc)
```

`kanade <subcommand> --help` for argument details.

## Authoring jobs

YAML manifests in `jobs/*.yaml` (see [spec §2.4.1](https://github.com/yukimemi/kanade/blob/main/docs/SPEC.md)).
Sample manifests in the repo cover:

- `jobs/echo-test.yaml` — minimal ad-hoc command
- `jobs/wave-test.yaml` — `rollout.waves` rollout (canary → wave1 with delay)
- `jobs/schedule-test.yaml` — cron-driven echo every 10 s

A wave manifest sketch:

```yaml
id: cleanup-disk-temp
version: 1.0.1
target:
  pcs: [PC1234]
execute:
  shell: powershell
  script: |
    $temp = [System.IO.Path]::GetTempPath()
    Remove-Item "$temp\*" -Recurse -Force -ErrorAction SilentlyContinue
  timeout: 600s
  jitter: 5m
rollout:
  strategy: wave
  waves:
    - { group: canary, delay: 0s  }
    - { group: wave1,  delay: 30m }
```
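Under the assumption that each wave's `delay` is measured from rollout start rather than from the previous wave (SPEC.md is authoritative on this), the manifest's delays map to dispatch times like so:

```python
from datetime import datetime, timedelta

def parse_delay(s: str) -> timedelta:
    """Parse the manifest's duration shorthand: '0s', '30m', '600s', '2h'."""
    units = {"s": "seconds", "m": "minutes", "h": "hours"}
    return timedelta(**{units[s[-1]]: int(s[:-1])})

def wave_dispatch_times(waves: list, start: datetime) -> list:
    # Assumption: delays are relative to rollout start, not to the
    # previous wave completing.
    return [(w["group"], start + parse_delay(w["delay"])) for w in waves]

waves = [{"group": "canary", "delay": "0s"},
         {"group": "wave1", "delay": "30m"}]
start = datetime(2026, 1, 1, 9, 0)
for group, at in wave_dispatch_times(waves, start):
    print(f"{group}: {at:%H:%M}")   # canary: 09:00, wave1: 09:30
```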

## Config files

Both use [teravars] templating — `{{ system.host }}`, `{{ env(name="X", default="Y") }}`, `{% if is_windows() %}…{% endif %}` are all available.

`agent.toml` (intentionally minimal — fleet policy lives in the
`agent_config` + `agent_groups` KV buckets, edited via
`kanade config` / `kanade agent groups`):

```toml
[agent]
id = '{{ system.host }}'
nats_url = 'nats://127.0.0.1:4222'

[log]
path = 'logs/agent.log'
level = 'info'
```

Older `agent.toml` files that still carry `[agent] groups = […]` or an
`[inventory]` section keep loading — both fields are parsed via
`#[serde(default)]` — but their values are logged and ignored at
startup. Removal is scheduled for v0.4.0.

`backend.toml`:

```toml
[server]
bind = '0.0.0.0:8080'

[nats]
url = 'nats://127.0.0.1:4222'

[db]
sqlite_path = './backend.db'

[log]
path = 'logs/backend.log'
level = 'info'
```

## Authentication

`/api/*` is protected by a single middleware (`crates/kanade-backend/src/auth.rs`).
Three modes, picked by env var on the backend side:

| Env on `kanade-backend` | Mode | Use for |
|---|---|---|
| `KANADE_AUTH_DISABLE=1` | open | local dev, `cargo run` |
| `KANADE_AUTH_STATIC_TOKEN=<secret>` | shared bearer | single-operator fleets — paste the same secret on the SPA login + `kanade` CLI |
| `KANADE_JWT_SECRET=<secret>` | HS256 JWT | full multi-user setup; sign tokens out-of-band with `aud=kanade` |

Precedence: `DISABLE` > `STATIC_TOKEN` > `JWT_SECRET`. A backend with none
of the three set falls back to a hard-coded dev secret and logs a loud
warning — fine for one-shot debugging, **never** for production.
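For the `KANADE_JWT_SECRET` mode, tokens are minted out-of-band with `aud=kanade`. A stdlib-only sketch of HS256 signing — the `sub` and `exp` claims here are illustrative; which claims the backend validates beyond the audience is defined in `auth.rs`:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256_jwt(secret: str, claims: dict) -> str:
    """Minimal HS256 JWT signer: base64url(header).base64url(payload)
    signed with HMAC-SHA256 over the first two segments."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret.encode(),
                   f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

token = sign_hs256_jwt("my-jwt-secret", {
    "sub": "operator@example.com",        # illustrative claim
    "aud": "kanade",                      # the backend checks this audience
    "exp": int(time.time()) + 8 * 3600,   # illustrative 8-hour lifetime
})
print(token.count(".") == 2)              # True: header.payload.signature
```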

Clients send `Authorization: Bearer <token>` on every `/api/*` request:

- **SPA**: stores the token in `localStorage`; click `login` in the top-right
  nav to paste, `logout` to clear. A 401 from the backend auto-clears the
  stored token and re-prompts.
- **CLI**: reads `$env:KANADE_AUTH_TOKEN`. Set it once per shell session
  (or export it from a shell profile). The CLI sends the same header
  regardless of which auth mode the backend is running in.

```powershell
# Backend side
$env:KANADE_AUTH_STATIC_TOKEN = "kanade-fleet-secret-2026"
.\deploy-backend.ps1

# Operator side (CLI)
$env:KANADE_AUTH_TOKEN = "kanade-fleet-secret-2026"
kanade deploy jobs\echo-test.yaml
```

## Dev workflow

```powershell
cargo make check       # fmt-check + clippy + test + lock-check (same as CI)
cargo make fmt         # apply formatting
cargo make on-add      # renri post_create hook (apm install + vcs fetch)
```

The workspace pins `[profile.dev] debug = "line-tables-only"` because
Windows MSVC `link.exe` hits `LNK1318` (PDB record limit) once axum +
sqlx + reqwest + tokio-cron-scheduler + jsonwebtoken all sit in one
workspace; line-tables-only keeps backtraces useful without exploding
the PDB.

## Sprint history

- **Sprint 1** — workspace scaffolding, NATS plumbing, agent + CLI echo round-trip
- **Sprint 2** — §2.6 kill switch (subscribe + flush race fix), version-pin KV, WMI HW inventory
- **Sprint 3** — backend skeleton, SQLite projectors, YAML deploy API, audit log, `tokio-cron-scheduler` with dynamic KV watch
- **Sprint 4** — wave rollout + agent-side jitter, embedded SPA dashboard, HS256 JWT middleware, agent self-update via the JetStream Object Store (atomic exe swap + SCM failure-action restart in v0.1.5)
- **Sprint 5** (v0.2.0) — server-managed group membership: `agent_groups` KV bucket, dynamic agent-side subscribe/unsubscribe, admin API + `kanade agent groups` CLI. `[agent] groups` field in agent.toml deprecated
- **Sprint 6** (v0.3.0) — layered `agent_config` KV bucket: `ConfigScope` per global / per-group / per-pc, resolver with deterministic precedence + multi-group conflict warnings, dynamic cadence reconciliation for heartbeat / inventory / self_update, admin API + `kanade config` CLI. `[inventory]` section in agent.toml deprecated
- **v0.3.1** — `kanade-backend` auto-bootstraps every JetStream resource at startup; the operator-side `kanade jetstream setup` is now optional

Backlog: Prometheus metrics, 3000-agent simulation, NATS cluster + replicated backend, Postgres migration.

## Production install layout

`cargo install` drops the binaries under `~/.cargo/bin/` (user-local).
For a real deployment, copy them into the spec §2.11 layout and register
a service so they survive reboots.

### Path layout

```text
Windows                                    Linux
C:\Program Files\Kanade\                   /usr/local/bin/
  ├── kanade-agent.exe                       ├── kanade-agent
  ├── kanade-backend.exe                     ├── kanade-backend
  ├── kanade.exe                             ├── kanade
  └── nats-server.exe                        └── nats-server

C:\ProgramData\Kanade\config\              /etc/kanade/
  ├── agent.toml                             ├── agent.toml
  └── backend.toml                           └── backend.toml

C:\ProgramData\Kanade\data\                /var/lib/kanade/
  ├── state.db        (agent)                ├── state.db
  ├── outbox\         (agent)                ├── outbox/
  ├── staging\        (self-update)          ├── staging/
  ├── backend.db      (backend)              ├── backend.db
  ├── certs\                                 ├── certs/
  └── nats\           (JetStream data)       └── nats/

C:\ProgramData\Kanade\logs\                /var/log/kanade/
  ├── agent.log                              ├── agent.log
  ├── backend.log                            ├── backend.log
  └── nats-server.log                        └── nats-server.log
```

### Config discovery

Every binary looks up its config file in this exact order (no cwd
fallback — too easy to load the wrong file by accident):

1. `--config <path>` CLI flag (always honored, even if the file
   doesn't exist — that's the caller's choice).
2. Environment variable: `KANADE_AGENT_CONFIG` for `kanade-agent`,
   `KANADE_BACKEND_CONFIG` for `kanade-backend`. Non-empty value
   wins.
3. `<config_dir>/<basename>`:
   - Windows: `%ProgramData%\Kanade\config\agent.toml`
   - Linux: `/etc/kanade/agent.toml`

If none of the three is reachable, the binary exits with a message
listing every option an operator can use to fix it.
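The three-step lookup can be sketched as follows (the `None` return stands in for the real binaries' exit-with-help; per the order above, a non-empty env value simply wins, while the platform default must actually exist):

```python
import os
from pathlib import Path

def discover_config(cli_flag, env_var: str, default_path: str,
                    environ=os.environ):
    """Mirror the lookup order: --config flag > env var > platform config
    dir, with deliberately no cwd fallback."""
    if cli_flag:
        return Path(cli_flag)              # always honored, even if absent
    env_value = environ.get(env_var, "")
    if env_value:                          # non-empty value wins
        return Path(env_value)
    default = Path(default_path)
    return default if default.exists() else None

# A dev shell that exported KANADE_AGENT_CONFIG picks that file up:
chosen = discover_config(None, "KANADE_AGENT_CONFIG",
                         "/etc/kanade/agent.toml",
                         environ={"KANADE_AGENT_CONFIG": "/tmp/agent.toml"})
print(chosen)
```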

### Install scripts (Windows, recommended)

For hosts without `cargo` installed (the common case for agents and
production backends), use the PowerShell deploy scripts under
[`scripts/`](https://github.com/yukimemi/kanade/blob/main/scripts/).
The flow is "drop exe + config + script into one folder, run as
Admin": the script lays out the directory tree, copies the binary
into `%ProgramFiles%\Kanade\`, seeds the config into
`%ProgramData%\Kanade\config\` (without clobbering an existing
edited one), and registers the Windows service.

```powershell
# 1. On the build host: grab the release binaries + sample configs.
#    Either from a GitHub Release zip, or from a `cargo build --release`
#    output, or by `cargo install --root .\stage kanade-agent`.

# 2. Stage one folder per role with the matching files:
#    .\stage-agent\
#      ├── deploy-agent.ps1     (from scripts\ in this repo)
#      ├── kanade-agent.exe
#      └── agent.toml           (edit before deploy)
#
#    .\stage-backend\
#      ├── deploy-backend.ps1
#      ├── kanade-backend.exe
#      └── backend.toml

# 3. Copy each stage folder onto the target host (xcopy, robocopy,
#    scp, USB stick — whatever fits your environment).

# 4. On the target host, run the matching script as Administrator:
PS> .\deploy-agent.ps1
PS> .\deploy-backend.ps1 -FirewallPort 8443    # match the [server] bind port in backend.toml
```

Re-running the script upgrades the binary in place and preserves
the edited config. Pass `-ForceConfig` to overwrite the installed
config from the source folder, or `-NoStart` to skip the
post-install service start.

### Windows Service registration (sc.exe)

If you'd rather not use the deploy scripts (or want to understand
exactly what they do), here are the equivalent manual commands:

```powershell
# Stage the binaries
New-Item -ItemType Directory -Force 'C:\Program Files\Kanade'
Copy-Item "$env:USERPROFILE\.cargo\bin\kanade-agent.exe"   'C:\Program Files\Kanade\'
Copy-Item "$env:USERPROFILE\.cargo\bin\kanade-backend.exe" 'C:\Program Files\Kanade\'

# Stage the config (review + edit first)
New-Item -ItemType Directory -Force 'C:\ProgramData\Kanade\config'
Copy-Item .\agent.toml   'C:\ProgramData\Kanade\config\'
Copy-Item .\backend.toml 'C:\ProgramData\Kanade\config\'

# Register the agent as a service running under LocalSystem.
sc.exe create KanadeAgent `
  binPath= '"C:\Program Files\Kanade\kanade-agent.exe"' `
  start= auto `
  obj= LocalSystem `
  DisplayName= "Kanade Endpoint Agent"
sc.exe failure KanadeAgent reset= 86400 actions= restart/60000/restart/60000/restart/60000

# Register the backend the same way.
sc.exe create KanadeBackend `
  binPath= '"C:\Program Files\Kanade\kanade-backend.exe"' `
  start= auto `
  obj= LocalSystem `
  DisplayName= "Kanade Backend"

sc.exe start KanadeAgent
sc.exe start KanadeBackend
```

### Linux systemd units

```ini
# /etc/systemd/system/kanade-backend.service
[Unit]
Description=Kanade Backend
After=network.target nats.service

[Service]
ExecStart=/usr/local/bin/kanade-backend
Restart=always
User=kanade
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now kanade-backend.service
```

The agent unit is symmetric (`kanade-agent.service`, `ExecStart=/usr/local/bin/kanade-agent`).

## Scaffolded with kata

The skeleton (`AGENTS.md` / `Makefile.toml` / `clippy.toml` /
`rustfmt.toml` / `.github/workflows/*` / etc.) was applied via
[`github.com/yukimemi/pj-presets:rust-cli`](https://github.com/yukimemi/pj-presets)
through `kata init`. The Cargo workspace layout under `crates/` is
hand-written because the preset is single-crate by default; a
`pj-rust-workspace` layer is on the future TODO once the multi-crate
patterns stabilise.

## License

MIT — see [LICENSE](https://github.com/yukimemi/kanade/blob/main/LICENSE).

[teravars]: https://github.com/yukimemi/teravars