kanade-backend 0.5.1

axum + SQLite projection backend for the kanade endpoint-management system. Hosts /api/* and the embedded SPA dashboard, projects JetStream streams into SQLite, and drives the cron scheduler.


奏 (kanade) — orchestrate. A self-hosted Rust pub/sub backbone for managing thousands of Windows endpoints without Active Directory. NATS / JetStream carries inventory polling, fleet-wide rollouts, and ad-hoc emergency commands on a single channel.

Status: v0.3.1 — Sprint 6 shipped. Agent + backend (axum + SQLite projector + JetStream KV watcher + cron scheduler) + admin CLI + an embedded SPA dashboard + JWT-gated /api/* + agent self-update via the JetStream Object Store + server-managed groups and layered config in KV. Full design lives in docs/SPEC.md (Japanese, ~1150 lines covering Part 1 overview and Part 2 detailed design).

Why

The off-the-shelf endpoint managers (Intune, Tanium, Workspace ONE, …) either require Active Directory, lock you into a vendor cloud, or both. For shops that want AD-independent, on-prem, scriptable fleet control, the answer has historically been "build something on top of a message broker" — which everyone reinvents from scratch.

kanade aims to be the reusable shape of that build:

  • NATS + JetStream as the only moving part. Agents speak to the broker over outbound TLS; the broker fans out commands, fans in inventory and results. No AD, no client-pull-from-server, no opening inbound ports on user PCs.
  • Declarative job manifests in Git. Review, history, rollback all come for free; the YAML schema (jobs/*.yaml) is the same input whether you run a job ad-hoc with kanade deploy or wire it onto a cron with kanade schedule.
  • Three layers of stop-the-bleed. Stream max-msgs-per-subject replaces stale rollouts in the broker; consumer-side version checks guard execution; kanade kill <job_id> terminates running children. The emergency-stop path is wired from MVP, not bolted on later (see SPEC.md §2.6).
  • Phased build-out. One server is enough for a few hundred endpoints; the same code scales to a 3-node NATS cluster + replicated backend + Postgres for several thousand.

Crates

  • kanade-shared (lib) — wire types (Command / ExecResult / Heartbeat / HwInventory), NATS subject + KV helpers, YAML manifest schema, teravars-backed config loader
  • kanade-agent (bin) — Windows-side resident daemon: subscribes to commands.*, runs child processes, publishes results + heartbeats + WMI inventory; watches the layered agent_config + agent_groups KV buckets and reacts live to cadence / membership / target_version changes
  • kanade-backend (bin) — axum HTTP server: /health, /api/{agents,results,audit,deploy,schedules,config,…}, embedded SPA at /. Auto-bootstraps every required JetStream resource at startup, runs durable projectors (INVENTORY / RESULTS / AUDIT → SQLite) and a tokio-cron-scheduler driven by the schedules KV
  • kanade (bin) — operator-side admin CLI (kubectl-style single entry point); subcommands talk to NATS directly for run / ping / kill / revoke / jetstream / agent / config, and to the backend over HTTP for deploy / schedule

Install

You'll need:

  • Rust 1.85+ (the workspace pins edition = "2024")
  • A NATS server (Go binary, ~15 MB)
# 1. NATS server
scoop install nats-server         # or: winget install nats-io.nats-server

# 2. The three kanade binaries — straight from crates.io.
cargo install kanade kanade-agent kanade-backend

kanade, kanade-agent, and kanade-backend are now on your PATH (under ~/.cargo/bin/).

You'll also want the sample configs (agent.toml / backend.toml) and the example manifests (jobs/*.yaml). The fastest way is a shallow clone of this repo:

git clone --depth=1 https://github.com/yukimemi/kanade.git
cd kanade

(or curl the individual files from https://raw.githubusercontent.com/yukimemi/kanade/main/... into your own working dir if you'd rather not clone).

Or build from source: skip the cargo install step, clone the full repo, and run cargo install --path once per crate (cargo install accepts a single --path per invocation). That's the route to take if you're hacking on the crates.
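
Spelled out, using the workspace's crate paths:

git clone https://github.com/yukimemi/kanade.git
cd kanade
cargo install --path crates/kanade
cargo install --path crates/kanade-agent
cargo install --path crates/kanade-backend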

Quick start (5 terminals, ~2 minutes)

Run each step in its own PowerShell window so the daemons stay up. Every step assumes you've cd'd into the directory that holds agent.toml / backend.toml / jobs/, which is the repo root if you cloned it.

1 — start NATS

nats-server -js -p 4222

2 — provision JetStream (optional)

kanade jetstream setup

Creates every stream (INVENTORY / RESULTS / DEPLOY / EVENTS / AUDIT), every KV bucket (script_current / script_status / agents_state / agent_config / agent_groups / schedules), and the agent_releases Object Store. This step is optional as of v0.3.1: kanade-backend auto-bootstraps the same set at startup, so a fresh NATS server + kanade-backend is enough to get a working fleet. The CLI command is still useful for re-running setup against a different broker, or for inspecting what would be created (kanade jetstream status).

3 — start the backend

$env:KANADE_AUTH_DISABLE = "1"   # JWT off for development
kanade-backend

Serves the dashboard at http://127.0.0.1:8080 and the JSON API at /api/*. The SQLite database is created at ./backend.db. The projectors and the cron scheduler start in the background.

4 — start the agent

kanade-agent

Loads ./agent.toml, picks $env:COMPUTERNAME as pc_id, subscribes to commands.all + commands.pc.{pc_id}, then spawns the config_supervisor (watches agent_config + agent_groups KV) plus the heartbeat / inventory / self-update / groups-manager loops. Group membership and cadence settings are read from the KV buckets — see kanade agent groups and kanade config to drive them.

5 — drive it

# Round-trip a script via NATS, request/reply.
kanade run $env:COMPUTERNAME -- 'echo hello from kanade'

# Or via the backend's YAML deploy path (writes a row to deployments,
# emits an audit event, broadcasts the Command).
kanade deploy jobs/echo-test.yaml

# Heartbeat probe.
kanade ping $env:COMPUTERNAME

# Inspect via curl…
curl http://127.0.0.1:8080/api/agents
curl http://127.0.0.1:8080/api/results
curl http://127.0.0.1:8080/api/audit

# …or open the dashboard.
start http://127.0.0.1:8080

CLI cheat sheet

kanade run    <pc_id> -- <script>                # request/reply via NATS
kanade ping   <pc_id>                            # wait for one heartbeat
kanade kill   <job_id>                           # publish kill.{job_id}
kanade revoke <cmd_id>                           # script_status = REVOKED
kanade unrevoke <cmd_id>                         # → ACTIVE

kanade jetstream setup                           # create streams + KV + Object Store (optional; backend auto-bootstraps on startup)
kanade jetstream status                          # health snapshot

kanade deploy   <manifest.yaml> [--version <v>]  # POST /api/deploy
kanade schedule create <schedule.yaml>           # POST /api/schedules (cron + manifest)
kanade schedule list
kanade schedule delete <id>

kanade agent publish <binary> --version <v>      # upload to Object Store + flip global.target_version
kanade agent current                             # read agent_config.global.target_version

kanade agent groups list <pc_id>                 # current group memberships for one PC
kanade agent groups add  <pc_id> <group>         # add membership (idempotent)
kanade agent groups rm   <pc_id> <group>         # drop membership
kanade agent groups set  <pc_id> <group> ...     # replace whole list

kanade config get  [--group <name>|--pc <pc_id>] # ConfigScope at this scope (default: global)
kanade config set  <field>=<value> [...]         # set one field (target_version / inventory_* / heartbeat_*)
kanade config unset <field> [...]                # clear one field
kanade config clear [--group <name>|--pc <pc_id>] # delete the whole scope row
kanade config effective <pc_id>                  # resolved view for a PC (built-in -> global -> groups -> pc)

Run kanade <subcommand> --help for argument details.
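
A short worked example tying the group and config commands together (PC1234 is a placeholder pc_id, and the version value is illustrative):

# enrol a PC in the canary group and confirm
kanade agent groups add PC1234 canary
kanade agent groups list PC1234

# inspect the group-scoped config, then the fully resolved view for that PC
kanade config get --group canary
kanade config effective PC1234

# pin the fleet-wide agent version at the default (global) scope
kanade config set target_version=0.3.1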

Authoring jobs

Jobs are YAML manifests in jobs/*.yaml (see SPEC.md §2.4.1). The sample manifests in the repo cover:

  • jobs/echo-test.yaml — minimal ad-hoc command
  • jobs/wave-test.yaml — a rollout.waves rollout (canary → wave1 with delay)
  • jobs/schedule-test.yaml — cron-driven echo every 10 s

A wave manifest sketch:

id: cleanup-disk-temp
version: 1.0.1
target:
  pcs: [PC1234]
execute:
  shell: powershell
  script: |
    $temp = [System.IO.Path]::GetTempPath()
    Remove-Item "$temp\*" -Recurse -Force -ErrorAction SilentlyContinue
  timeout: 600s
  jitter: 5m
rollout:
  strategy: wave
  waves:
    - { group: canary, delay: 0s  }
    - { group: wave1,  delay: 30m }

Config files

Both config files use teravars templating — {{ system.host }}, {{ env(name="X", default="Y") }}, and {% if is_windows() %}…{% endif %} are all available.
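
A sketch of what that buys you (the KANADE_NATS_URL variable is hypothetical; any env var name works):

nats_url = '{{ env(name="KANADE_NATS_URL", default="nats://127.0.0.1:4222") }}'
path = '{% if is_windows() %}logs\agent.log{% else %}logs/agent.log{% endif %}'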

agent.toml (intentionally minimal — fleet policy lives in the agent_config + agent_groups KV buckets, edited via kanade config / kanade agent groups):

[agent]
id = '{{ system.host }}'
nats_url = 'nats://127.0.0.1:4222'

[log]
path = 'logs/agent.log'
level = 'info'

Older agent.toml files that still carry [agent] groups = […] or an [inventory] section keep loading — both fields are parsed via #[serde(default)] — but the values are logged and ignored at startup. Removal is scheduled for v0.4.0.

backend.toml:

[server]
bind = '0.0.0.0:8080'

[nats]
url = 'nats://127.0.0.1:4222'

[db]
sqlite_path = './backend.db'

[log]
path = 'logs/backend.log'
level = 'info'

Authentication

/api/* is protected by a single middleware (crates/kanade-backend/src/auth.rs). Three modes, picked by env var on the backend side:

Env on kanade-backend                Mode           Use for
KANADE_AUTH_DISABLE=1                open           local dev, cargo run
KANADE_AUTH_STATIC_TOKEN=<secret>    shared bearer  single-operator fleets — paste the same secret into the SPA login + kanade CLI
KANADE_JWT_SECRET=<secret>           HS256 JWT      full multi-user setup; sign tokens out-of-band with aud=kanade

Precedence: DISABLE > STATIC_TOKEN > JWT_SECRET. A backend with none of the three set falls back to a hard-coded dev secret and logs a loud warning — fine for one-shot debugging, never for production.

Clients send Authorization: Bearer <token> on every /api/* request:

  • SPA: stores the token in localStorage; click login in the top-right nav to paste, logout to clear. A 401 from the backend auto-clears the stored token and re-prompts.
  • CLI: reads $env:KANADE_AUTH_TOKEN. Set it once per shell session (or export it from a shell profile). The CLI sends the same header regardless of which auth mode the backend is running in.
# Backend side
$env:KANADE_AUTH_STATIC_TOKEN = "kanade-fleet-secret-2026"
.\deploy-backend.ps1

# Operator side (CLI)
$env:KANADE_AUTH_TOKEN = "kanade-fleet-secret-2026"
kanade deploy jobs\echo-test.yaml
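
For the JWT mode, any HS256-capable signer works; here's a minimal PowerShell sketch using only .NET built-ins (the sub claim and the 8-hour expiry are arbitrary choices, not requirements stated anywhere):

$secret = $env:KANADE_JWT_SECRET          # same secret the backend was started with
function b64url([byte[]]$b) { [Convert]::ToBase64String($b).TrimEnd('=').Replace('+','-').Replace('/','_') }
$hdr = b64url ([Text.Encoding]::UTF8.GetBytes('{"alg":"HS256","typ":"JWT"}'))
$exp = [DateTimeOffset]::UtcNow.AddHours(8).ToUnixTimeSeconds()
$pld = b64url ([Text.Encoding]::UTF8.GetBytes('{"aud":"kanade","sub":"operator","exp":' + $exp + '}'))
$mac = [System.Security.Cryptography.HMACSHA256]::new([Text.Encoding]::UTF8.GetBytes($secret))
$sig = b64url ($mac.ComputeHash([Text.Encoding]::UTF8.GetBytes("$hdr.$pld")))
$env:KANADE_AUTH_TOKEN = "$hdr.$pld.$sig"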

Dev workflow

cargo make check       # fmt-check + clippy + test + lock-check (same as CI)
cargo make fmt         # apply formatting
cargo make on-add      # renri post_create hook (apm install + vcs fetch)

The workspace pins [profile.dev] debug = "line-tables-only" because Windows MSVC link.exe hits LNK1318 (PDB record limit) once axum + sqlx + reqwest + tokio-cron-scheduler + jsonwebtoken all sit in one workspace; line-tables-only keeps backtraces useful without exploding the PDB.
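
The pin itself, in the workspace Cargo.toml:

[profile.dev]
debug = "line-tables-only"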

Sprint history

  • Sprint 1 — workspace scaffolding, NATS plumbing, agent + CLI echo round-trip
  • Sprint 2 — §2.6 kill switch (subscribe + flush race fix), version-pin KV, WMI HW inventory
  • Sprint 3 — backend skeleton, SQLite projectors, YAML deploy API, audit log, tokio-cron-scheduler with dynamic KV watch
  • Sprint 4 — wave rollout + agent-side jitter, embedded SPA dashboard, HS256 JWT middleware, agent self-update via the JetStream Object Store (atomic exe swap + SCM failure-action restart in v0.1.5)
  • Sprint 5 (v0.2.0) — server-managed group membership: agent_groups KV bucket, dynamic agent-side subscribe/unsubscribe, admin API + kanade agent groups CLI. [agent] groups field in agent.toml deprecated
  • Sprint 6 (v0.3.0) — layered agent_config KV bucket: ConfigScope per global / per-group / per-pc, resolver with deterministic precedence + multi-group conflict warnings, dynamic cadence reconciliation for heartbeat / inventory / self_update, admin API + kanade config CLI. [inventory] section in agent.toml deprecated
  • v0.3.1 — kanade-backend auto-bootstraps every JetStream resource at startup; the operator-side kanade jetstream setup is now optional

Backlog: Prometheus metrics, 3000-agent simulation, NATS cluster + replicated backend, Postgres migration.

Production install layout

cargo install drops the binaries under ~/.cargo/bin/ (user-local). For a real deployment, copy them into the SPEC.md §2.11 layout and register a service so they survive reboots.

Path layout

Windows                                    Linux
C:\Program Files\Kanade\                   /usr/local/bin/
  ├── kanade-agent.exe                       ├── kanade-agent
  ├── kanade-backend.exe                     ├── kanade-backend
  ├── kanade.exe                             ├── kanade
  └── nats-server.exe                        └── nats-server

C:\ProgramData\Kanade\config\              /etc/kanade/
  ├── agent.toml                             ├── agent.toml
  └── backend.toml                           └── backend.toml

C:\ProgramData\Kanade\data\                /var/lib/kanade/
  ├── state.db        (agent)                ├── state.db
  ├── outbox\         (agent)                ├── outbox/
  ├── staging\        (self-update)          ├── staging/
  ├── backend.db      (backend)              ├── backend.db
  ├── certs\                                 ├── certs/
  └── nats\           (JetStream data)       └── nats/

C:\ProgramData\Kanade\logs\                /var/log/kanade/
  ├── agent.log                              ├── agent.log
  ├── backend.log                            ├── backend.log
  └── nats-server.log                        └── nats-server.log

Config discovery

Every binary looks up its config file in this exact order (no cwd fallback — too easy to load the wrong file by accident):

  1. --config <path> CLI flag (always honored, even if the file doesn't exist — that's the caller's choice).
  2. Environment variable: KANADE_AGENT_CONFIG for kanade-agent, KANADE_BACKEND_CONFIG for kanade-backend. Non-empty value wins.
  3. <config_dir>/<basename>:
    • Windows: %ProgramData%\Kanade\config\agent.toml
    • Linux: /etc/kanade/agent.toml

If none of the three is reachable, the binary exits with a message listing every option an operator can use to fix it.
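
Concretely, two ways to point a dev agent at a non-default config (paths are placeholders):

# 1. explicit flag: always wins
kanade-agent --config .\agent.toml

# 2. environment variable: beats the %ProgramData% / /etc default
$env:KANADE_AGENT_CONFIG = 'C:\src\kanade\agent.toml'
kanade-agent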

Install scripts (Windows, recommended)

For hosts without cargo installed (the common case for agents and production backends), use the PowerShell deploy scripts under scripts/. The flow is "drop exe + config + script into one folder, run as Admin": the script lays out the directory tree, copies the binary into %ProgramFiles%\Kanade\, seeds the config into %ProgramData%\Kanade\config\ (without clobbering an existing edited one), and registers the Windows service.

# 1. On the build host: grab the release binaries + sample configs.
#    Either from a GitHub Release zip, or from a `cargo build --release`
#    output, or by `cargo install --root .\stage kanade-agent`.

# 2. Stage one folder per role with the matching files:
#    .\stage-agent\
#      ├── deploy-agent.ps1     (from scripts\ in this repo)
#      ├── kanade-agent.exe
#      └── agent.toml           (edit before deploy)
#
#    .\stage-backend\
#      ├── deploy-backend.ps1
#      ├── kanade-backend.exe
#      └── backend.toml

# 3. Copy each stage folder onto the target host (xcopy, robocopy,
#    scp, USB stick — whatever fits your environment).

# 4. On the target host, run the matching script as Administrator:
PS> .\deploy-agent.ps1
PS> .\deploy-backend.ps1 -FirewallPort 8443    # match [server] bind in backend.toml

Re-running the script upgrades the binary in place and preserves the edited config. Pass -ForceConfig to overwrite the installed config from the source folder, or -NoStart to skip the post-install service start.
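
A typical upgrade run, using the flags described above:

# upgrade the binary, keep the edited config, don't start the service yet
PS> .\deploy-agent.ps1 -NoStart

# push a fresh config from the staging folder, overwriting the installed one
PS> .\deploy-agent.ps1 -ForceConfig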

Windows Service registration (sc.exe)

If you'd rather not use the deploy scripts (or want to understand exactly what they do), here are the equivalent manual commands:

# Stage the binaries
New-Item -ItemType Directory -Force 'C:\Program Files\Kanade'
Copy-Item "$env:USERPROFILE\.cargo\bin\kanade-agent.exe"   'C:\Program Files\Kanade\'
Copy-Item "$env:USERPROFILE\.cargo\bin\kanade-backend.exe" 'C:\Program Files\Kanade\'

# Stage the config (review + edit first)
New-Item -ItemType Directory -Force 'C:\ProgramData\Kanade\config'
Copy-Item .\agent.toml   'C:\ProgramData\Kanade\config\'
Copy-Item .\backend.toml 'C:\ProgramData\Kanade\config\'

# Register the agent as a service running under LocalSystem.
sc.exe create KanadeAgent `
  binPath= '"C:\Program Files\Kanade\kanade-agent.exe"' `
  start= auto `
  obj= LocalSystem `
  DisplayName= "Kanade Endpoint Agent"
sc.exe failure KanadeAgent reset= 86400 actions= restart/60000/restart/60000/restart/60000

# Register the backend the same way.
sc.exe create KanadeBackend `
  binPath= '"C:\Program Files\Kanade\kanade-backend.exe"' `
  start= auto `
  obj= LocalSystem `
  DisplayName= "Kanade Backend"

sc.exe start KanadeAgent
sc.exe start KanadeBackend
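
And to confirm both services registered and came up:

sc.exe query KanadeAgent
sc.exe query KanadeBackend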

Linux systemd units

# /etc/systemd/system/kanade-backend.service
[Unit]
Description=Kanade Backend
After=network.target nats.service

[Service]
ExecStart=/usr/local/bin/kanade-backend
Restart=always
User=kanade
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now kanade-backend.service

The agent unit is symmetric (kanade-agent.service, ExecStart=/usr/local/bin/kanade-agent).
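
Spelled out under the same assumptions as the backend unit (kanade user, RUST_LOG=info):

# /etc/systemd/system/kanade-agent.service
[Unit]
Description=Kanade Agent
After=network.target nats.service

[Service]
ExecStart=/usr/local/bin/kanade-agent
Restart=always
User=kanade
Environment=RUST_LOG=info

[Install]
WantedBy=multi-user.target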

Scaffolded with kata

The skeleton (AGENTS.md / Makefile.toml / clippy.toml / rustfmt.toml / .github/workflows/* / etc.) was applied via github.com/yukimemi/pj-presets:rust-cli through kata init. The Cargo workspace layout under crates/ is hand-written because the preset is single-crate by default; a pj-rust-workspace layer is on the future TODO once the multi-crate patterns stabilise.

License

MIT — see LICENSE.