
奏 — orchestrate. A self-hosted Rust pub/sub backbone for managing thousands of Windows endpoints without Active Directory. NATS / JetStream carries inventory polling, fleet-wide rollouts, and ad-hoc emergency commands on a single channel.

Status: 0.1.0 — Sprint 4 shipped. Agent + backend (axum + SQLite projector + JetStream KV watcher + cron scheduler) + admin CLI + an embedded SPA dashboard + JWT-gated /api/* + agent self-update via the JetStream Object Store. Full design lives in docs/SPEC.md (Japanese, ~1150 lines covering Part 1 overview and Part 2 detailed design).

Why

Off-the-shelf endpoint managers (Intune, Tanium, Workspace ONE, …) either require Active Directory, lock you into a vendor cloud, or both. For shops that want AD-independent, on-prem, scriptable fleet control, the answer has historically been "build something on top of a message broker" — which everyone reinvents from scratch.

kanade aims to be the reusable shape of that build:

  • NATS + JetStream as the only moving part. Agents speak to the broker over outbound TLS; the broker fans out commands, fans in inventory and results. No AD, no client-pull-from-server, no opening inbound ports on user PCs.
  • Declarative job manifests in Git. Review, history, rollback all come for free; the YAML schema (jobs/*.yaml) is the same input whether you kanade deploy it ad-hoc or put it on a cron with kanade schedule.
  • Three layers of stop-the-bleed. Stream max-msgs-per-subject replaces stale rollouts in the broker; consumer-side version checks guard execution; kanade kill <job_id> terminates running children. The emergency-stop path is wired from MVP, not bolted on later (see SPEC.md §2.6).
  • Phased build-out. One server is enough for a few hundred endpoints; the same code scales to a 3-node NATS cluster + replicated backend + Postgres for several thousand.

Crates

  • kanade-shared (lib) — wire types (Command / ExecResult / Heartbeat / HwInventory; Command is sketched below), NATS subject + KV helpers, the YAML manifest schema, and the teravars-backed config loader
  • kanade-agent (bin) — Windows-side resident daemon: subscribes to commands.*, runs child processes, publishes results + heartbeats + WMI inventory, watches agent_config.target_version for self-update
  • kanade-backend (bin) — axum HTTP server: /health, /api/{agents,results,audit,deploy,schedules}, embedded SPA at /. Runs 3 durable JetStream projectors (INVENTORY/RESULTS/AUDIT → SQLite) and a tokio-cron-scheduler driven by the schedules KV
  • kanade (bin) — operator-side admin CLI (kubectl-style single entry point); subcommands talk to NATS directly for run/ping/kill/revoke/jetstream and to the backend over HTTP for deploy/schedule/agent
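
For a feel of the shared layer, here is a rough, illustrative sketch of what a wire type such as Command might look like as a serde struct. The field names are guesses modelled on the CLI and manifest fields, not the crate's actual schema, and JSON is used purely for the demo:

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Command {
    cmd_id: String,     // what kanade revoke <cmd_id> refers to
    job_id: String,     // what kanade kill <job_id> refers to
    version: String,    // checked agent-side against the version-pin KV
    shell: String,      // e.g. "powershell"
    script: String,
    timeout_secs: u64,
}

fn main() -> serde_json::Result<()> {
    let cmd = Command {
        cmd_id: "cmd-0001".into(),
        job_id: "echo-test".into(),
        version: "1.0.0".into(),
        shell: "powershell".into(),
        script: "echo hello from kanade".into(),
        timeout_secs: 600,
    };
    // Print the payload as it might travel over NATS.
    println!("{}", serde_json::to_string_pretty(&cmd)?);
    Ok(())
}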

Install

You'll need:

  • Rust 1.85+ (the workspace pins edition = "2024")
  • A NATS server (Go binary, ~15 MB)
# 1. NATS server
scoop install nats-server         # or: winget install nats-io.nats-server

# 2. Clone + install all three binaries to ~/.cargo/bin/
git clone https://github.com/yukimemi/kanade.git
cd kanade
cargo install --path crates/kanade
cargo install --path crates/kanade-agent
cargo install --path crates/kanade-backend

kanade, kanade-agent, and kanade-backend are now on your PATH.

Quick start (5 terminals, ~2 minutes)

Run each step in its own PowerShell window so the daemons stay up. Every step assumes you've cd'd into the cloned repo root, because the bundled agent.toml / backend.toml and the jobs/*.yaml samples live there.

1 — start NATS

nats-server -js -p 4222

2 — provision JetStream (one-time)

kanade jetstream setup

Creates the INVENTORY / RESULTS / DEPLOY / AUDIT streams, the script_current / script_status / agents_state / agent_config KV buckets, and the agent_releases Object Store.
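
For a sense of what that provisioning involves, below is a minimal async-nats sketch (the subject filter and limits are assumptions, not kanade's actual settings). The max_messages_per_subject = 1 limit on the command stream is the broker-side stop-the-bleed layer mentioned under Why: a new rollout replaces any stale one still queued.

use async_nats::jetstream::{self, kv, object_store, stream};

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("nats://127.0.0.1:4222").await?;
    let js = jetstream::new(client);

    // One of the four streams; the subject filter is illustrative.
    js.get_or_create_stream(stream::Config {
        name: "DEPLOY".to_string(),
        subjects: vec!["commands.>".to_string()],
        max_messages_per_subject: 1, // newest rollout per subject wins
        ..Default::default()
    })
    .await?;

    // One of the KV buckets the agent and backend watch.
    js.create_key_value(kv::Config {
        bucket: "agent_config".to_string(),
        ..Default::default()
    })
    .await?;

    // The Object Store that holds agent release binaries.
    js.create_object_store(object_store::Config {
        bucket: "agent_releases".to_string(),
        ..Default::default()
    })
    .await?;

    Ok(())
}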

3 — start the backend

$env:KANADE_AUTH_DISABLE = "1"   # JWT off for development
kanade-backend

Serves the dashboard at http://127.0.0.1:8080 and the JSON API at /api/*. SQLite is created at ./backend.db. The three projectors and the cron scheduler start in the background.
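
Conceptually, each projector is a durable JetStream consumer drained into SQLite. A minimal sketch follows — not the backend's actual code; the results table and its columns are hypothetical:

use async_nats::jetstream::{self, consumer::pull};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = async_nats::connect("nats://127.0.0.1:4222").await?;
    let js = jetstream::new(client);
    let db = sqlx::SqlitePool::connect("sqlite://backend.db?mode=rwc").await?;

    // Durable consumer: a restarted backend resumes where the projection stopped.
    let consumer = js
        .get_stream("RESULTS")
        .await?
        .get_or_create_consumer(
            "results-projector",
            pull::Config {
                durable_name: Some("results-projector".to_string()),
                ..Default::default()
            },
        )
        .await?;

    let mut messages = consumer.messages().await?;
    while let Some(msg) = messages.next().await {
        let msg = msg?;
        // Hypothetical table/columns; the real schema lives in kanade-backend.
        sqlx::query("INSERT INTO results (subject, payload) VALUES (?, ?)")
            .bind(msg.subject.as_str())
            .bind(msg.payload.to_vec())
            .execute(&db)
            .await?;
        msg.ack().await?;
    }
    Ok(())
}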

4 — start the agent

kanade-agent

Loads ./agent.toml, picks $env:COMPUTERNAME as pc_id, subscribes to commands.all + commands.pc.{pc_id} + every group declared in agent.toml (canary + wave1 in the bundled sample), and starts the heartbeat / inventory / self-update loops.
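
The fan-out itself is plain core-NATS subscriptions. A sketch with async-nats follows; the commands.group.{name} pattern for group subjects is an assumption (only commands.all and commands.pc.{pc_id} are spelled out above):

use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), async_nats::Error> {
    let pc_id = std::env::var("COMPUTERNAME").unwrap_or_else(|_| "PC1234".into());
    let groups = ["canary", "wave1"]; // normally read from agent.toml

    let client = async_nats::connect("nats://127.0.0.1:4222").await?;

    let mut subs = vec![
        client.subscribe("commands.all".to_string()).await?,
        client.subscribe(format!("commands.pc.{pc_id}")).await?,
    ];
    for group in groups {
        subs.push(client.subscribe(format!("commands.group.{group}")).await?);
    }

    // React to whichever subject delivers a Command first.
    let mut commands = futures::stream::select_all(subs);
    while let Some(msg) = commands.next().await {
        println!("{} bytes on {}", msg.payload.len(), msg.subject);
    }
    Ok(())
}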

5 — drive it

# Round-trip a script via NATS request/reply.
kanade run $env:COMPUTERNAME -- 'echo hello from kanade'

# Or via the backend's YAML deploy path (writes a row to deployments,
# emits an audit event, broadcasts the Command).
kanade deploy jobs/echo-test.yaml

# Heartbeat probe.
kanade ping $env:COMPUTERNAME

# Inspect via curl…
curl http://127.0.0.1:8080/api/agents
curl http://127.0.0.1:8080/api/results
curl http://127.0.0.1:8080/api/audit

# …or open the dashboard.
start http://127.0.0.1:8080

CLI cheat sheet

kanade run    <pc_id> -- <script>                # request/reply via NATS
kanade ping   <pc_id>                            # wait for one heartbeat
kanade kill   <job_id>                           # publish kill.{job_id}
kanade revoke <cmd_id>                           # script_status = REVOKED
kanade unrevoke <cmd_id>                         # → ACTIVE

kanade jetstream setup                           # create streams + KV + Object Store
kanade jetstream status                          # health snapshot

kanade deploy   <manifest.yaml> [--version <v>]  # POST /api/deploy
kanade schedule create <schedule.yaml>           # POST /api/schedules (cron + manifest)
kanade schedule list
kanade schedule delete <id>

kanade agent publish <binary> --version <v>      # upload to Object Store + flip target_version
kanade agent current                             # read agent_config.target_version

Run kanade <subcommand> --help for argument details.

Authoring jobs

Jobs are YAML manifests in jobs/*.yaml (see SPEC.md §2.4.1). The sample manifests in the repo cover:

  • jobs/echo-test.yaml — minimal ad-hoc command
  • jobs/wave-test.yaml — a rollout.waves rollout (canary → wave1 with a delay)
  • jobs/schedule-test.yaml — cron-driven echo every 10 s

A wave manifest sketch:

id: cleanup-disk-temp
version: 1.0.1
target:
  pcs: [PC1234]
execute:
  shell: powershell
  script: |
    $temp = [System.IO.Path]::GetTempPath()
    Remove-Item "$temp\*" -Recurse -Force -ErrorAction SilentlyContinue
  timeout: 600s
  jitter: 5m
rollout:
  strategy: wave
  waves:
    - { group: canary, delay: 0s  }
    - { group: wave1,  delay: 30m }
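
If you generate or validate manifests programmatically, the shape above maps straight onto serde types. The structs below are illustrative, mirroring the sample fields; the authoritative schema lives in kanade-shared:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Manifest {
    id: String,
    version: String,
    target: Target,
    execute: Execute,
    rollout: Option<Rollout>, // absent for plain ad-hoc jobs
}

#[derive(Debug, Deserialize)]
struct Target {
    #[serde(default)]
    pcs: Vec<String>,
}

#[derive(Debug, Deserialize)]
struct Execute {
    shell: String,
    script: String,
    timeout: String,        // e.g. "600s"
    jitter: Option<String>, // e.g. "5m"
}

#[derive(Debug, Deserialize)]
struct Rollout {
    strategy: String,
    waves: Vec<Wave>,
}

#[derive(Debug, Deserialize)]
struct Wave {
    group: String,
    delay: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let yaml = std::fs::read_to_string("jobs/wave-test.yaml")?;
    let manifest: Manifest = serde_yaml::from_str(&yaml)?;
    println!("{manifest:#?}");
    Ok(())
}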

Config files

Both config files use teravars templating — {{ system.host }}, {{ env(name="X", default="Y") }}, and {% if is_windows() %}…{% endif %} are all available.

agent.toml:

[agent]
id = '{{ system.host }}'
nats_url = 'nats://127.0.0.1:4222'
groups = ['canary', 'wave1']

[inventory]
hw_interval = '24h'
jitter = '10m'
enabled = true

[log]
path = 'logs/agent.log'
level = 'info'

backend.toml:

[server]
bind = '0.0.0.0:8080'

[nats]
url = 'nats://127.0.0.1:4222'

[db]
sqlite_path = './backend.db'

[log]
path = 'logs/backend.log'
level = 'info'

Dev workflow

cargo make check       # fmt-check + clippy + test + lock-check (same as CI)
cargo make fmt         # apply formatting
cargo make on-add      # renri post_create hook (apm install + vcs fetch)

The workspace pins [profile.dev] debug = "line-tables-only" because Windows MSVC link.exe hits LNK1318 (PDB record limit) once axum + sqlx + reqwest + tokio-cron-scheduler + jsonwebtoken all sit in one workspace; line-tables-only keeps backtraces useful without exploding the PDB.

Sprint history

  • Sprint 1 — workspace scaffolding, NATS plumbing, agent + CLI echo round-trip
  • Sprint 2 — §2.6 kill switch (subscribe + flush race fix), version-pin KV, WMI HW inventory
  • Sprint 3 — backend skeleton, SQLite projectors, YAML deploy API, audit log, tokio-cron-scheduler with dynamic KV watch
  • Sprint 4 — wave rollout + agent-side jitter, embedded SPA dashboard, HS256 JWT middleware, agent self-update via the JetStream Object Store

Sprint 5 (Prometheus metrics, 3000-agent simulation, backups) and Sprint 6 (NATS cluster + replicated backend + Postgres migration) are open backlog items.

Scaffolded with kata

The skeleton (AGENTS.md / Makefile.toml / clippy.toml / rustfmt.toml / .github/workflows/* / etc.) was applied via github.com/yukimemi/pj-presets:rust-cli through kata init. The Cargo workspace layout under crates/ is hand-written because the preset is single-crate by default; a pj-rust-workspace layer is on the future TODO once the multi-crate patterns stabilise.

License

MIT — see LICENSE.