canic-cli
Operator CLI for Canic backup and restore workflows.
The initial command focuses on snapshot capture/download planning and execution for a canister plus its registry-discovered children.
Use --recursive instead of --include-children to include all descendants.
Use --registry-json <file> to plan from a saved canic_subnet_registry
response instead of querying a live root. Non-dry-run captures recompute the
selection topology immediately before snapshot creation and fail if the hash
changed since discovery.
DFX only creates snapshots for stopped canisters. Pass
--stop-before-snapshot --resume-after-snapshot when the CLI should perform
that local lifecycle step around each captured artifact.
Successful non-dry-run captures write the canonical backup layout: manifest,
download journal, and durable artifact directories. Generated manifests include
each durable artifact checksum so verification can detect manifest/journal
drift before restore planning. Download journals also include
operation_metrics counters for target count, snapshot create, snapshot
download, checksum verification, and artifact finalization progress.
Validate a captured manifest before restore planning:
The validation summary includes topology hash inputs, consistency mode, backup unit counts, kind counts, and per-unit topology validation metadata.
Inspect resumable journal status:
--require-complete still writes the JSON status report, then exits with an
error when any artifact has resume work remaining.
Inspect manifest and journal agreement without reading artifact bytes:
--require-ready still writes the JSON inspection report, then exits with an
error when manifest and journal metadata, including topology receipts, are not
ready for full verification.
Emit a provenance report for audit/review workflows:
The report records source/tool metadata, topology receipts, declared backup
units, and each member's snapshot/code/artifact provenance without reading
artifact bytes. --require-consistent still writes the JSON report, then exits
with an error when manifest and journal backup IDs or topology receipts drift.
Verify the backup layout and durable artifact checksums:
Run the standard no-mutation preflight bundle:
Preflight writes manifest-validation.json, backup-status.json,
backup-inspection.json, backup-provenance.json, backup-integrity.json,
restore-plan.json, restore-status.json, and preflight-summary.json.
The summary records the backup ID, source root, environment, topology hash,
readiness statuses, provenance consistency status, topology mismatch count,
journal operation metrics, member counts, restore identity/snapshot/
verification/operation/ordering counts, snapshot provenance readiness booleans,
verification readiness booleans, restore_mapping_supplied,
restore_all_sources_mapped, restore_ready, stable
restore_readiness_reasons, and paths to the generated reports.
--require-restore-ready still writes the full report bundle, then exits with
an error when restore_ready is false.
Restore planning is manifest-driven and performs no mutations:
--require-verified runs the same manifest, journal, durable artifact, and
checksum checks as canic backup verify before emitting the plan.
--require-restore-ready still writes the restore plan, then exits with an
error when readiness_summary.ready is false.
Restore plans include an identity_summary with explicit mapping mode,
all-sources-mapped status, and fixed, relocatable, mapped, in-place, and
remapped member counts. They also include a snapshot_summary with module
hash, wasm hash, code version, and checksum coverage counts and readiness
booleans, plus a verification_summary with post-restore check counts,
verification_required, and all_members_have_checks. A readiness_summary
collapses those signals into a single ready flag and stable reason strings.
Plans also include an operation_summary with planned snapshot uploads, loads,
code reinstalls, verification checks, total operations, and phases, a
fleet_verification_checks list for fleet-level checks, plus an
ordering_summary and per-member ordering
dependency metadata so dry-runs show when parent relationships are satisfied
inside the same restore group or by an earlier group. Role-level
verification.member_checks are expanded onto matching members during planning,
and non-empty check roles filters limit a check to matching member roles.
Emit the initial restore execution status from a plan:
Restore status is no-mutation. It copies the plan identity, readiness,
verification, phase, and operation counts, then marks each planned member as
planned with its source/target canister, snapshot ID, and artifact path.
Render the restore execution operations without mutating targets:
Apply dry-run output expands the restore phases into ordered upload, load,
reinstall, member verification, and final verify-fleet operations. Fleet
verification commands run after member work and target the restored fleet root.
When --backup-dir is supplied, the dry-run also verifies that referenced
artifact paths stay under that backup directory, exist on disk, and match their
expected SHA-256 checksums when the plan includes checksums. When --journal-out
is supplied, the command also writes an initial apply journal with each
operation marked ready or blocked and stable blocking reasons. The command
requires --dry-run; real restore execution is intentionally not enabled yet.
Dry-run output also includes operation_counts so operators can compare the
rendered operation mix before writing or running a journal.
Summarize a restore apply journal:
Use --require-ready when scripts should stop if the journal still has blocked
restore operations. Use --require-no-pending when scripts should stop if a
restore operation is already claimed and needs inspection or apply-unclaim.
Use --require-no-failed when failed operations should stop the runner before
completion checks. Use --require-complete when scripts should fail until every
apply operation is completed. Apply status output also includes
operation_counts, broken down by snapshot uploads, snapshot loads, code
reinstalls, member verifications, fleet verifications, and total verification
operations.
Write an operator-focused restore apply report:
Apply reports include one high-level outcome, attention-required status,
operation counts, operation-kind counts, blocked reasons, the next
transitionable operation, and the pending, failed, and blocked operation rows
that need review. Use
--require-no-attention when CI should fail after writing the report if the
journal has pending, failed, or blocked work.
Preview or run the maintained restore runner path:
Use restore run --dry-run to preview the native runner path from the apply
journal. It emits the current status, attention summary, next transition, and
next dfx command without mutating the journal or calling dfx.
Use restore run --execute to let canic own the guarded runner loop. It
checks readiness, claims the next sequence, runs the generated dfx command,
marks the operation completed or failed, and persists the journal after each
transition. --max-steps is useful for cautious incremental restores. Add
--require-complete or --require-no-attention when CI should write the run
summary and then fail if the journal is incomplete or still needs review.
If a generated command fails, the runner still writes the summary and updated
journal before returning a nonzero error.
Every runner summary includes run_mode, stopped_reason, next_action, and
operation-kind counts so automation can decide whether to rerun, inspect a
failed operation, recover a pending operation, fix blocked inputs, or stop.
Use --require-run-mode <text>, --require-stopped-reason <text>, and
--require-next-action <text> when CI should write the summary and then fail
unless the runner stopped in the expected state.
Use --require-executed-count <n> when a batched run must execute exactly the
expected number of operations.
Use restore run --unclaim-pending after an interrupted runner leaves one
operation pending. It moves the pending operation back to ready, writes the
updated journal, and emits a recovery summary.
The script remains as a small compatibility wrapper around
canic restore run. It defaults to --execute for existing callers, can also
pass through --dry-run and --unclaim-pending, and accepts the older
--report-out, --status-out, and --command-out flags for existing runbooks.
The native runner now owns the guarded status, claim, dfx execution, marking,
and summary behavior.
Emit the full next transitionable operation for an external runner:
Preview the dfx command for the next transitionable operation without
executing it:
Use --dfx <path> when the runner should preview a non-default dfx binary.
Use --require-command when scripts should fail after writing the preview if no
executable operation is available.
Claim the next operation before executing it in an external runner:
Claiming marks the next ready operation pending. Pending operations remain the
next transitionable operation until apply-mark records them as completed or
failed, which lets interrupted runners resume from the same journal. Use
--sequence <n> to fail if the next transitionable operation no longer matches
the operation the runner previewed. Use --updated-at <text> to record a
runner-provided state marker; when omitted, the CLI writes unknown.
Release the current pending operation back to ready when a runner stopped before executing it:
Use --sequence <n> with apply-unclaim when recovery scripts should only
release the pending operation they inspected.
Mark one journal operation after an external restore step completes or fails:
Use --state failed --reason <text> to record a failed operation. The command
validates the input journal, refuses to skip earlier ready operations, refreshes
operation counts, and writes the updated journal without executing any restore
mutation. Use --require-pending when runners should only mark operations that
were claimed first.
Example external restore runner loop:
journal=restore-apply-journal.json
network=local
while ; do
if ; then
break
fi
sequence=""
command=""
updated_at=""
done
If the runner stops after claiming work but before executing the previewed
command, inspect restore-apply-status.json and use apply-unclaim --sequence <n> to release the pending operation back to ready.