Module state_loader


v0.8.4 #72 — load manager snapshot files with per-manager fault isolation.

§Why

Before #72, each of the nine --*-state-file loaders in main.rs used the from_json(&raw).map_err(|e| format!(...))? pattern, so a single corrupted, truncated, or schema-incompatible snapshot would bubble Err out of the boot sequence and abort gateway start-up. The operator had to either restore the file from backup or manually rm it before the gateway would even bind its listener: a loud restart loop that took the entire data plane down because of one manager’s bad JSON.

§What changed

load_or_fresh turns a read-side or parse-side Err into:

  1. a tracing::warn! log line carrying the manager name, the file path, and the underlying error (operators grep for state file parse failed in logs);
  2. a bump to the s4_state_file_load_failures_total{manager,reason} Prometheus counter (operators alert on rate(...) > 0 so that silent boot-time fall-backs surface in dashboards);
  3. a fresh T::default() manager — the gateway boots with empty in-memory state for the affected manager and the operator’s snapshot file is left in place for post-mortem inspection (we never touch the operator’s bytes — recovering / re-importing is their call).

Every other manager keeps loading normally. One bad file no longer cascades into a gateway-wide DoS.
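
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: load_or_fresh_sketch, FailureCounter, the "read"/"parse" reason labels, and the "state file read failed" message are all hypothetical, eprintln! stands in for tracing::warn!, and the in-memory map stands in for the Prometheus counter. It also does not model the Ok(None) "start fresh" cases handled by read_state_file_or_fresh.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::fs;
use std::path::Path;
use std::sync::Mutex;

/// Hypothetical stand-in for the
/// s4_state_file_load_failures_total{manager,reason} counter.
#[derive(Default)]
pub struct FailureCounter {
    counts: Mutex<HashMap<(String, &'static str), u64>>,
}

impl FailureCounter {
    /// Increment the counter for one (manager, reason) label pair.
    pub fn inc(&self, manager: &str, reason: &'static str) {
        let mut counts = self.counts.lock().unwrap();
        *counts.entry((manager.to_string(), reason)).or_insert(0) += 1;
    }

    /// Read the current value for one (manager, reason) label pair.
    pub fn get(&self, manager: &str, reason: &'static str) -> u64 {
        let counts = self.counts.lock().unwrap();
        counts.get(&(manager.to_string(), reason)).copied().unwrap_or(0)
    }
}

/// Load one manager's snapshot; on any read or parse error, warn, count,
/// and fall back to T::default(). The file on disk is only ever read.
pub fn load_or_fresh_sketch<T, E, F>(
    manager: &str,
    path: &Path,
    parse: F,
    failures: &FailureCounter,
) -> T
where
    T: Default,
    E: Display,
    F: FnOnce(&str) -> Result<T, E>,
{
    let raw = match fs::read_to_string(path) {
        Ok(raw) => raw,
        Err(e) => {
            // Assumed message and label; the real wording may differ.
            eprintln!(
                "WARN state file read failed manager={} path={} error={}",
                manager,
                path.display(),
                e
            );
            failures.inc(manager, "read");
            return T::default();
        }
    };

    match parse(&raw) {
        Ok(state) => state,
        Err(e) => {
            eprintln!(
                "WARN state file parse failed manager={} path={} error={}",
                manager,
                path.display(),
                e
            );
            failures.inc(manager, "parse");
            T::default()
        }
    }
}
```

Because the fall-back is decided per call, each manager would invoke the function independently with its own name, path, and parser, so a failure in one call cannot short-circuit the others the way an early `?` return did.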

§What did NOT change

  • --mfa-default-secret-file keeps its fail-closed read path. A missing or unreadable MFA secret means MFA verification cannot succeed; silently booting with no secret would let DELETEs slip past the MFA gate. That call site stays inside the MFA loader block and continues to surface a hard error.
  • The on-disk snapshot is never deleted, renamed, or rewritten by the boot path. Operators decide whether to rm the bad file or restore from a known-good copy.
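
For contrast, a fail-closed read path might look like the following sketch. The function name, the error strings, and the empty-file check are assumptions made for illustration; the source only states that a missing or unreadable secret surfaces a hard error.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical fail-closed loader: any problem reading the secret is
/// returned as Err so the caller can abort start-up instead of booting
/// without an MFA secret.
pub fn load_mfa_secret_sketch(path: &Path) -> Result<String, String> {
    let raw = fs::read_to_string(path)
        .map_err(|e| format!("MFA secret file {} unreadable: {}", path.display(), e))?;

    // Assumption: an empty secret is treated the same as a missing one.
    let secret = raw.trim();
    if secret.is_empty() {
        return Err(format!("MFA secret file {} is empty", path.display()));
    }
    Ok(secret.to_string())
}
```

The difference from the snapshot loaders is the return type: there is no default to fall back to, so the caller has to handle the Err.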

Functions§

load_or_fresh
v0.8.4 #72: load a manager snapshot with per-manager graceful degradation. See module docs for the contract.
read_state_file_or_fresh
Read a --*-state-file <PATH> snapshot, returning Ok(None) for the three “start fresh” cases and Ok(Some(json)) for the actual restore-from-snapshot case: