hmac_circuit_breaker/lib.rs
//! # hmac-circuit-breaker
//!
//! An HMAC-protected circuit breaker with **fail-open semantics** for service resilience.
//!
//! ## The Problem
//!
//! Standard circuit breakers that persist failure state to disk introduce an
//! attack surface: an adversary with write access to the state file can trip every
//! circuit, causing a denial of service without touching the services themselves.
//!
//! ## The Solution
//!
//! This crate computes **HMAC-SHA256** over the circuit state and embeds it in the
//! state file. On every reload the HMAC is verified before any state is trusted.
//!
//! ### Why Fail-Open on HMAC Mismatch?
//!
//! When the HMAC doesn't match, the crate **clears** all circuit state (fail-open)
//! rather than blocking all traffic (fail-closed). This is a deliberate security
//! decision:
//!
//! | On-tamper response | What the attacker achieves |
//! |---|---|
//! | **Fail-closed** (block all) | Full self-DoS: the attacker writes a bad MAC and every circuit trips |
//! | **Fail-open** (clear all) | Temporary removal of protection: the worst case is the baseline behaviour *without* a circuit breaker |
//!
//! Fail-open means an attacker can at most *remove* circuit protection for one reload
//! cycle, not weaponise it. The integrity violation is logged as a warning
//! (`tracing::warn`) so operators can detect the tampering.
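//!
//! The fail-open rule can be sketched in a few lines. This is an illustration
//! only, not the crate's loader: `apply_reload` and its `mac_ok` flag are
//! hypothetical names, and circuit state is reduced to one `bool` per service.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical helper: trust `loaded` only when its MAC verified.
//! fn apply_reload(
//!     current: &mut HashMap<String, bool>,
//!     loaded: HashMap<String, bool>,
//!     mac_ok: bool,
//! ) {
//!     if mac_ok {
//!         *current = loaded;
//!     } else {
//!         // Fail-open: drop every entry so that no circuit is reported as tripped.
//!         current.clear();
//!     }
//! }
//!
//! fn main() {
//!     let mut state = HashMap::from([("payment-service".to_string(), true)]);
//!     let forged = HashMap::from([("auth-service".to_string(), true)]);
//!     apply_reload(&mut state, forged, false);
//!     assert!(state.is_empty()); // tampered file: protection removed, nothing blocked
//! }
//! ```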
//!
//! ## State Machines
//!
//! ### File-based state (written by external producer)
//!
//! ```text
//!     pass
//! ┌──────────┐  fail   ┌──────┐  fail×N  ┌─────────┐
//! │  Closed  │────────►│ Open │─────────►│ Tripped │
//! │ (normal) │         │      │          │(blocked)│
//! └──────────┘         └──┬───┘          └────┬────┘
//!      ▲                  │ pass              │ pass (next health cycle)
//!      └──────────────────┴───────────────────┘
//! ```
//!
//! ### In-process runtime state (managed by the axum middleware)
//!
//! ```text
//!               fail×N             cooldown elapsed
//! ┌──────────┐ ────────► ┌─────────┐ ──────────► ┌──────────┐
//! │  Closed  │           │ Tripped │             │ HalfOpen │
//! │ (normal) │ ◄──────── │(blocked)│ ◄────────── │ (1 probe)│
//! └──────────┘  recover  └─────────┘  probe fail └────┬─────┘
//!                                                     │ probe ok
//!      ▲──────────────────────────────────────────────┘
//! ```
//!
//! * **Closed** – no in-process failures; requests pass through.
//! * **Tripped** – entered after `threshold` consecutive 5xx responses from the
//!   inner service. Requests are rejected with 503 until the cooldown elapses.
//! * **HalfOpen** – one probe request is allowed through. Success → Closed;
//!   failure → Tripped (cooldown restarts).
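//!
//! A minimal sketch of these transitions, assuming a hypothetical `Circuit`
//! type with explicit `threshold` and `cooldown` arguments (the crate's own
//! `RuntimeServiceState` and middleware may be organised differently):
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! enum Status { Closed, Tripped, HalfOpen }
//!
//! /// Hypothetical per-service runtime state.
//! struct Circuit { status: Status, failures: u32, tripped_at: Option<Instant> }
//!
//! impl Circuit {
//!     fn new() -> Self {
//!         Circuit { status: Status::Closed, failures: 0, tripped_at: None }
//!     }
//!
//!     /// Record the outcome of a request that was allowed through.
//!     fn record(&mut self, ok: bool, threshold: u32, now: Instant) {
//!         if ok {
//!             // Any success, including a successful probe, closes the circuit.
//!             *self = Circuit::new();
//!             return;
//!         }
//!         self.failures += 1;
//!         // A failed probe re-trips at once; otherwise trip at `threshold` failures.
//!         if self.status == Status::HalfOpen || self.failures >= threshold {
//!             self.status = Status::Tripped;
//!             self.tripped_at = Some(now); // the cooldown (re)starts here
//!         }
//!     }
//!
//!     /// Decide whether a request may pass; Tripped becomes HalfOpen after the cooldown.
//!     fn allow(&mut self, cooldown: Duration, now: Instant) -> bool {
//!         match self.status {
//!             Status::Closed => true,
//!             Status::HalfOpen => false, // the single probe is already in flight
//!             Status::Tripped => {
//!                 let elapsed = self
//!                     .tripped_at
//!                     .map_or(false, |t| now.duration_since(t) >= cooldown);
//!                 if elapsed {
//!                     self.status = Status::HalfOpen;
//!                 }
//!                 elapsed
//!             }
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let (t0, cooldown) = (Instant::now(), Duration::from_secs(30));
//!     let mut c = Circuit::new();
//!     for _ in 0..3 {
//!         c.record(false, 3, t0);
//!     }
//!     assert_eq!(c.status, Status::Tripped);
//!     assert!(!c.allow(cooldown, t0)); // still cooling down
//!     assert!(c.allow(cooldown, t0 + cooldown)); // probe admitted
//!     c.record(true, 3, t0 + cooldown);
//!     assert_eq!(c.status, Status::Closed);
//! }
//! ```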
//!
//! ## Architecture
//!
//! Circuit state is tracked in two complementary layers:
//!
//! 1. **On disk**: a JSON file written by an external producer (health-check
//!    cron, monitoring daemon) with an embedded HMAC-SHA256 tag.
//!    Verified on every reload; mismatch → fail-open.
//! 2. **In memory**: two `Arc<RwLock<HashMap>>` maps:
//!    * `SharedState`: reloaded from disk every *N* seconds; reflects the
//!      external producer's view of each service.
//!    * `RuntimeState`: managed entirely by the axum middleware; trips
//!      immediately when the **current process** observes consecutive failures,
//!      then auto-recovers via half-open probing without waiting for the next
//!      health-check cycle.
//!
//! ## Quick Start
//!
//! ```rust,no_run
//! use hmac_circuit_breaker::{CircuitBreakerConfig, CircuitBreakerHandle};
//! use std::path::PathBuf;
//! use std::time::Duration;
//!
//! #[tokio::main]
//! async fn main() {
//!     let config = CircuitBreakerConfig::builder()
//!         .state_file(PathBuf::from("/var/run/myapp/circuit_breaker.json"))
//!         .secret("my-hmac-secret")
//!         .threshold(3)
//!         .reload_interval(Duration::from_secs(60))
//!         .build();
//!
//!     let handle = CircuitBreakerHandle::new(config);
//!     handle.load().await;   // initial load, so the check below sees the on-disk state
//!     handle.spawn_reload(); // background reload every 60 s
//!
//!     // Check before dispatching work
//!     if handle.is_tripped("payment-service").await {
//!         eprintln!("payment-service is currently unavailable");
//!     }
//! }
//! ```
//!
//! ## Features
//!
//! | Feature | Default | Description |
//! |---|---|---|
//! | `reload` | yes | Enables `CircuitBreakerHandle::spawn_reload()` (requires tokio) |
//! | `axum` | no | Enables `circuit_breaker_layer()` axum middleware |

pub mod config;
pub mod integrity;
pub mod loader;
pub mod state;
pub mod writer;

#[cfg(feature = "axum")]
pub mod middleware;

pub use config::{CircuitBreakerConfig, CircuitBreakerConfigBuilder};
pub use state::{AlgorithmCircuitState, CircuitStatus, RuntimeServiceState, RuntimeStatus};
pub use writer::write_state;

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

/// Shared in-memory file-based circuit state, cheaply cloneable.
pub type SharedState = Arc<RwLock<HashMap<String, AlgorithmCircuitState>>>;

/// Shared in-process runtime circuit state, cheaply cloneable.
///
/// Managed entirely by the axum middleware and never persisted to disk.
pub type RuntimeState = Arc<RwLock<HashMap<String, RuntimeServiceState>>>;

/// High-level handle that owns the shared state and the config.
///
/// Clone it freely; all clones share the same underlying `Arc`s.
#[derive(Clone)]
pub struct CircuitBreakerHandle {
    pub(crate) state: SharedState,
    pub(crate) runtime: RuntimeState,
    pub(crate) config: Arc<CircuitBreakerConfig>,
}

impl CircuitBreakerHandle {
    /// Create a new handle. The initial in-memory state is empty (all circuits closed).
    pub fn new(config: CircuitBreakerConfig) -> Self {
        Self {
            state: Arc::new(RwLock::new(HashMap::new())),
            runtime: Arc::new(RwLock::new(HashMap::new())),
            config: Arc::new(config),
        }
    }

    /// Load state from the configured file once, verifying HMAC integrity.
    ///
    /// On HMAC mismatch the in-memory state is cleared (fail-open) and a `tracing::warn`
    /// is emitted. This is safe to call at startup and before spawning the reload task.
    pub async fn load(&self) {
        loader::load_into(&self.state, &self.config).await;
    }

    /// Spawn a background tokio task that reloads the state file every
    /// `config.reload_interval`, exactly as [`load`](Self::load) does.
    ///
    /// The task holds its own clones of the state and config `Arc`s and loops
    /// without an exit condition, so it keeps running after every handle is
    /// dropped, until the tokio runtime shuts down. Must be called from within
    /// a tokio runtime.
    #[cfg(feature = "reload")]
    pub fn spawn_reload(&self) {
        let state = self.state.clone();
        let config = self.config.clone();
        tokio::spawn(async move {
            loop {
                loader::load_into(&state, &config).await;
                tokio::time::sleep(config.reload_interval).await;
            }
        });
    }

    /// Returns `true` if the named service is tripped in the file-based state
    /// (consecutive failures ≥ threshold). Returns `false` for unknown services
    /// (fail-open default). The in-process runtime state is not consulted.
    pub async fn is_tripped(&self, service: &str) -> bool {
        let guard = self.state.read().await;
        guard
            .get(service)
            .map(|s| s.status == CircuitStatus::Tripped)
            .unwrap_or(false)
    }

    /// Returns the full file-based circuit state for a service, or `None` if not tracked.
    pub async fn get(&self, service: &str) -> Option<AlgorithmCircuitState> {
        let guard = self.state.read().await;
        guard.get(service).cloned()
    }

    /// Returns a snapshot of the complete file-based in-memory state.
    pub async fn snapshot(&self) -> HashMap<String, AlgorithmCircuitState> {
        self.state.read().await.clone()
    }

    /// Access the raw file-based shared state (e.g. to pass to the axum middleware).
    pub fn shared_state(&self) -> SharedState {
        self.state.clone()
    }

    /// Access the in-process runtime state (e.g. to pass to the axum middleware).
    ///
    /// Pass this alongside [`shared_state`](Self::shared_state) to
    /// [`circuit_breaker_layer`](crate::middleware::circuit_breaker_layer) to
    /// enable in-process failure detection and half-open probing.
    pub fn runtime_state(&self) -> RuntimeState {
        self.runtime.clone()
    }
}