bext_plugin_api/locking.rs
//! Locking capability trait. See `plan/ecosystem/02-capabilities.md §Locking`.
//!
//! A `LockingPlugin` provides **distributed mutual exclusion** keyed on a
//! string. The canonical use case is "ensure this scheduled job fires once
//! across the cluster, not once per instance" (hence the E2 pairing with
//! the Scheduled retrofit), but the same shape is useful for leader
//! election, exactly-once webhook delivery, and coordinating expensive
//! cache rebuilds.
//!
//! # Design notes
//!
//! - **Contention is not failure.** `try_lock` returns `Ok(None)` when the
//!   lock is held by someone else and `Err(..)` when the backend itself is
//!   broken (Redis unreachable, Postgres connection dropped). Callers
//!   almost always want to branch on that distinction (retry a moment
//!   later on contention, alert an operator on backend failure), so
//!   encoding it in the type beats a single `LockError` enum with a
//!   `Held` variant. This is the load-bearing shape choice and is the
//!   reason the trait does not use a typed error enum.
//! - **Everything else is `Result<_, String>`.** `renew` cannot
//!   meaningfully distinguish "lock expired out from under us" from
//!   "backend failure" for a caller: in both cases the correct response
//!   is "log and move on, you lost the lock." `release` treats a lock the
//!   caller no longer owns as a silent no-op, so its `Err` only signals
//!   backend failure. Folding these cases into `Err(String)` keeps the
//!   trait lean and WASM-ABI friendly (matches the convention established
//!   by [`crate::scheduled::ScheduledPlugin`] and
//!   [`crate::lifecycle::LifecyclePlugin`]).
//! - **Opaque `LockHandle`.** The handle carries a `lock_id` (a random
//!   token minted by the plugin when the lock was acquired) plus the
//!   `key` it was taken on. Backends use the token to implement CAS-style
//!   release: Redlock SETs with `NX PX` and a random value, then the
//!   release script checks the value before `DEL`, so a renew or
//!   release from a stale caller cannot stomp on a newer lock holder.
//!   The token shape is a `String` so WASM guests can marshal it through
//!   the JSON ABI unchanged.
//! - **Sync trait.** Matches the convention in
//!   [`crate::scheduled::ScheduledPlugin`] and
//!   [`crate::session::SessionPlugin`]. Backends that need an async
//!   client drive their own runtime inside the call (same approach as
//!   `bext-session-redis` and `bext-tracer-otlp`).
//! - **No vendor leaks.** The trait has no `redis_url`, no
//!   `advisory_lock_key`, no `ttl_ms_override_for_etcd`. Configuration
//!   lives on the concrete plugin's constructor; the trait is one shape
//!   across memory / Redis / Postgres / etcd.
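//!
//! The notes above can be illustrated with a minimal single-node sketch.
//! It is not the shipped `bext-locking-memory` backend: `MemoryLocks`,
//! `Handle`, and the counter-based token are hypothetical stand-ins chosen
//! so the block compiles on its own, and the methods are inherent rather
//! than a `LockingPlugin` impl.
//!
//! ```rust
//! use std::collections::HashMap;
//! use std::sync::atomic::{AtomicU64, Ordering};
//! use std::sync::Mutex;
//! use std::time::{Duration, Instant};
//!
//! /// Trimmed stand-in for `LockHandle` (assumption: no serde derives here).
//! #[derive(Debug, Clone, PartialEq, Eq)]
//! struct Handle {
//!     key: String,
//!     lock_id: String,
//!     ttl_ms: u64,
//! }
//!
//! /// Hypothetical in-memory store: key -> (owner token, expiry).
//! #[derive(Default)]
//! struct MemoryLocks {
//!     held: Mutex<HashMap<String, (String, Instant)>>,
//!     next_token: AtomicU64,
//! }
//!
//! impl MemoryLocks {
//!     fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<Handle>, String> {
//!         // A poisoned mutex plays the role of "backend is broken" here.
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         let now = Instant::now();
//!         if let Some((_, expires_at)) = held.get(key) {
//!             if *expires_at > now {
//!                 return Ok(None); // contention, not failure
//!             }
//!         }
//!         // Counter token for brevity; a real backend would mint a random one.
//!         let lock_id = format!("tok-{}", self.next_token.fetch_add(1, Ordering::Relaxed));
//!         let expires_at = now + Duration::from_millis(ttl_ms);
//!         held.insert(key.to_string(), (lock_id.clone(), expires_at));
//!         Ok(Some(Handle { key: key.to_string(), lock_id, ttl_ms }))
//!     }
//!
//!     fn renew(&self, handle: &Handle) -> Result<(), String> {
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         let now = Instant::now();
//!         match held.get_mut(&handle.key) {
//!             // Only the current, unexpired owner may extend the TTL.
//!             Some((owner, expires_at)) if *owner == handle.lock_id && *expires_at > now => {
//!                 *expires_at = now + Duration::from_millis(handle.ttl_ms);
//!                 Ok(())
//!             }
//!             _ => Err(format!("lock on {:?} is no longer held by this handle", handle.key)),
//!         }
//!     }
//!
//!     fn release(&self, handle: Handle) -> Result<(), String> {
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         // Check-and-set: remove the entry only if the token still matches.
//!         let owned = matches!(held.get(&handle.key), Some((owner, _)) if *owner == handle.lock_id);
//!         if owned {
//!             held.remove(&handle.key);
//!         }
//!         Ok(()) // releasing a lock we do not own is a silent no-op
//!     }
//! }
//! ```
//!
//! In this sketch a second `try_lock` on a held key yields `Ok(None)`,
//! while a `release` carrying a mismatched `lock_id` leaves the current
//! holder untouched.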
//!
//! # Backends
//!
//! Three reference backends ship alongside this trait in `crates/bext-impls/`:
//!
//! - `bext-locking-memory`: single-node fallback, `Mutex<HashMap>`.
//! - `bext-locking-redis`: Redlock-style `SET NX PX` + Lua CAS release.
//! - `bext-locking-pg`: Postgres `pg_try_advisory_lock(hashtext(key))`.
//!
//! # Use by the Scheduled capability
//!
//! When a [`crate::scheduled::ScheduledPlugin`] declares a
//! [`LockingHint::RequireGlobal`] schedule, the host-owned scheduler in
//! `bext-core::scheduler` acquires a `LockingPlugin` lock keyed on the
//! schedule id before invoking
//! [`ScheduledPlugin::run`](crate::scheduled::ScheduledPlugin::run). On
//! contention (`Ok(None)`) the scheduler skips this tick because another
//! node is running it. On backend failure (`Err(..)`) the scheduler logs
//! and skips, matching the existing "lost the lock" semantics.
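//!
//! The three-way branch described above might look roughly like the
//! sketch below. `TickOutcome` and `run_guarded_tick` are hypothetical
//! names, the closures stand in for the plugin calls, and releasing the
//! lock right after the job is an assumption; the real code in
//! `bext-core::scheduler` may differ.
//!
//! ```rust
//! /// Hypothetical outcome type used only by this sketch.
//! #[derive(Debug, PartialEq, Eq)]
//! enum TickOutcome {
//!     Ran,
//!     SkippedContention,
//!     SkippedBackendError(String),
//! }
//!
//! /// Runs `job` only when `acquire` yields a handle, then passes the
//! /// handle to `release`.
//! fn run_guarded_tick<H>(
//!     acquire: impl FnOnce() -> Result<Option<H>, String>,
//!     job: impl FnOnce(),
//!     release: impl FnOnce(H) -> Result<(), String>,
//! ) -> TickOutcome {
//!     match acquire() {
//!         Ok(Some(handle)) => {
//!             job();
//!             // The job already ran, so a release failure is only logged.
//!             if let Err(err) = release(handle) {
//!                 eprintln!("lock release failed: {err}");
//!             }
//!             TickOutcome::Ran
//!         }
//!         // Another node holds the lock: skip quietly.
//!         Ok(None) => TickOutcome::SkippedContention,
//!         // Backend is broken: log and skip.
//!         Err(err) => {
//!             eprintln!("locking backend failed, skipping tick: {err}");
//!             TickOutcome::SkippedBackendError(err)
//!         }
//!     }
//! }
//! ```
//!
//! A caller would pass closures wrapping `try_lock(schedule_id, ttl_ms)`
//! and `release(handle)` on whichever `LockingPlugin` is configured.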
//!
//! [`LockingHint::RequireGlobal`]: crate::scheduled::LockingHint::RequireGlobal

use serde::{Deserialize, Serialize};

/// Handle returned by a successful [`LockingPlugin::try_lock`].
///
/// The handle is **opaque** to callers: they pass it back to
/// [`LockingPlugin::renew`] and [`LockingPlugin::release`] unchanged. The
/// plugin uses the embedded `lock_id` token to detect stale operations
/// (see module docs).
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct LockHandle {
    /// The key the lock was acquired on. Matches the `key` argument
    /// passed to [`LockingPlugin::try_lock`]. Useful for logging and
    /// metrics; the plugin also needs it on release / renew.
    pub key: String,
    /// Random token minted by the plugin at acquisition time. Backends
    /// use it to implement check-and-set release so a stale caller
    /// cannot release a lock it no longer owns. Format is
    /// plugin-defined (UUIDv4 is typical) and callers must treat it as
    /// opaque.
    pub lock_id: String,
    /// The TTL the lock was acquired with, in milliseconds. Stored on
    /// the handle so `renew` can re-apply the same TTL without the
    /// caller having to remember it.
    pub ttl_ms: u64,
}

/// A plugin that provides distributed mutual exclusion.
///
/// **Compile-time and WASM execution.** All methods use JSON-friendly
/// POD types so the trait is ABI-compatible with WASM guest plugins
/// (matches the convention used by the other capability traits in this
/// crate). Backends that need a real async client run their own runtime
/// inside the call.
///
/// Concurrency: implementations MUST be safe to call from multiple
/// threads simultaneously. The host may contend on the same key from
/// different request-handling threads.
pub trait LockingPlugin: Send + Sync {
    /// Unique plugin name (e.g., `"memory"`, `"redis"`, `"pg"`). Used
    /// by the dev dashboard, metrics labels, and `cap_conformance`.
    fn name(&self) -> &str;

    /// Attempt to acquire the lock identified by `key` for at most
    /// `ttl_ms` milliseconds.
    ///
    /// # Return shape
    ///
    /// - `Ok(Some(handle))`: the lock was acquired. The caller owns
    ///   it for `ttl_ms` and may renew or release.
    /// - `Ok(None)`: **contention**. Another owner holds the lock.
    ///   The caller typically backs off and retries later, or simply
    ///   skips this run (the Scheduled capability skips).
    /// - `Err(message)`: backend failure (network, protocol,
    ///   configuration). The caller cannot meaningfully retry without
    ///   operator intervention.
    ///
    /// The `Ok(None)` vs `Err(..)` distinction is the reason this trait
    /// does not use a typed `LockError` enum. It is the only branch
    /// that matters to nearly every caller, and encoding it in the
    /// return type is cleaner than a `LockError::Held` variant they
    /// would have to match at every call site.
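    ///
    /// A caller that prefers to back off and retry on contention could
    /// wrap the call roughly as follows. This is a sketch under stated
    /// assumptions: `lock_with_backoff`, its parameters, and the doubling
    /// delay are hypothetical and not part of this crate; the closure
    /// stands in for a `try_lock` call on a concrete plugin.
    ///
    /// ```rust
    /// use std::thread::sleep;
    /// use std::time::Duration;
    ///
    /// /// Retries only on `Ok(None)`; a backend `Err` is returned at once.
    /// fn lock_with_backoff<H>(
    ///     mut try_lock: impl FnMut() -> Result<Option<H>, String>,
    ///     max_attempts: u32,
    ///     base_delay: Duration,
    /// ) -> Result<Option<H>, String> {
    ///     let mut delay = base_delay;
    ///     for attempt in 1..=max_attempts {
    ///         // `?` propagates backend failure without retrying.
    ///         if let Some(handle) = try_lock()? {
    ///             return Ok(Some(handle));
    ///         }
    ///         if attempt < max_attempts {
    ///             sleep(delay);
    ///             delay = delay.saturating_mul(2); // simple exponential backoff
    ///         }
    ///     }
    ///     Ok(None) // still contended after every attempt
    /// }
    /// ```
    ///
    /// Because contention and failure arrive on different arms of the
    /// return type, the retry loop needs no error matching at all.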
    fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<LockHandle>, String>;

    /// Extend the lifetime of an existing lock.
    ///
    /// On success the lock's TTL is reset to `handle.ttl_ms`. On
    /// failure, whether because the lock has already expired or because
    /// the backend is broken, it returns `Err(message)`. Callers treat
    /// the failure uniformly: "you no longer own this lock; stop holding
    /// it." The default retry policy is to abandon the work and call
    /// `try_lock` again from scratch on the next opportunity.
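    ///
    /// One way a caller might apply this policy to long-running work is
    /// to renew between steps and stop at the first failure. The helper
    /// below is a hypothetical sketch (`run_with_renewal` is not part of
    /// this crate); the `renew` closure stands in for a call to this
    /// method with the caller's handle.
    ///
    /// ```rust
    /// /// Runs `steps` units of work, renewing the lock before every
    /// /// step after the first. Any renew error abandons the remainder.
    /// fn run_with_renewal(
    ///     steps: usize,
    ///     mut step: impl FnMut(usize),
    ///     mut renew: impl FnMut() -> Result<(), String>,
    /// ) -> Result<(), String> {
    ///     for i in 0..steps {
    ///         if i > 0 {
    ///             // Expired lock and broken backend look the same here:
    ///             // either way the lock is gone, so stop immediately.
    ///             renew()?;
    ///         }
    ///         step(i);
    ///     }
    ///     Ok(())
    /// }
    /// ```
    ///
    /// Each step should be short relative to `ttl_ms`, otherwise the
    /// lock can lapse before the next renewal is attempted.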
    fn renew(&self, handle: &LockHandle) -> Result<(), String>;

    /// Release the lock.
    ///
    /// Implementations MUST verify that the lock is still held under
    /// the caller's `handle.lock_id` before releasing (check-and-set),
    /// so a stale call after the TTL expired cannot clobber a newer
    /// holder. Releasing a lock the caller does not own is a silent
    /// no-op, not an error.
    fn release(&self, handle: LockHandle) -> Result<(), String>;

    /// Called before the plugin is unloaded. Release backend resources
    /// (Redis connection, Postgres pool). Default: no-op.
    fn cleanup(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Fuel budgets for WASM locking plugin calls. Matches the convention
/// in [`crate::scheduled::fuel`].
pub mod fuel {
    /// Fuel for a single [`super::LockingPlugin::try_lock`] call.
    pub const TRY_LOCK: u64 = 50_000_000;
    /// Fuel for a single [`super::LockingPlugin::renew`] call.
    pub const RENEW: u64 = 50_000_000;
    /// Fuel for a single [`super::LockingPlugin::release`] call.
    pub const RELEASE: u64 = 50_000_000;
    /// Fuel for [`super::LockingPlugin::cleanup`].
    pub const CLEANUP: u64 = 100_000_000;
}