//! Locking capability trait. See `plan/ecosystem/02-capabilities.md §Locking`.
//!
//! A `LockingPlugin` provides **distributed mutual exclusion** keyed on a
//! string. The canonical use case is "ensure this scheduled job fires once
//! across the cluster, not once per instance" (hence the E2 pairing with
//! the Scheduled retrofit), but the same shape is useful for leader
//! election, exactly-once webhook delivery, and coordinating expensive
//! cache rebuilds.
//!
//! # Design notes
//!
//! - **Contention is not failure.** `try_lock` returns `Ok(None)` when the
//!   lock is held by someone else and `Err(..)` when the backend itself is
//!   broken (Redis unreachable, Postgres connection dropped). Callers
//!   almost always want to branch on that distinction (retry a moment
//!   later on contention, alert an operator on backend failure), so
//!   encoding it in the return type beats a single `LockError` enum with
//!   a `Held` variant. This is the load-bearing shape choice, and it is
//!   why the trait does not use a typed error enum.
//! - **Everything else is `Result<_, String>`.** A caller of `renew` or
//!   `release` gains nothing from distinguishing "the lock expired out
//!   from under us" from "the backend failed": in both cases the correct
//!   response is "log and move on; you lost the lock." (`release` goes
//!   further and treats a lock the caller no longer owns as a silent
//!   no-op.) Folding failures into `Err(String)` keeps the trait lean and
//!   WASM-ABI friendly, matching the convention established by
//!   [`crate::scheduled::ScheduledPlugin`] and
//!   [`crate::lifecycle::LifecyclePlugin`].
//! - **Opaque `LockHandle`.** The handle carries a `lock_id` (a random
//!   token minted by the plugin when the lock was acquired) plus the
//!   `key` it was taken on and the `ttl_ms` it was taken with. Backends
//!   use the token to implement CAS-style release (Redlock-style
//!   acquisition `SET`s the key with `NX PX` and the random value, and the
//!   release script checks that value before `DEL`), so a renew or
//!   release from a stale caller cannot stomp on a newer lock holder.
//!   The token is a `String` so WASM guests can marshal it through the
//!   JSON ABI unchanged.
//! - **Synchronous trait.** The methods are plain (non-`async`) functions,
//!   matching the convention in [`crate::scheduled::ScheduledPlugin`] and
//!   [`crate::session::SessionPlugin`]. Backends that need an async
//!   client drive their own runtime inside the call (the same approach as
//!   `bext-session-redis` and `bext-tracer-otlp`).
//! - **No vendor leaks.** The trait has no `redis_url`, no
//!   `advisory_lock_key`, no `ttl_ms_override_for_etcd`. Configuration
//!   lives on the concrete plugin's constructor; the trait is one shape
//!   across memory, Redis, Postgres, and etcd.
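//!
//! The caller-side branch from the first note can be sketched as below.
//! `Handle`, `TickAction`, and `decide` are hypothetical names used only
//! for illustration (`Handle` stands in for [`LockHandle`]); the point is
//! that contention and backend failure land in different arms without a
//! `Held` error variant.
//!
//! ```rust
//! // Stand-in for `LockHandle`; only the key matters here.
//! struct Handle {
//!     key: String,
//! }
//!
//! // What a caller does next (hypothetical, for illustration only).
//! #[derive(Debug, PartialEq)]
//! enum TickAction {
//!     // We own the lock: do the work, then release.
//!     Run(String),
//!     // Someone else holds it: back off and try again later.
//!     RetryLater,
//!     // The backend is broken: surface it to an operator.
//!     Alert(String),
//! }
//!
//! fn decide(result: Result<Option<Handle>, String>) -> TickAction {
//!     match result {
//!         Ok(Some(handle)) => TickAction::Run(handle.key),
//!         // Contention is an expected outcome, not an error.
//!         Ok(None) => TickAction::RetryLater,
//!         // Only a genuine backend failure reaches the error path.
//!         Err(message) => TickAction::Alert(message),
//!     }
//! }
//! ```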
//!
//! # Backends
//!
//! Three reference backends ship alongside this trait in `crates/bext-impls/`:
//!
//! - `bext-locking-memory`: single-node fallback, `Mutex<HashMap>`.
//! - `bext-locking-redis`: Redlock-style `SET NX PX` + Lua CAS release.
//! - `bext-locking-pg`: Postgres `pg_try_advisory_lock(hashtext(key))`.
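//!
//! A minimal sketch of the single-node approach follows. It is a
//! hypothetical illustration, not the `bext-locking-memory` source:
//! `Handle` stands in for [`LockHandle`], `MemoryLocks` is an invented
//! name, tokens come from a counter rather than a random UUIDv4, and
//! expiry is tracked with `std::time::Instant`. Release is check-and-set
//! on the token, so a stale handle cannot free the current holder's lock.
//!
//! ```rust
//! use std::collections::HashMap;
//! use std::sync::atomic::{AtomicU64, Ordering};
//! use std::sync::Mutex;
//! use std::time::{Duration, Instant};
//!
//! // Local stand-in for `LockHandle`.
//! #[derive(Debug, Clone)]
//! struct Handle {
//!     key: String,
//!     lock_id: String,
//!     ttl_ms: u64,
//! }
//!
//! // Hypothetical single-node locker: key -> (owner token, expiry instant).
//! struct MemoryLocks {
//!     held: Mutex<HashMap<String, (String, Instant)>>,
//!     next_token: AtomicU64,
//! }
//!
//! impl MemoryLocks {
//!     fn new() -> Self {
//!         MemoryLocks {
//!             held: Mutex::new(HashMap::new()),
//!             next_token: AtomicU64::new(1),
//!         }
//!     }
//!
//!     fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<Handle>, String> {
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         let now = Instant::now();
//!         // An unexpired entry means contention: Ok(None), not an error.
//!         if let Some((_, expires)) = held.get(key) {
//!             if *expires > now {
//!                 return Ok(None);
//!             }
//!         }
//!         // Counter-based token for the sketch; a real backend mints a random one.
//!         let lock_id = self.next_token.fetch_add(1, Ordering::Relaxed).to_string();
//!         held.insert(key.to_string(), (lock_id.clone(), now + Duration::from_millis(ttl_ms)));
//!         Ok(Some(Handle { key: key.to_string(), lock_id, ttl_ms }))
//!     }
//!
//!     fn renew(&self, handle: &Handle) -> Result<(), String> {
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         let now = Instant::now();
//!         match held.get_mut(&handle.key) {
//!             // Only the current, unexpired owner may push the expiry forward.
//!             Some((token, expires)) if *token == handle.lock_id && *expires > now => {
//!                 *expires = now + Duration::from_millis(handle.ttl_ms);
//!                 Ok(())
//!             }
//!             _ => Err(format!("lock {:?} is no longer held by this handle", handle.key)),
//!         }
//!     }
//!
//!     fn release(&self, handle: Handle) -> Result<(), String> {
//!         let mut held = self.held.lock().map_err(|e| e.to_string())?;
//!         // Check-and-set: remove the entry only if the stored token is ours.
//!         // Releasing a lock we do not own is a silent no-op.
//!         let owned = matches!(held.get(&handle.key), Some((token, _)) if *token == handle.lock_id);
//!         if owned {
//!             held.remove(&handle.key);
//!         }
//!         Ok(())
//!     }
//! }
//! ```
//!
//! In this sketch a `release` carrying a stale `lock_id` returns `Ok(())`
//! and leaves the lock in place, while `renew` with the same stale handle
//! returns `Err`.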
//!
//! # Use by the Scheduled capability
//!
//! When a [`crate::scheduled::ScheduledPlugin`] declares a
//! [`LockingHint::RequireGlobal`] schedule, the host-owned scheduler in
//! `bext-core::scheduler` acquires a `LockingPlugin` lock keyed on the
//! schedule id before invoking
//! [`ScheduledPlugin::run`](crate::scheduled::ScheduledPlugin::run). On
//! contention (`Ok(None)`) the scheduler skips this tick, since another
//! node is running it. On backend failure (`Err(..)`) the scheduler logs
//! and skips, matching the existing "lost the lock" semantics.
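//!
//! The per-tick flow can be sketched as follows. This is a hypothetical
//! illustration rather than the `bext-core::scheduler` code: `Handle`,
//! `Locker`, `Tick`, and `run_tick` are invented names, and releasing the
//! lock once the job returns is an assumption of the sketch.
//!
//! ```rust
//! // Stand-in for `LockHandle`; its fields do not matter here.
//! struct Handle;
//!
//! // The two `LockingPlugin` methods this flow needs.
//! trait Locker {
//!     fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<Handle>, String>;
//!     fn release(&self, handle: Handle) -> Result<(), String>;
//! }
//!
//! #[derive(Debug, PartialEq)]
//! enum Tick {
//!     Ran,
//!     SkippedContended,
//!     SkippedBackendError,
//! }
//!
//! // Run `job` only if this node wins the lock keyed on the schedule id.
//! fn run_tick(locker: &dyn Locker, schedule_id: &str, ttl_ms: u64, job: impl FnOnce()) -> Tick {
//!     match locker.try_lock(schedule_id, ttl_ms) {
//!         Ok(Some(handle)) => {
//!             job();
//!             // A failed release only means the lock was already lost,
//!             // so it is logged rather than retried.
//!             if let Err(e) = locker.release(handle) {
//!                 eprintln!("release failed for {schedule_id}: {e}");
//!             }
//!             Tick::Ran
//!         }
//!         // Another node holds the lock and is running this tick.
//!         Ok(None) => Tick::SkippedContended,
//!         // Backend failure: log and skip, same as losing the lock.
//!         Err(e) => {
//!             eprintln!("lock backend error for {schedule_id}: {e}");
//!             Tick::SkippedBackendError
//!         }
//!     }
//! }
//! ```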
//!
//! [`LockingHint::RequireGlobal`]: crate::scheduled::LockingHint::RequireGlobal

use serde::{Deserialize, Serialize};

/// Handle returned by a successful [`LockingPlugin::try_lock`].
///
/// The handle is **opaque** to callers: they pass it back to
/// [`LockingPlugin::renew`] and [`LockingPlugin::release`] unchanged. The
/// plugin uses the embedded `lock_id` token to detect stale operations
/// (see module docs).
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct LockHandle {
    /// The key the lock was acquired on. Matches the `key` argument
    /// passed to [`LockingPlugin::try_lock`]. Useful for logging and
    /// metrics; the plugin also needs it on release / renew.
    pub key: String,
    /// Random token minted by the plugin at acquisition time. Backends
    /// use it to implement check-and-set release so a stale caller
    /// cannot release a lock it no longer owns. Format is
    /// plugin-defined (UUIDv4 is typical) and callers must treat it as
    /// opaque.
    pub lock_id: String,
    /// The TTL the lock was acquired with, in milliseconds. Stored on
    /// the handle so `renew` can re-apply the same TTL without the
    /// caller having to remember it.
    pub ttl_ms: u64,
}

/// A plugin that provides distributed mutual exclusion.
///
/// **Compile-time and WASM execution.** All methods use JSON-friendly
/// POD types so the trait is ABI-compatible with WASM guest plugins
/// (matching the convention used by the other capability traits in this
/// crate). Backends that need a real async client run their own runtime
/// inside the call.
///
/// Concurrency: implementations MUST be safe to call from multiple
/// threads simultaneously. The host may contend on the same key from
/// different request-handling threads.
pub trait LockingPlugin: Send + Sync {
    /// Unique plugin name (e.g., `"memory"`, `"redis"`, `"pg"`). Used
    /// by the dev dashboard, metrics labels, and `cap_conformance`.
    fn name(&self) -> &str;

    /// Attempt to acquire the lock identified by `key` for at most
    /// `ttl_ms` milliseconds.
    ///
    /// # Return shape
    ///
    /// - `Ok(Some(handle))`: the lock was acquired. The caller owns
    ///   it for `ttl_ms` and may renew or release it.
    /// - `Ok(None)`: **contention**. Another owner holds the lock.
    ///   The caller typically backs off and retries later, or simply
    ///   skips this run (the Scheduled capability skips).
    /// - `Err(message)`: backend failure (network, protocol,
    ///   configuration). The caller cannot meaningfully retry without
    ///   operator intervention.
    ///
    /// The `Ok(None)` vs `Err(..)` distinction is the reason this trait
    /// does not use a typed `LockError` enum: it is the only branch
    /// that matters to nearly every caller, and encoding it in the outer
    /// `Result` is cleaner than a `LockError::Held` variant that every
    /// call site would have to match.
    fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<LockHandle>, String>;

    /// Extend the lifetime of an existing lock.
    ///
    /// On success the lock's TTL is reset to `handle.ttl_ms`. On
    /// failure (the lock has already expired, or the backend is
    /// broken) it returns `Err(message)`. Callers treat both failures
    /// the same way: "you no longer own this lock; stop holding it."
    /// The default retry policy is to abandon the work and call
    /// `try_lock` again from scratch at the next opportunity.
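    ///
    /// A holder that works in chunks might apply that policy as in the
    /// sketch below. `Handle`, `Renew`, and `process_chunks` are
    /// hypothetical names for illustration; the point is that any `Err`
    /// from `renew` ends the work instead of being retried.
    ///
    /// ```rust
    /// // Stand-in for `LockHandle`; its fields do not matter here.
    /// struct Handle;
    ///
    /// // The one `LockingPlugin` method this sketch needs.
    /// trait Renew {
    ///     fn renew(&self, handle: &Handle) -> Result<(), String>;
    /// }
    ///
    /// // Process chunks while we still own the lock; return the finished ones.
    /// fn process_chunks(locker: &dyn Renew, handle: &Handle, chunks: &[&str]) -> Vec<String> {
    ///     let mut done: Vec<String> = Vec::new();
    ///     for chunk in chunks {
    ///         // Renew before each chunk. On failure we no longer own the lock,
    ///         // so stop and leave re-acquisition to a later `try_lock`.
    ///         if let Err(e) = locker.renew(handle) {
    ///             eprintln!("lost the lock after {} chunks: {e}", done.len());
    ///             break;
    ///         }
    ///         // Stand-in for the real work on this chunk.
    ///         done.push(chunk.to_uppercase());
    ///     }
    ///     done
    /// }
    /// ```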
    fn renew(&self, handle: &LockHandle) -> Result<(), String>;

    /// Release the lock.
    ///
    /// Implementations MUST verify that the lock is still held under
    /// the caller's `handle.lock_id` before releasing (check-and-set),
    /// so a stale call after the TTL expired cannot clobber a newer
    /// holder. Releasing a lock the caller does not own is a silent
    /// no-op, not an error.
    fn release(&self, handle: LockHandle) -> Result<(), String>;

    /// Called before the plugin is unloaded. Release backend resources
    /// (Redis connection, Postgres pool). Default: no-op.
    fn cleanup(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Fuel budgets for WASM locking plugin calls. Matches the convention
/// in [`crate::scheduled::fuel`].
pub mod fuel {
    /// Fuel for a single [`super::LockingPlugin::try_lock`] call.
    pub const TRY_LOCK: u64 = 50_000_000;
    /// Fuel for a single [`super::LockingPlugin::renew`] call.
    pub const RENEW: u64 = 50_000_000;
    /// Fuel for a single [`super::LockingPlugin::release`] call.
    pub const RELEASE: u64 = 50_000_000;
    /// Fuel for [`super::LockingPlugin::cleanup`].
    pub const CLEANUP: u64 = 100_000_000;
}