Federation autonomy — wires the quorum primitives from replication
into the HTTP write path (v0.7 track C, PR 2 of N).
§Contract
When the ai-memory serve daemon is started with --quorum-writes N
and --quorum-peers <url1,url2,…>, every successful HTTP write
fans out a 1-memory /api/v1/sync/push POST to each peer and counts
2xx responses as acks. The write returns OK to the HTTP caller only
once the local commit plus W - 1 peer acks land within the
--quorum-timeout-ms deadline, where W is the --quorum-writes value.
Fewer acks → 503 with body
{"error":"quorum_not_met", "got":X, "needed":Y, "reason":…}.
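The ack arithmetic in this contract can be sketched in a few lines. This is only an illustration, not the module's code: `WriteOutcome`, `classify`, and `not_met_body` are hypothetical names, the sketch assumes that `got` counts the local commit as one ack, and the `reason` string is passed in by the caller because the contract above does not list its possible values.

```rust
/// Hypothetical outcome type for illustration; not the crate's real API.
#[derive(Debug, PartialEq)]
enum WriteOutcome {
    /// Quorum met; carries the total ack count (local commit included).
    Committed(usize),
    /// Quorum missed; the fields mirror the keys of the 503 body.
    NotMet { got: usize, needed: usize, reason: String },
}

/// Classify a finished fan-out.
/// Assumption: the local commit counts as one ack, so a write needs
/// `write_quorum - 1` peer acks on top of it.
fn classify(write_quorum: usize, peer_acks: usize, reason_if_short: &str) -> WriteOutcome {
    let got = 1 + peer_acks;
    if got >= write_quorum {
        WriteOutcome::Committed(got)
    } else {
        WriteOutcome::NotMet {
            got,
            needed: write_quorum,
            reason: reason_if_short.to_string(),
        }
    }
}

/// Render the 503 body in the shape quoted in the contract.
/// `reason` is inserted verbatim, so it must not contain quotes or backslashes.
fn not_met_body(got: usize, needed: usize, reason: &str) -> String {
    format!(
        r#"{{"error":"quorum_not_met","got":{got},"needed":{needed},"reason":"{reason}"}}"#
    )
}
```

With a write quorum of 2, one peer ack is enough under this assumption; with a write quorum of 3 and one peer ack, the caller would see `got` = 2 and `needed` = 3.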
§Scope of this module
- FederationConfig: the serve-time config parsed from CLI flags.
- broadcast_store_quorum: async HTTP fan-out that builds an AckTracker from replication::QuorumPolicy, spawns one task per peer, and waits on either quorum-met or deadline.
- Mock-peer integration tests covering the happy path, a dropped ack pattern, and a total outage.
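The "one task per peer, wait on quorum or deadline" pattern can be sketched with the standard library alone. The real function is async and talks HTTP; this stand-in uses threads and a channel, and `fan_out` and its `send` callback (which plays the role of the POST and returns true for a 2xx response) are hypothetical names chosen for the example.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Fan out to every peer on its own thread and count acks until either
/// `needed_peer_acks` have arrived or `deadline` has elapsed.
/// Returns the number of peer acks seen by then.
fn fan_out<F>(peers: Vec<String>, needed_peer_acks: usize, deadline: Duration, send: F) -> usize
where
    F: Fn(&str) -> bool + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel::<bool>();
    for peer in peers {
        let tx = tx.clone();
        let send = send.clone();
        // Detached worker: a straggler keeps running after we return,
        // and its late result is simply dropped.
        thread::spawn(move || {
            let _ = tx.send(send(&peer));
        });
    }
    drop(tx); // the receiver disconnects once every worker has reported

    let start = Instant::now();
    let mut acks = 0;
    while acks < needed_peer_acks {
        let remaining = match deadline.checked_sub(start.elapsed()) {
            Some(d) => d,
            None => break, // deadline already passed
        };
        match rx.recv_timeout(remaining) {
            Ok(true) => acks += 1, // 2xx: counts as an ack
            Ok(false) => {}        // non-2xx: not an ack
            Err(_) => break,       // timed out, or all peers have reported
        }
    }
    acks
}
```

Returning as soon as the needed count is reached, rather than joining every thread, is what keeps one slow peer from holding up the caller.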
§NOT in scope of this module
- The real multi-process chaos harness lives under packaging/chaos/ as an operator-facing shell script. A campaign report is produced by packaging/chaos/run-chaos.sh; see that file for how to measure the convergence bound committed to in ADR-0001.
- MCP-over-stdio and CLI writes do NOT fan out to peers. The MCP server is a single-tenant stdio client and the CLI is local; both rely on the sync-daemon for eventual propagation. Only the HTTP daemon is a federation node.
Structs§
- FederationConfig - Configured-at-serve federation state. Parsed from --quorum-writes + --quorum-peers + --quorum-timeout-ms.
- PeerEndpoint - A single peer in the quorum mesh. The id is what we record in the ack tracker (typically the URL or the peer’s mTLS fingerprint).
- QuorumNotMetPayload - Serialised 503 payload for failed quorum writes.
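How the three flags could map onto such a config is sketched below. This is not the crate's definition: the `*Sketch` types, their field names, and `parse_federation` are hypothetical, the peer id is simply set to the URL, and the validation rule (the local node plus its peers must be able to satisfy the quorum) is an assumption.

```rust
use std::time::Duration;

/// Illustrative stand-in for a peer; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PeerEndpointSketch {
    id: String,
    url: String,
}

/// Illustrative stand-in for the serve-time federation config.
#[derive(Debug, Clone, PartialEq)]
struct FederationConfigSketch {
    write_quorum: usize,
    peers: Vec<PeerEndpointSketch>,
    timeout: Duration,
}

/// Build a config from the raw flag values.
fn parse_federation(
    quorum_writes: usize,
    quorum_peers: &str,
    quorum_timeout_ms: u64,
) -> Result<FederationConfigSketch, String> {
    // Split the comma-separated list, ignoring blanks and stray spaces.
    let peers: Vec<PeerEndpointSketch> = quorum_peers
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|url| PeerEndpointSketch { id: url.to_string(), url: url.to_string() })
        .collect();

    if quorum_writes == 0 {
        return Err("write quorum must be at least 1".to_string());
    }
    // Assumed rule: the local commit plus every peer ack must be able to reach W.
    if quorum_writes > peers.len() + 1 {
        return Err(format!(
            "write quorum {} exceeds local node plus {} peers",
            quorum_writes,
            peers.len()
        ));
    }
    Ok(FederationConfigSketch {
        write_quorum: quorum_writes,
        peers,
        timeout: Duration::from_millis(quorum_timeout_ms),
    })
}
```

Rejecting an unsatisfiable quorum at startup, instead of on the first write, would surface a misconfiguration before any caller sees a 503.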
Functions§
- broadcast_archive_quorum - v0.6.2 (S29): fan out a just-archived memory id to every peer. Payload rides on sync_push via archives: [id], mirroring the shape used by broadcast_delete_quorum for deletions. On the receiving peer, sync_push calls db::archive_memory to move the row into archived_memories; unlike the delete path this is a soft removal (the row remains queryable via /api/v1/archive).
- broadcast_consolidate_quorum - v0.6.2 (#326): fan out a consolidation in a single sync_push: the new consolidated memory + the source ids being deleted. Mirrors the local semantics of db::consolidate (insert new + delete sources) so peers end up in the same terminal state as the originator.
- broadcast_delete_quorum - Fan out a tombstone for id to every configured peer via the extended sync_push body (deletions: [id]). Same quorum contract as broadcast_store_quorum: local delete is recorded immediately, peer acks counted against policy.write_quorum, deadline enforced, stragglers detached.
- broadcast_link_quorum - v0.6.2 (#325): fan out a just-committed memory link to every peer. Payload rides on sync_push via links: [link]. Same quorum contract as broadcast_store_quorum.
- broadcast_namespace_meta_clear_quorum - v0.6.2 (S35 follow-up): fan out a namespace-standard clear to peers via sync_push.namespace_meta_clears. PR #363 shipped set-side fanout via broadcast_namespace_meta_quorum but left the clear path local-only: alice clearing on node-1 didn’t propagate to bob on node-2, so the scenario-35 cross-peer clear assertion failed.
- broadcast_namespace_meta_quorum - v0.6.2 (S35): fan out a namespace_meta row (the (namespace, standard_id, parent_namespace) tuple set by set_namespace_standard) to peers via sync_push.namespace_meta. Without this, peers see the standard memory (already fanned out via broadcast_store_quorum) but not the meta row tying it to a namespace + parent, so the parent-chain walk on the peer falls through to auto_detect_parent and can return a different ancestor than the originator.
- broadcast_pending_decision_quorum - v0.6.2 (S34): fan out a pending-action decision (approve/reject) to peers via sync_push.pending_decisions. Without this, an approve on node-2 leaves the row in status='pending' on node-1 and the caller sees inconsistent governance state across the cluster. Peers apply via db::decide_pending_action, which is a no-op on already-decided rows and therefore replay-safe.
- broadcast_pending_quorum - v0.6.2 (S34): fan out a just-created pending-action row to every peer via sync_push.pendings. Callers pass the fully-hydrated PendingAction read from their local pending_actions table so peers can upsert it with the same id / status / approvals tuple the originator has. Mirrors the quorum semantics of broadcast_store_quorum: the local pending row is already persisted at call time; peer acks are counted against policy.write_quorum.
- broadcast_restore_quorum - v0.6.2 (S29): fan out a just-restored memory id to every peer. Payload rides on sync_push via restores: [id], mirroring the shape used by broadcast_archive_quorum. On the receiving peer, sync_push moves the row from archived_memories back into memories via db::restore_archived. If the peer never saw the archive or the row isn’t in its archive table, the sync call no-ops (same missing-on-peer posture used for archives and deletions).
- broadcast_store_quorum - Fan out a just-committed memory to every configured peer. Returns an AckTracker whose finalise() you then call against the deadline to get the quorum outcome.
- bulk_catchup_push - v0.6.2 Patch 2 (S40): post-fanout catchup for bulk_create.
- finalise_quorum - Classify an AckTracker into either a committed quorum (Ok(n)) or an error with a reason suitable for the /503 quorum_not_met payload. Consumes the tracker; call after the broadcast loop.
- spawn_catchup_loop - v0.6.0.1 (#320): post-partition catchup poller.
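The replay-safety noted for broadcast_pending_decision_quorum comes down to an idempotent apply on the receiving peer. A minimal in-memory sketch of that property follows; `Status` and `apply_decision` are hypothetical stand-ins rather than the real db::decide_pending_action, and treating a missing row as a no-op is an assumption borrowed from the posture described for archives, restores, and deletions.

```rust
use std::collections::HashMap;

/// Illustrative row status; the real schema may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Pending,
    Approved,
    Rejected,
}

/// Apply a decision received from a peer.
/// Returns true only when the row actually changed.
fn apply_decision(rows: &mut HashMap<String, Status>, id: &str, approve: bool) -> bool {
    match rows.get_mut(id) {
        Some(status) if *status == Status::Pending => {
            *status = if approve { Status::Approved } else { Status::Rejected };
            true
        }
        // Already decided, or the row never reached this peer: do nothing.
        _ => false,
    }
}
```

Because a second delivery of the same decision changes nothing, a sender can retry a fan-out after a timeout without risking a flipped or double-applied decision.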