Module federation

Federation autonomy — wires the quorum primitives from replication into the HTTP write path (v0.7 track C, PR 2 of N).

Contract

When the ai-memory serve daemon is started with --quorum-writes W and --quorum-peers <url1,url2,…>, every successful HTTP write fans out a single-memory /api/v1/sync/push POST to each peer and counts each 2xx response as an ack. The write returns OK to the HTTP caller only once the local commit plus W - 1 peer acks have landed within the --quorum-timeout-ms deadline. If fewer acks arrive in time, the caller gets a 503 with body {"error":"quorum_not_met", "got":X, "needed":Y, "reason":…}.
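The ack arithmetic in this contract can be sketched in a few lines. This is a minimal sketch, assuming the local commit counts as one ack and that "got" and "needed" in the 503 body both include it. The names classify_write and QuorumNotMet are hypothetical (the module's real types are AckTracker, finalise_quorum and QuorumNotMetPayload), and the reason string is invented for illustration.

```rust
/// Hypothetical stand-in for the 503 payload; the real QuorumNotMetPayload
/// may use different fields or types.
#[derive(Debug, PartialEq)]
pub struct QuorumNotMet {
    pub got: usize,
    pub needed: usize,
    pub reason: String,
}

impl QuorumNotMet {
    /// Render the 503 body by hand (a real server would likely use serde).
    /// Assumes `reason` contains no characters that need JSON escaping.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"error\":\"quorum_not_met\",\"got\":{},\"needed\":{},\"reason\":\"{}\"}}",
            self.got, self.needed, self.reason
        )
    }
}

/// Hypothetical classifier: with `--quorum-writes w`, the local commit
/// (assumed to have succeeded) supplies one ack, so `w - 1` peer acks
/// are required before the deadline.
pub fn classify_write(peer_acks: usize, w: usize) -> Result<usize, QuorumNotMet> {
    let got = 1 + peer_acks; // local commit + peer acks
    if got >= w {
        Ok(got)
    } else {
        Err(QuorumNotMet {
            got,
            needed: w,
            reason: "deadline elapsed".to_string(), // illustrative only
        })
    }
}
```

With W = 3, two peer acks are enough, while a single node with no peer acks falls short of W = 2 and produces the 503 body.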

Scope of this module

  • FederationConfig: the serve-time config parsed from CLI flags.
  • broadcast_store_quorum: async HTTP fan-out that builds an AckTracker from replication::QuorumPolicy, spawns one task per peer, and waits until either the quorum is met or the deadline expires.
  • Mock-peer integration tests covering the happy path, a dropped-ack pattern, and a total outage.
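The fan-out-and-wait pattern behind broadcast_store_quorum can be sketched synchronously. This version plainly swaps the module's async tasks for std threads and an mpsc channel, models each peer as a closure that returns true for a 2xx response (standing in for the real POST to /api/v1/sync/push), and uses a hypothetical fan_out function. It is an illustration of the pattern, not the module's code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical fan-out: send one write to every peer, then wait until
/// `needed_peer_acks` acks arrive or `timeout` elapses. Returns the number
/// of peer acks counted before returning.
pub fn fan_out<F>(peers: Vec<F>, needed_peer_acks: usize, timeout: Duration) -> usize
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    for peer in peers {
        let tx = tx.clone();
        // One thread per peer. The JoinHandle is dropped, so a straggler
        // keeps running detached after we stop waiting for it.
        thread::spawn(move || {
            // The receiver may already be gone after the deadline; ignore that.
            let _ = tx.send(peer());
        });
    }
    // Drop our own sender so recv reports Disconnected once every peer answered.
    drop(tx);

    let deadline = Instant::now() + timeout;
    let mut acks = 0;
    while acks < needed_peer_acks {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(true) => acks += 1, // 2xx: count the ack
            Ok(false) => {}        // non-2xx: a miss, keep waiting for others
            Err(_) => break,       // deadline hit, or every peer has answered
        }
    }
    acks
}
```

The caller compares the returned count with W - 1. Threads still running at the deadline are left alone, which mirrors the "stragglers detached" behaviour described for the broadcast functions; their late sends fail silently because the receiver has been dropped.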

NOT in scope of this module

  • The real multi-process chaos harness lives under packaging/chaos/ as an operator-facing shell script. Campaign reports are produced by packaging/chaos/run-chaos.sh; see that script for how to measure the convergence bound committed to in ADR-0001.
  • MCP-over-stdio and CLI writes do NOT fan out to peers. The MCP server is single-tenant over stdio and the CLI is local; both rely on the sync-daemon for eventual propagation. Only the HTTP daemon is a federation node.

Structs

FederationConfig
Federation state configured at serve time. Parsed from --quorum-writes, --quorum-peers, and --quorum-timeout-ms.
PeerEndpoint
A single peer in the quorum mesh. The id is what we record in the ack tracker (typically the URL or the peer’s mTLS fingerprint).
QuorumNotMetPayload
Serialised 503 payload for failed quorum writes.
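Turning the three serve flags into a config might look roughly like the sketch below. It assumes a comma-separated peer list and uses each URL as its peer id (one of the two options the PeerEndpoint docs mention). The parse_config function and all field names are hypothetical, and so is the check that W - 1 must not exceed the peer count: that check is an inference from the contract, since a quorum larger than the mesh could never be met.

```rust
use std::time::Duration;

/// Hypothetical mirror of PeerEndpoint; here the id is simply the URL.
#[derive(Debug, PartialEq)]
pub struct PeerEndpoint {
    pub id: String,
    pub url: String,
}

/// Hypothetical mirror of FederationConfig.
#[derive(Debug)]
pub struct FederationConfig {
    pub write_quorum: usize,
    pub peers: Vec<PeerEndpoint>,
    pub timeout: Duration,
}

/// Build a config from raw flag values (--quorum-writes, --quorum-peers,
/// --quorum-timeout-ms). Empty entries in the peer list are skipped.
pub fn parse_config(
    quorum_writes: usize,
    quorum_peers: &str,
    timeout_ms: u64,
) -> Result<FederationConfig, String> {
    let peers: Vec<PeerEndpoint> = quorum_peers
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|url| PeerEndpoint { id: url.to_string(), url: url.to_string() })
        .collect();
    if quorum_writes == 0 {
        return Err("--quorum-writes must be at least 1".into());
    }
    // Assumed sanity check: W - 1 peer acks must be reachable at all.
    if quorum_writes - 1 > peers.len() {
        return Err(format!(
            "--quorum-writes {} needs at least {} peers, got {}",
            quorum_writes,
            quorum_writes - 1,
            peers.len()
        ));
    }
    Ok(FederationConfig {
        write_quorum: quorum_writes,
        peers,
        timeout: Duration::from_millis(timeout_ms),
    })
}
```

Rejecting an unreachable quorum at startup turns what would otherwise be a 503 on every write into a single configuration error.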

Functions

broadcast_archive_quorum
v0.6.2 (S29): fan out a just-archived memory id to every peer. Payload rides on sync_push via archives: [id], mirroring the shape used by broadcast_delete_quorum for deletions. On the receiving peer, sync_push calls db::archive_memory to move the row into archived_memories — unlike the delete path this is a soft removal (the row remains queryable via /api/v1/archive).
broadcast_consolidate_quorum
v0.6.2 (#326): fan out a consolidation in a single sync_push — the new consolidated memory + the source ids being deleted. Mirrors the local semantics of db::consolidate (insert new + delete sources) so peers end up in the same terminal state as the originator.
broadcast_delete_quorum
Fan out a tombstone for id to every configured peer via the extended sync_push body (deletions: [id]). Same quorum contract as broadcast_store_quorum: local delete is recorded immediately, peer acks counted against policy.write_quorum, deadline enforced, stragglers detached.
broadcast_link_quorum
v0.6.2 (#325): fan out a just-committed memory link to every peer. Payload rides on sync_push via links: [link]. Same quorum contract as broadcast_store_quorum.
broadcast_namespace_meta_clear_quorum
v0.6.2 (S35 follow-up): fan out a namespace-standard clear to peers via sync_push.namespace_meta_clears. PR #363 shipped set-side fanout via broadcast_namespace_meta_quorum but left the clear path local-only — alice clearing on node-1 didn’t propagate to bob on node-2, so the scenario-35 cross-peer clear assertion failed.
broadcast_namespace_meta_quorum
v0.6.2 (S35): fan out a namespace_meta row (the (namespace, standard_id, parent_namespace) tuple set by set_namespace_standard) to peers via sync_push.namespace_meta. Without this, peers see the standard memory (already fanned out via broadcast_store_quorum) but not the meta row tying it to a namespace + parent — so the parent-chain walk on the peer falls through to auto_detect_parent and can return a different ancestor than the originator.
broadcast_pending_decision_quorum
v0.6.2 (S34): fan out a pending-action decision (approve/reject) to peers via sync_push.pending_decisions. Without this, an approve on node-2 leaves the row in status='pending' on node-1 and the caller sees inconsistent governance state across the cluster. Peers apply via db::decide_pending_action which is a no-op on already-decided rows — replay-safe.
broadcast_pending_quorum
v0.6.2 (S34): fan out a just-created pending-action row to every peer via sync_push.pendings. Callers pass the fully-hydrated PendingAction read from their local pending_actions table so peers can upsert it with the same id / status / approvals tuple the originator has. Mirrors the quorum semantics of broadcast_store_quorum — local pending row is already persisted at call time; peer acks are counted against policy.write_quorum.
broadcast_restore_quorum
v0.6.2 (S29): fan out a just-restored memory id to every peer. Payload rides on sync_push via restores: [id], mirroring the shape used by broadcast_archive_quorum. On the receiving peer, sync_push moves the row from archived_memories back into memories via db::restore_archived. If the peer never saw the archive or the row isn’t in its archive table, the sync call no-ops (same missing-on-peer posture used for archives and deletions).
broadcast_store_quorum
Fan out a just-committed memory to every configured peer. Returns an AckTracker whose finalise() you then call against the deadline to get the quorum outcome.
bulk_catchup_push
v0.6.2 Patch 2 (S40): post-fanout catchup for bulk_create.
finalise_quorum
Classify an AckTracker into either a committed quorum (Ok(n)) or an error with a reason suitable for the 503 quorum_not_met payload. Consumes the tracker, so call it after the broadcast loop.
spawn_catchup_loop
v0.6.0.1 (#320) — post-partition catchup poller.
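Most of the broadcast_* functions above share one envelope: a sync_push body in which only the relevant field is populated. The sketch below models that envelope and a toy receiving peer under several assumptions: ids are plain strings, full rows ride in a hypothetical memories field, a consolidation sends its source ids through deletions, and fields are applied in a fixed order. SyncPushBody, PeerStore, archive_body and consolidate_body are hypothetical names. The sketch shows the missing-on-peer posture described for archives, restores and deletions: an id the peer does not hold is skipped, so replaying a body is harmless.

```rust
use std::collections::HashMap;

/// Hypothetical sync_push envelope; only a subset of the documented keys.
#[derive(Debug, Default)]
pub struct SyncPushBody {
    pub memories: Vec<(String, String)>, // (id, content): stand-in for full rows
    pub deletions: Vec<String>,
    pub archives: Vec<String>,
    pub restores: Vec<String>,
}

/// Like broadcast_archive_quorum: a body carrying `archives: [id]`.
pub fn archive_body(id: &str) -> SyncPushBody {
    SyncPushBody { archives: vec![id.to_string()], ..Default::default() }
}

/// Like broadcast_consolidate_quorum: new memory plus deleted sources in one push.
pub fn consolidate_body(new_id: &str, content: &str, sources: &[&str]) -> SyncPushBody {
    SyncPushBody {
        memories: vec![(new_id.to_string(), content.to_string())],
        deletions: sources.iter().map(|s| s.to_string()).collect(),
        ..Default::default()
    }
}

/// Toy peer: live rows and archived rows, keyed by id.
#[derive(Debug, Default)]
pub struct PeerStore {
    pub memories: HashMap<String, String>,
    pub archived: HashMap<String, String>,
}

impl PeerStore {
    /// Apply a push. Unknown ids are no-ops, so applying the same body twice
    /// leaves the store unchanged.
    pub fn apply(&mut self, body: &SyncPushBody) {
        for (id, content) in &body.memories {
            self.memories.insert(id.clone(), content.clone());
        }
        for id in &body.deletions {
            self.memories.remove(id); // hard removal
        }
        for id in &body.archives {
            // Soft removal: move the row into the archive if we have it.
            if let Some(row) = self.memories.remove(id) {
                self.archived.insert(id.clone(), row);
            }
        }
        for id in &body.restores {
            if let Some(row) = self.archived.remove(id) {
                self.memories.insert(id.clone(), row);
            }
        }
    }
}
```

Because every branch either upserts or skips, the peer ends in the same terminal state as the originator whether a body arrives once or is replayed.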