Module loro_store

Server-side per-row LoroDoc cache with snapshot persistence.

For CRDT-backed entities (crdt: true in the manifest, the default), every row corresponds to one LoroDoc. This store owns those docs in memory, hydrates them on demand from a sidecar SQLite table, writes every commit through to that table, and projects the doc state into the JSON shape Pylon’s existing storage layer expects.

§Persistence shape

Single sidecar table:

CREATE TABLE _pylon_crdt_snapshots (
    entity     TEXT NOT NULL,
    row_id     TEXT NOT NULL,
    snapshot   BLOB NOT NULL,
    updated_at TEXT NOT NULL,
    PRIMARY KEY (entity, row_id)
);

Snapshots are full-state Loro snapshots (ExportMode::Snapshot). Loro applies internal compaction so the snapshot size stays bounded; we don’t track an op log separately.

§In-memory cache

Active rows live in a HashMap<(entity, row_id), Arc<Mutex<LoroDoc>>>. First access for a row hydrates the doc from the sidecar (or creates a fresh one). Subsequent accesses reuse the in-memory doc — required both for correctness (Loro’s CRDT identity is per-doc-instance) and perf (snapshot decode is ~100µs per row).
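The get-or-hydrate behavior described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: `Doc` is a stand-in for `LoroDoc` so the example compiles with only the standard library, and `DocCache`, `get_or_hydrate`, and the `load_snapshot` closure (standing in for the sidecar lookup) are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for `LoroDoc` (assumption: the real store holds Loro documents).
#[derive(Debug, Default)]
pub struct Doc {
    pub bytes: Vec<u8>,
}

/// Cache key: (entity, row_id).
type RowKey = (String, String);

/// Hypothetical cache wrapper around the map described in the text.
#[derive(Default)]
pub struct DocCache {
    docs: Mutex<HashMap<RowKey, Arc<Mutex<Doc>>>>,
}

impl DocCache {
    /// Return the in-memory doc for a row, hydrating it on first access.
    /// `load_snapshot` stands in for the sidecar read; `None` means the
    /// row has no stored snapshot yet, so a fresh doc is created.
    pub fn get_or_hydrate<F>(&self, entity: &str, row_id: &str, load_snapshot: F) -> Arc<Mutex<Doc>>
    where
        F: FnOnce() -> Option<Vec<u8>>,
    {
        let mut docs = self.docs.lock().unwrap();
        let key = (entity.to_string(), row_id.to_string());
        let slot = docs.entry(key).or_insert_with(|| {
            let doc = match load_snapshot() {
                Some(bytes) => Doc { bytes },
                None => Doc::default(),
            };
            Arc::new(Mutex::new(doc))
        });
        // Every caller gets a handle to the same doc instance.
        slot.clone()
    }
}
```

Because the loader only runs when the key is absent, a second access to the same row never touches the sidecar and always sees the same instance, which is the property the correctness argument above relies on.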

No eviction yet. Working sets up to ~100K active rows are fine on commodity hardware (~5-50 MB). For larger working sets a follow-up adds LRU eviction with snapshot reload on next access.

§Bandwidth: full snapshot per write (TODO)

Every CRDT-mode write triggers a binary WS broadcast carrying the row’s full current snapshot, not just the incremental update. Loro’s compaction bounds individual snapshots, but the per-write cost still scales with total state size, not write size.

Concrete numbers:

Workload                      Snapshot/row   Per-write fanout
Chat message                  ~200 B         tiny
Boring CRUD record            ~500 B         tiny
Whiteboard with 1k strokes    ~30 KB         uncomfortable
Document with 50K-char body   ~80 KB         bad

Multiply by connected_clients × writes_per_second to get total broadcast bandwidth. For chat-shaped workloads it’s free. For collab whiteboards / large documents it bites once you pass ~10 connected clients on a hot row.
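As a worked example of that multiplication, here is a small sketch. The function name is hypothetical, and the figure of 5 writes per second is an assumed illustration rather than a measured number; the snapshot sizes come from the table above, treating 1 KB as 1,000 bytes.

```rust
/// Total broadcast bandwidth in bytes per second for one hot row, when
/// every write fans the full snapshot out to every connected client.
pub fn broadcast_bytes_per_sec(
    snapshot_bytes: u64,
    connected_clients: u64,
    writes_per_second: u64,
) -> u64 {
    snapshot_bytes * connected_clients * writes_per_second
}

/// Illustrative inputs: 10 clients and an assumed 5 writes/s on one row.
pub fn example_workloads() -> [(&'static str, u64); 2] {
    [
        // ~200 B chat message: 200 * 10 * 5 = 10_000 B/s (10 KB/s)
        ("chat message", broadcast_bytes_per_sec(200, 10, 5)),
        // ~80 KB document: 80_000 * 10 * 5 = 4_000_000 B/s (4 MB/s)
        ("50K-char document", broadcast_bytes_per_sec(80_000, 10, 5)),
    ]
}
```

Under these assumptions the chat row costs about 10 KB/s of outbound traffic while the large document costs about 4 MB/s, a 400x gap for the same write rate.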

§Switching to incremental updates

Loro already supports export(ExportMode::updates(version_vector)), which returns only the ops a peer hasn’t seen, so the building block is there. What’s missing is the per-client tracking, in four pieces:

1. Subscribe protocol: clients tell the server “I want updates for rows X, Y, Z” instead of every CRDT write fanning out to every client. Pylon’s existing room layer is the natural transport once room semantics extend to per-row subscriptions.
2. Server-side state: (client_id, entity, row_id) → version_vector so the server knows what each client is missing. Bounded by the subscribe set; LRU-evicted with the doc cache.
3. Encoder swap: notify_crdt calls encode_update_since(vv) instead of encode_snapshot() and ships frame type 0x11 (CRDT_FRAME_UPDATE) instead of 0x10 (CRDT_FRAME_SNAPSHOT). Wire format already reserves both bytes.
4. New-subscriber bootstrap: first frame is still a snapshot (0x10), subsequent frames are deltas (0x11).
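The server-side state and the snapshot-then-delta rule could look roughly like the sketch below. None of this exists yet, so every name except the two frame constants is hypothetical: `VersionVector` is a plain map standing in for Loro's version vector, and `SubscriptionState`, `next_frame`, and `forget_client` are illustrative.

```rust
use std::collections::HashMap;

/// Frame type bytes named in the text above.
pub const CRDT_FRAME_SNAPSHOT: u8 = 0x10;
pub const CRDT_FRAME_UPDATE: u8 = 0x11;

/// Stand-in for a Loro version vector: peer id -> op counter (assumption).
pub type VersionVector = HashMap<u64, u32>;

/// Subscription key: (client_id, entity, row_id).
pub type SubKey = (u64, String, String);

/// Hypothetical per-client tracking: what each subscriber was last sent.
#[derive(Default)]
pub struct SubscriptionState {
    seen: HashMap<SubKey, VersionVector>,
}

impl SubscriptionState {
    /// Pick the frame type for the next send and record `current` as the
    /// version this client will have afterwards. Returns the version to
    /// encode from; `None` means "send a full snapshot".
    pub fn next_frame(&mut self, key: SubKey, current: &VersionVector) -> (u8, Option<VersionVector>) {
        match self.seen.insert(key, current.clone()) {
            // First frame for this subscription: bootstrap with a snapshot.
            None => (CRDT_FRAME_SNAPSHOT, None),
            // Known subscriber: send only ops since the previous version.
            Some(prev) => (CRDT_FRAME_UPDATE, Some(prev)),
        }
    }

    /// Garbage-collect all subscriptions for a disconnected client.
    pub fn forget_client(&mut self, client_id: u64) {
        self.seen.retain(|(id, _, _), _| *id != client_id);
    }
}
```

This sketch optimistically assumes every frame is delivered. A real implementation would also need the resync path mentioned in the effort estimate, so that a client that missed a frame can fall back to a fresh snapshot.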

Estimated effort: ~2 days for a working slice plus a week of production hardening (correct VV tracking under reconnects, garbage-collecting subscriptions on disconnect, handling missed frames via resync request).

Until then this implementation is fine for chat / boring CRUD / demo workloads. Don’t run a Figma clone on it.

Structs§

LoroStore: Server-side per-row LoroDoc cache + persistence layer.

Enums§

LoroStoreError

Constants§

CREATE_SIDECAR_SQL: SQL to create the snapshot sidecar. Idempotent. Called by the Runtime constructor for any database where CRDT mode could be in use (always, since crdt: true is the default).

Functions§

ensure_sidecar: Create the sidecar table. Safe to call repeatedly.