diaryx_core 0.11.0

Core library for Diaryx - a tool to manage markdown files with YAML frontmatter
---
title: CRDT Synchronization
description: Conflict-free replicated data types for real-time collaboration
part_of: "[README](/crates/diaryx_core/src/README.md)"
audience:
  - developers
attachments:
  - "[mod.rs](/crates/diaryx_core/src/crdt/mod.rs)"
  - "[body_doc.rs](/crates/diaryx_core/src/crdt/body_doc.rs)"
  - "[body_doc_manager.rs](/crates/diaryx_core/src/crdt/body_doc_manager.rs)"
  - "[history.rs](/crates/diaryx_core/src/crdt/history.rs)"
  - "[memory_storage.rs](/crates/diaryx_core/src/crdt/memory_storage.rs)"
  - "[sqlite_storage.rs](/crates/diaryx_core/src/crdt/sqlite_storage.rs)"
  - "[storage.rs](/crates/diaryx_core/src/crdt/storage.rs)"
  - "[sync.rs](/crates/diaryx_core/src/crdt/sync.rs)"
  - "[sync_client.rs](/crates/diaryx_core/src/crdt/sync_client.rs)"
  - "[sync_handler.rs](/crates/diaryx_core/src/crdt/sync_handler.rs)"
  - "[sync_manager.rs](/crates/diaryx_core/src/crdt/sync_manager.rs)"
  - "[tokio_transport.rs](/crates/diaryx_core/src/crdt/tokio_transport.rs)"
  - "[transport.rs](/crates/diaryx_core/src/crdt/transport.rs)"
  - "[types.rs](/crates/diaryx_core/src/crdt/types.rs)"
  - "[workspace_doc.rs](/crates/diaryx_core/src/crdt/workspace_doc.rs)"
exclude:
  - "*.lock"
---

# CRDT Synchronization

This module provides conflict-free replicated data types (CRDTs) for real-time
collaboration, built on [yrs](https://docs.rs/yrs) (the Rust port of Yjs).

## Feature Flags

This module requires the `crdt` feature:

```toml
[dependencies]
# Pick ONE of the lines below; TOML does not allow the same key twice.

# CRDT support only
diaryx_core = { version = "...", features = ["crdt"] }

# For SQLite-based persistent storage (native only)
diaryx_core = { version = "...", features = ["crdt", "crdt-sqlite"] }

# For native WebSocket sync client (CLI, Tauri)
diaryx_core = { version = "...", features = ["native-sync"] }
```

## Architecture

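The CRDT system is layered. The diagram shows the dependencies, from the sync
protocol at the top down to the storage backends; the numbered list after it
walks the modules from low-level to high-level: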

```text
                    +-----------------+
                    |  SyncProtocol   |  Y-sync for Hocuspocus server
                    +--------+--------+
                             |
          +------------------+------------------+
          |                                     |
+---------v----------+             +-----------v---------+
|   WorkspaceCrdt    |             |    BodyDocManager   |
| (file hierarchy)   |             | (document content)  |
+---------+----------+             +-----------+---------+
          |                                     |
          |              +-------------+        |
          +------------->| CrdtStorage |<-------+
                         +------+------+
                                |
               +----------------+----------------+
               |                                 |
      +--------v--------+              +---------v--------+
      |  MemoryStorage  |              |  SqliteStorage   |
      +-----------------+              +------------------+
```

1. **Types** (`types.rs`): Core data structures like `FileMetadata` and `BinaryRef`
2. **Storage** (`storage.rs`): `CrdtStorage` trait for persisting CRDT state
3. **WorkspaceCrdt** (`workspace_doc.rs`): Y.Doc for workspace file hierarchy
4. **BodyDoc** (`body_doc.rs`): Per-file Y.Doc for document content
5. **BodyDocManager** (`body_doc_manager.rs`): Manages multiple BodyDocs
6. **SyncProtocol** (`sync.rs`): Y-sync protocol for Hocuspocus server
7. **HistoryManager** (`history.rs`): Version history and time travel

## Frontmatter timestamps

When converting frontmatter to `FileMetadata`, the `updated` property is parsed
as either a numeric timestamp (milliseconds) or an RFC3339/ISO8601 string, and
mapped to `modified_at`. When writing frontmatter back to disk, `updated` is
emitted as an RFC3339 string for readability.
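
The dual parsing rule above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `UpdatedValue`, `modified_at_millis`, and the helper functions are hypothetical names, and the parser accepts only a common RFC3339 subset (`YYYY-MM-DDTHH:MM:SS`, optional fractional seconds, then `Z` or a `±HH:MM` offset) without checking the day against the month length. A real implementation would more likely rely on a date-time crate.

```rust
/// Hypothetical raw form of the `updated` frontmatter property.
#[derive(Debug, Clone, PartialEq)]
pub enum UpdatedValue {
    Millis(i64),
    Text(String),
}

/// Days since 1970-01-01 for a proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12; // March-based month index
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parse a run of ASCII digits; rejects signs and empty input.
fn digits(s: &str) -> Option<i64> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Parse a subset of RFC3339 into milliseconds since the Unix epoch.
fn parse_rfc3339_millis(s: &str) -> Option<i64> {
    // ASCII-only input makes the byte-offset slicing below safe.
    if !s.is_ascii() || s.len() < 20 {
        return None;
    }
    let b = s.as_bytes();
    if !(b[4] == b'-' && b[7] == b'-' && b[10] == b'T' && b[13] == b':' && b[16] == b':') {
        return None;
    }
    let (year, month, day) = (digits(&s[0..4])?, digits(&s[5..7])?, digits(&s[8..10])?);
    let (hour, min, sec) = (digits(&s[11..13])?, digits(&s[14..16])?, digits(&s[17..19])?);
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) || hour > 23 || min > 59 || sec > 60 {
        return None;
    }
    // Optional fractional seconds, kept at millisecond precision.
    let mut rest = &s[19..];
    let mut frac_ms = 0;
    if let Some(after_dot) = rest.strip_prefix('.') {
        let n = after_dot.bytes().take_while(|c| c.is_ascii_digit()).count();
        if n == 0 {
            return None;
        }
        let padded = format!("{:0<3}", &after_dot[..n]);
        frac_ms = digits(&padded[..3])?;
        rest = &after_dot[n..];
    }
    // Zone designator: `Z` or a numeric offset such as `+02:00`.
    let offset_secs = match rest {
        "Z" => 0,
        _ if rest.len() == 6 && rest.as_bytes()[3] == b':' => {
            let sign = match rest.as_bytes()[0] {
                b'+' => 1,
                b'-' => -1,
                _ => return None,
            };
            sign * (digits(&rest[1..3])? * 3600 + digits(&rest[4..6])? * 60)
        }
        _ => return None,
    };
    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3600 + min * 60 + sec - offset_secs;
    Some(secs * 1000 + frac_ms)
}

/// Map either representation to a `modified_at` value in milliseconds.
pub fn modified_at_millis(value: &UpdatedValue) -> Option<i64> {
    match value {
        UpdatedValue::Millis(ms) => Some(*ms),
        UpdatedValue::Text(s) => parse_rfc3339_millis(s.trim()),
    }
}
```

Under this sketch, a numeric `updated` passes through unchanged, while a string is normalised to UTC before conversion, so `12:30:00+02:00` and `10:30:00Z` yield the same `modified_at`.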

## WorkspaceCrdt

Manages the workspace file hierarchy as a CRDT. Files are keyed by stable
document IDs (UUIDs), making renames and moves trivial property updates.

### Doc-ID Based Architecture

```rust,ignore
use diaryx_core::crdt::{WorkspaceCrdt, MemoryStorage, FileMetadata};
use std::path::Path;
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
let workspace = WorkspaceCrdt::new(storage);

// Create a file with auto-generated UUID
let metadata = FileMetadata::with_filename(
    "my-note.md".to_string(),
    Some("My Note".to_string())
);
let doc_id = workspace.create_file(metadata).unwrap();

// Derive filesystem path from doc_id (walks parent chain)
let path = workspace.get_path(&doc_id); // Some("my-note.md")

// Find doc_id by path
let found_id = workspace.find_by_path(Path::new("my-note.md"));

// Renames and moves are trivial - doc_id is stable!
// (`parent_doc_id` is the doc_id of an existing parent entry.)
workspace.rename_file(&doc_id, "new-name.md").unwrap();
workspace.move_file(&doc_id, Some(&parent_doc_id)).unwrap();
```
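
To illustrate why stable IDs make renames and moves cheap, here is a small standalone model of path derivation by walking the parent chain. It assumes a simplified layout in which each entry's filename is one path segment and `parent` holds the parent's doc_id; `Tree` and `Node` are hypothetical names and do not mirror the actual `WorkspaceCrdt` internals.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for per-file metadata.
#[derive(Debug, Clone)]
pub struct Node {
    pub filename: String,
    pub parent: Option<String>, // doc_id of the parent entry, if any
}

/// Entries keyed by stable doc_id rather than by path.
#[derive(Default)]
pub struct Tree {
    nodes: HashMap<String, Node>,
}

impl Tree {
    pub fn insert(&mut self, doc_id: &str, filename: &str, parent: Option<&str>) {
        let node = Node {
            filename: filename.to_string(),
            parent: parent.map(str::to_string),
        };
        self.nodes.insert(doc_id.to_string(), node);
    }

    /// Walk the parent chain to build a path; `None` if a link is missing or cyclic.
    pub fn get_path(&self, doc_id: &str) -> Option<String> {
        let mut parts = Vec::new();
        let mut current = Some(doc_id);
        while let Some(id) = current {
            if parts.len() > self.nodes.len() {
                return None; // more hops than entries means a cycle
            }
            let node = self.nodes.get(id)?;
            parts.push(node.filename.as_str());
            current = node.parent.as_deref();
        }
        parts.reverse();
        Some(parts.join("/"))
    }

    /// A rename touches one property; the key (doc_id) never changes.
    pub fn rename(&mut self, doc_id: &str, new_name: &str) -> bool {
        match self.nodes.get_mut(doc_id) {
            Some(node) => {
                node.filename = new_name.to_string();
                true
            }
            None => false,
        }
    }

    /// A move likewise only rewrites the parent pointer.
    pub fn move_to(&mut self, doc_id: &str, new_parent: Option<&str>) -> bool {
        match self.nodes.get_mut(doc_id) {
            Some(node) => {
                node.parent = new_parent.map(str::to_string);
                true
            }
            None => false,
        }
    }
}
```

Because children refer to their parent by ID, renaming a parent in this model changes the derived path of every descendant without touching any of their entries.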

### Legacy Path-Based API

For backward compatibility, path-based operations are still supported:

```rust,ignore
workspace.set_file("notes/my-note.md", metadata);
let meta = workspace.get_file("notes/my-note.md");
workspace.remove_file("notes/my-note.md");
```

### Migration

Workspaces using the legacy path-based format can be migrated:

```rust,ignore
if workspace.needs_migration() {
    let count = workspace.migrate_to_doc_ids().unwrap();
    println!("Migrated {} files", count);
}
```

## BodyDoc

Manages individual document content with collaborative editing support:

```rust,ignore
use diaryx_core::crdt::{BodyDoc, MemoryStorage};
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
let doc = BodyDoc::new("notes/my-note.md", storage);

// Body content operations
doc.set_body("# Hello World\n\nThis is my note.");
let content = doc.get_body();

// Collaborative editing
doc.insert_at(0, "Prefix: ");
doc.delete_range(0, 8);

// Frontmatter operations
doc.set_frontmatter("title", "My Note");
let title = doc.get_frontmatter("title");
doc.remove_frontmatter("audience");
```

Body sync observer registration and per-update logs are emitted at trace level
to avoid log spam during large downloads. Enable trace logging only when
diagnosing body sync issues.

BodyDoc sync observers are registered once per document. Repeated calls to
`set_sync_callback` for the same doc are ignored to avoid duplicate observers
and unnecessary overhead during bulk downloads.
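
The register-once behaviour can be pictured as follows. This is a sketch under assumptions: `ObserverRegistry` and its method signatures are hypothetical, and the real `set_sync_callback` may be structured differently. It only demonstrates the idea that a second registration for the same document is a no-op.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Callback invoked with the raw bytes of an update.
pub type Callback = Box<dyn Fn(&[u8]) + Send>;

/// Hypothetical registry holding at most one observer per document.
#[derive(Default)]
pub struct ObserverRegistry {
    observers: HashMap<String, Callback>,
}

impl ObserverRegistry {
    /// Register a callback; returns `false` (and drops `cb`) if the
    /// document already has an observer.
    pub fn set_sync_callback(&mut self, doc_name: &str, cb: Callback) -> bool {
        match self.observers.entry(doc_name.to_string()) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(cb);
                true
            }
        }
    }

    /// Deliver an update to the document's observer, if one exists.
    pub fn notify(&self, doc_name: &str, update: &[u8]) {
        if let Some(cb) = self.observers.get(doc_name) {
            cb(update);
        }
    }
}
```

With this shape, a bulk download that touches the same document many times pays the registration cost once, and each update fires exactly one callback.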

## BodyDocManager

Manages multiple BodyDocs with lazy loading:

```rust,ignore
use diaryx_core::crdt::{BodyDocManager, MemoryStorage};
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
let manager = BodyDocManager::new(storage);

// Get or create a BodyDoc for a file
let doc = manager.get_or_create("notes/my-note.md");
doc.set_body("Content here");

// Check if a doc exists
if manager.has_doc("notes/my-note.md") {
    // ...
}

// Remove a doc from the manager
manager.remove_doc("notes/my-note.md");
```
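
Lazy loading of this kind is commonly built on a map that creates entries on first access. The sketch below shows that pattern with hypothetical `Doc` and `LazyDocs` types; it does not reproduce `BodyDocManager`, which also loads persisted state from storage.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a per-file document.
#[derive(Debug, Default)]
pub struct Doc {
    body: Mutex<String>,
}

impl Doc {
    pub fn set_body(&self, text: &str) {
        *self.body.lock().unwrap() = text.to_string();
    }

    pub fn get_body(&self) -> String {
        self.body.lock().unwrap().clone()
    }
}

/// Cache that creates a document only when it is first requested.
#[derive(Default)]
pub struct LazyDocs {
    docs: Mutex<HashMap<String, Arc<Doc>>>,
}

impl LazyDocs {
    /// Return the cached document, creating an empty one on first access.
    pub fn get_or_create(&self, name: &str) -> Arc<Doc> {
        let mut docs = self.docs.lock().unwrap();
        Arc::clone(
            docs.entry(name.to_string())
                .or_insert_with(|| Arc::new(Doc::default())),
        )
    }

    pub fn has_doc(&self, name: &str) -> bool {
        self.docs.lock().unwrap().contains_key(name)
    }

    /// Drop the cached document; handles already given out stay valid.
    pub fn remove_doc(&self, name: &str) -> bool {
        self.docs.lock().unwrap().remove(name).is_some()
    }
}
```

Handing out `Arc` handles means every caller that asks for the same name sees the same document, and nothing is allocated for files that are never opened.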

## Sync Protocol

The sync module implements the Y-sync protocol for real-time collaboration with
Hocuspocus or other Y.js-compatible servers:

```rust,ignore
use diaryx_core::crdt::{WorkspaceCrdt, MemoryStorage};
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
let workspace = WorkspaceCrdt::new("workspace", storage);

// Get sync state for initial handshake
let state_vector = workspace.get_sync_state();

// Apply remote update from server
let remote_update: Vec<u8> = /* from WebSocket */;
workspace.apply_update(&remote_update);

// Encode state for sending to server
let full_state = workspace.encode_state();

// Encode incremental update since a state vector
let diff = workspace.encode_state_as_update(&remote_state_vector);
```
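
The handshake above rests on one idea: each side summarises what it has seen as a state vector, and the peer replies with only the missing operations. The toy model below illustrates that exchange. It is not the yrs wire format or API; `Replica`, `Op`, and `StateVector` are hypothetical, and payloads are plain strings instead of binary CRDT updates.

```rust
use std::collections::BTreeMap;

/// Client id -> next clock value expected from that client.
pub type StateVector = BTreeMap<u64, u64>;

#[derive(Debug, Clone, PartialEq)]
pub struct Op {
    pub client: u64,
    pub clock: u64,
    pub payload: String,
}

/// Toy replica that stores every operation it has seen.
#[derive(Default)]
pub struct Replica {
    ops: Vec<Op>,
}

impl Replica {
    /// Record a local operation with the next clock for `client`.
    pub fn local_op(&mut self, client: u64, payload: &str) {
        let clock = self.state_vector().get(&client).copied().unwrap_or(0);
        self.ops.push(Op { client, clock, payload: payload.to_string() });
    }

    /// Summarise the known operations as a state vector.
    pub fn state_vector(&self) -> StateVector {
        let mut sv = StateVector::new();
        for op in &self.ops {
            let next = sv.entry(op.client).or_insert(0);
            *next = (*next).max(op.clock + 1);
        }
        sv
    }

    /// Operations the remote side has not seen, judged by its state vector.
    pub fn diff_since(&self, remote: &StateVector) -> Vec<Op> {
        self.ops
            .iter()
            .filter(|op| op.clock >= remote.get(&op.client).copied().unwrap_or(0))
            .cloned()
            .collect()
    }

    /// Apply remote operations, skipping any that are already known.
    /// Out-of-order operations are dropped here; a real implementation
    /// would buffer them until the gap is filled.
    pub fn apply(&mut self, ops: Vec<Op>) {
        let mut sv = self.state_vector();
        for op in ops {
            let next = sv.entry(op.client).or_insert(0);
            if op.clock == *next {
                *next += 1;
                self.ops.push(op);
            }
        }
    }
}
```

Applying the same diff twice leaves the replica unchanged, which is the property that lets a client safely resend updates after a reconnect.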

## Version History

All local changes are automatically recorded, enabling version history and
time travel:

```rust,ignore
use diaryx_core::crdt::{WorkspaceCrdt, MemoryStorage, HistoryEntry};
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
let workspace = WorkspaceCrdt::new("workspace", storage.clone());

// Make some changes
workspace.set_file("file1.md", metadata1);
workspace.set_file("file2.md", metadata2);

// Get version history
let history: Vec<HistoryEntry> = storage.get_all_updates("workspace").unwrap();
for entry in &history {
    println!("Version {} at {:?}: {} bytes",
             entry.version, entry.timestamp, entry.update.len());
}

// Time travel to a specific version
workspace.restore_to_version(1);
```
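
One way to think about time travel is as a replay of the recorded update log up to a chosen version. The sketch below models that with a plain key-value state; `History`, `Entry`, and `Change` are hypothetical, and how `restore_to_version` works internally is not specified here.

```rust
use std::collections::BTreeMap;

/// A recorded change to the (simplified) workspace state.
#[derive(Debug, Clone)]
pub enum Change {
    Set(String, String),
    Remove(String),
}

#[derive(Debug, Clone)]
pub struct Entry {
    pub version: u64,
    pub change: Change,
}

/// Append-only log; versions start at 1 and increase by one per change.
#[derive(Default)]
pub struct History {
    entries: Vec<Entry>,
}

impl History {
    /// Append a change and return the version it was assigned.
    pub fn record(&mut self, change: Change) -> u64 {
        let version = self.entries.len() as u64 + 1;
        self.entries.push(Entry { version, change });
        version
    }

    /// Rebuild the state as of `version` by replaying the log from the start.
    pub fn state_at(&self, version: u64) -> BTreeMap<String, String> {
        let mut state = BTreeMap::new();
        for entry in self.entries.iter().take_while(|e| e.version <= version) {
            match &entry.change {
                Change::Set(key, value) => {
                    state.insert(key.clone(), value.clone());
                }
                Change::Remove(key) => {
                    state.remove(key);
                }
            }
        }
        state
    }
}
```

Replay never mutates the log, so in this model restoring an old version and then returning to the latest one loses nothing.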

## Storage Backends

### MemoryStorage

In-memory storage for WASM/web and testing:

```rust,ignore
use diaryx_core::crdt::MemoryStorage;
use std::sync::Arc;

let storage = Arc::new(MemoryStorage::new());
```

### SqliteStorage

Persistent storage using SQLite (requires `crdt-sqlite` feature, native only):

```rust,ignore
use diaryx_core::crdt::SqliteStorage;
use std::sync::Arc;

let storage = Arc::new(SqliteStorage::open("crdt.db").unwrap());
```

## Integration with Command API

CRDT operations are available through the unified command API for WASM/Tauri:

```rust,ignore
use diaryx_core::{Diaryx, Command, CommandResult};

let diaryx = Diaryx::with_crdt(fs, crdt_storage);

// Execute CRDT commands
let result = diaryx.execute(Command::GetSyncState {
    doc_type: "workspace".to_string(),
    doc_name: None,
});

let result = diaryx.execute(Command::SetFileMetadata {
    path: "notes/my-note.md".to_string(),
    metadata: file_metadata,
});

let result = diaryx.execute(Command::GetHistory {
    doc_type: "workspace".to_string(),
    doc_name: None,
});
```

## Unified Sync Client

The `SyncClient` provides a unified interface for WebSocket-based real-time sync
across all platforms. It uses a `SyncTransport` trait for platform abstraction:

```text
┌────────────────────┐    ┌────────────────────┐
│ TokioTransport     │    │ CallbackTransport  │
│ (tokio-tungstenite)│    │ (JS WebSocket)     │
│ #[cfg(native)]     │    │ #[cfg(wasm32)]     │
└─────────┬──────────┘    └─────────┬──────────┘
          │                         │
          └────────────┬────────────┘
                       ▼
          ┌──────────────────────┐
          │   SyncClient<T>      │
          │   - Reconnection     │
          │   - Dual connections │
          │   - Message routing  │
          └───────────┬──────────┘
                      ▼
          ┌──────────────────────┐
          │   RustSyncManager    │
          └──────────────────────┘
```
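
The layering in the diagram can be sketched as a client that is generic over a transport trait and owns two connections. The code below is a synchronous, standard-library-only illustration: the trait here is a hypothetical `Transport`, not the crate's `SyncTransport` (whose real signature is likely async), and `MockTransport`, `Channel`, and `Client` are likewise invented for the example.

```rust
use std::collections::VecDeque;

/// Hypothetical, synchronous stand-in for a transport abstraction.
pub trait Transport {
    fn send(&mut self, msg: Vec<u8>) -> Result<(), String>;
    fn poll(&mut self) -> Option<Vec<u8>>;
}

/// Test transport that records sent messages and replays injected ones.
#[derive(Default)]
pub struct MockTransport {
    pub sent: Vec<Vec<u8>>,
    pub inbox: VecDeque<Vec<u8>>,
}

impl Transport for MockTransport {
    fn send(&mut self, msg: Vec<u8>) -> Result<(), String> {
        self.sent.push(msg);
        Ok(())
    }

    fn poll(&mut self) -> Option<Vec<u8>> {
        self.inbox.pop_front()
    }
}

/// Which of the two connections a message belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Metadata,
    Body,
}

/// Client with one connection for metadata and one for body content.
pub struct Client<T: Transport> {
    pub meta: T,
    pub body: T,
}

impl<T: Transport> Client<T> {
    pub fn new(meta: T, body: T) -> Self {
        Self { meta, body }
    }

    /// Route an outgoing message to the matching connection.
    pub fn send(&mut self, channel: Channel, msg: Vec<u8>) -> Result<(), String> {
        match channel {
            Channel::Metadata => self.meta.send(msg),
            Channel::Body => self.body.send(msg),
        }
    }

    /// Drain both connections, tagging each message with its channel.
    pub fn drain(&mut self) -> Vec<(Channel, Vec<u8>)> {
        let mut out = Vec::new();
        while let Some(msg) = self.meta.poll() {
            out.push((Channel::Metadata, msg));
        }
        while let Some(msg) = self.body.poll() {
            out.push((Channel::Body, msg));
        }
        out
    }
}
```

Keeping the client generic means the routing logic is written once, while each platform supplies only the code that actually moves bytes.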

The `SyncManager` filters metadata-echo updates before returning
`changed_files`, so `FilesChanged` events are suppressed for no-op metadata
syncs that would otherwise trigger unnecessary UI refreshes.
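
A filter of this kind can be as simple as comparing incoming metadata with what is already stored. The following sketch assumes a hypothetical `Meta` struct and a free function `changed_files`; it illustrates the echo check only and does not reflect the manager's actual data model.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced file metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Meta {
    pub title: Option<String>,
    pub modified_at: i64,
}

/// Apply incoming metadata and return only the paths that really changed.
/// An update identical to the stored value is treated as an echo and skipped.
pub fn changed_files(
    current: &mut HashMap<String, Meta>,
    incoming: Vec<(String, Meta)>,
) -> Vec<String> {
    let mut changed = Vec::new();
    for (path, meta) in incoming {
        if current.get(&path) == Some(&meta) {
            continue; // echo of our own state: nothing to report
        }
        current.insert(path.clone(), meta);
        changed.push(path);
    }
    changed
}
```

If the returned list is empty, the caller in this model can skip emitting a change event altogether.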

### Native (CLI/Tauri)

Use `TokioTransport` for native WebSocket connections (requires `native-sync` feature):

```rust,ignore
use diaryx_core::crdt::{SyncClient, SyncClientConfig, TokioTransport};
use std::path::PathBuf;

let transport_meta = TokioTransport::new();
let transport_body = TokioTransport::new();

let config = SyncClientConfig::new(
    "wss://sync.example.com/sync".to_string(),
    "workspace-id".to_string(),
    PathBuf::from("/path/to/workspace"),
).with_auth("token".to_string());

let client = SyncClient::new(config, transport_meta, transport_body, sync_manager);
client.start().await?;
```

### WASM (Web)

For WASM, use `CallbackTransport`, which routes messages through JavaScript:

```rust,ignore
// In diaryx_wasm
use CallbackTransport;

let transport = CallbackTransport::new();
// Messages are injected via inject_sync_message() from JS
// Outgoing messages are polled via poll_outgoing_messages()
```

## Relationship to Cloud Sync

The CRDT module handles **real-time collaboration** (character-by-character edits),
while the [`sync`](../sync/README.md) module handles **file-level cloud sync**
(S3, Google Drive). They work together:

- CRDT tracks fine-grained changes within documents
- Cloud sync uploads/downloads whole files to/from storage providers
- Both use the same `WorkspaceCrdt` metadata for consistency