silk-graph 0.2.0

Merkle-CRDT graph engine for distributed, conflict-free knowledge graphs
# Why Silk Exists

## The Problem

Knowledge graphs today force a choice:

**Graph libraries** (NetworkX, igraph) give you algorithms and analysis — but no sync, no schema, no persistence. They're tools for working with graph data that's already in memory, not for distributing or enforcing it.

**Centralized graph databases** (Neo4j, TerminusDB, Amazon Neptune) give you schema, queries, and version history — but require a server. Offline operation is not a concept. Sync requires a coordinator (git-style push/pull for TerminusDB, enterprise replication for Neo4j). Conflicts must be resolved manually.

**Distributed CRDTs** (Automerge, Yjs/pycrdt, Loro) give you offline-first, conflict-free replication — but they are document-oriented. No schema enforcement at write time. No graph traversal primitives. You bolt on your own validation, your own BFS, your own impact analysis. The CRDT handles merge; everything else is your responsibility.

**No tool combines all six:**
1. Distributed (no server required)
2. Schema-enforced (validated at write time)
3. Graph-native (traversal, algorithms, pattern matching built in)
4. Conflict-free (mathematical convergence guarantee)
5. Provenance (Ed25519 signatures — every entry is cryptographically signed)
6. Time-travel (query the graph at any historical time)

Silk is that tool.

## What Silk Does Differently

### Schema-enforced

Silk validates every write against an ontology defined at store creation. Node types, edge types, source/target constraints, required properties, property types — all checked before the entry hits the DAG. Invalid writes are rejected, not silently stored. The ontology defines the minimum (required properties, type constraints); unknown properties are accepted (open-world, D-026). Your schema evolves without migrations. Plus, the ontology can grow at runtime — `extend_ontology()` (R-03) adds new types, properties, and subtypes without store recreation.
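
This write-time check can be illustrated with a pure-Python sketch. The ontology shape and function names below are illustrative, not Silk's actual API; they show only the validation logic the paragraph describes: required properties and type constraints are enforced, unknown properties pass through.

```python
# Illustrative sketch of write-time ontology validation (NOT Silk's real
# internals). Required properties and their types are the minimum;
# anything beyond them is accepted (open-world).

ONTOLOGY = {
    "node_types": {
        "Server": {"required": {"hostname": str}},
    },
}

def validate_node(node_type, properties):
    spec = ONTOLOGY["node_types"].get(node_type)
    if spec is None:
        raise ValueError(f"unknown node type: {node_type}")
    for name, expected in spec["required"].items():
        if name not in properties:
            raise ValueError(f"missing required property: {name}")
        if not isinstance(properties[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    # Open-world: properties outside the required set pass through untouched.

validate_node("Server", {"hostname": "web-1", "rack": "A3"})  # accepted
try:
    validate_node("Server", {"rack": "A3"})  # rejected before hitting the DAG
except ValueError as err:
    print(err)
```

The key property is that rejection happens before the entry is created, so an invalid write never enters the oplog and never needs to be synced or undone.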

### Conflict-free

Silk uses a Merkle-CRDT: every mutation is a content-addressed entry in a DAG, with hybrid logical clocks (R-01) for real-time causal ordering. Concurrent writes to different properties on the same node both survive — per-property last-writer-wins. Concurrent add + remove → add wins. Two stores that exchange sync messages in both directions are mathematically guaranteed to converge to the same graph state. No coordinator, no consensus protocol, no leader election.
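
The per-property rule can be sketched in a few lines of pure Python. The clock tuple ordering used here (physical ms, counter, writer id) is an assumed tie-break for illustration, not necessarily Silk's exact rule:

```python
# Per-property last-writer-wins, sketched in pure Python. Each property
# value carries a hybrid-logical-clock tuple; the highest clock wins,
# property by property. Tuple layout is an assumption for this sketch.

def merge_properties(local, remote):
    """Merge two {prop: (clock, value)} maps; highest clock wins per property."""
    merged = dict(local)
    for prop, (clock, value) in remote.items():
        if prop not in merged or clock > merged[prop][0]:
            merged[prop] = (clock, value)
    return merged

# Store A updates `status`; store B later updates `status` AND adds `location`.
a = {"status": ((100, 0, "A"), "active")}
b = {"status": ((101, 0, "B"), "degraded"),
     "location": ((101, 1, "B"), "eu-west")}

merged = merge_properties(a, b)
assert merged["status"][1] == "degraded"   # LWW winner on the conflict
assert merged["location"][1] == "eu-west"  # non-conflicting write survives
```

Because the winner is picked per property by a total order on clocks, the merge is commutative: both peers compute the same result regardless of which direction the sync ran first.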

### Offline-first

Every Silk instance is a self-contained graph database. Reads and writes work with zero network connectivity. When connectivity returns, a single sync round-trip (offer → payload → merge) brings both sides to convergence. There is no "primary" — any instance can sync with any other. The sync protocol uses Bloom filters to minimize data transfer, sending only entries the peer lacks.
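
The offer → payload step can be mimicked with a toy Bloom filter. Filter size and hash count here are illustrative, and a real protocol must also account for Bloom false positives (e.g., with a follow-up round); this sketch only shows why the mechanism keeps the payload small:

```python
import hashlib

# Sketch of the offer -> payload step: one peer sends a Bloom filter of the
# entry hashes it already has; the other replies only with entries the
# filter does not contain. NOT Silk's wire format.

class Bloom:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.blake2b((item + str(i)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Peer A advertises what it knows; peer B answers with the difference.
a_entries = {f"hash{i}" for i in range(100)}
b_entries = a_entries | {"hash100", "hash101"}  # B holds two extra entries

offer = Bloom()
for h in a_entries:
    offer.add(h)

payload = [h for h in b_entries if h not in offer]  # only what A lacks
```

The filter costs a fixed few hundred bytes regardless of store size, so the transfer is dominated by the entries the peer is actually missing, not by the history both sides already share.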

### Graph-native

BFS, shortest path, impact analysis, subgraph extraction, pattern matching, topological sort, cycle detection — built into the engine, operating on the materialized graph. Not a layer on top. Not a query language that compiles to table scans. Graph structure is a first-class citizen of the storage and query model.
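
The shape of these traversals is ordinary graph code over the materialized adjacency structure. A pure-Python illustration (not Silk's implementation) of BFS-as-impact-analysis:

```python
from collections import deque

# BFS over an adjacency map -- the kind of traversal the engine runs
# directly on the materialized graph. Pure-Python illustration only.

def bfs(adj, start):
    """Return nodes reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

# Impact analysis is reachability over dependency edges:
deps = {"db": ["api"], "api": ["web", "worker"], "web": [], "worker": []}
print(bfs(deps, "db"))  # everything affected if "db" changes
```

Running this inside the engine, against in-memory indexes, is what "not a layer on top" means: there is no serialization boundary or query-plan translation between the algorithm and the graph.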

### Provenance

Ed25519 signatures on every entry. You can verify who created each piece of data. Trust registries control which peers are accepted. Strict mode rejects unsigned entries on merge. No external PKI required — keys are generated locally and exchanged out of band (same trust model as the ontology itself).

### Time-travel

Query the graph at any point in the past with `store.as_of(physical_ms)`. Every entry carries a hybrid logical clock (R-01) — real wall-clock time, not just a counter. "What did we know yesterday at 3pm?" is a meaningful question with a precise answer. The entire oplog is the audit trail; `as_of()` materializes any prefix of it. No other CRDT-based graph engine offers this.
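
Conceptually, `as_of()` is a bounded replay of the oplog. A minimal sketch (the entry shape and field names here are assumptions for illustration):

```python
# Time-travel by oplog replay (illustrative, NOT Silk's internals):
# materialize only entries whose HLC physical timestamp is <= the cutoff.

def materialize_as_of(oplog, physical_ms):
    nodes = {}
    for entry in sorted(oplog, key=lambda e: e["clock"]):
        if entry["clock"][0] > physical_ms:
            break  # everything after the cutoff is ignored
        if entry["op"] == "add_node":
            nodes[entry["id"]] = dict(entry["props"])
        elif entry["op"] == "remove_node":
            nodes.pop(entry["id"], None)
    return nodes

oplog = [
    {"op": "add_node",    "id": "n1", "props": {"label": "draft"},  "clock": (100, 0)},
    {"op": "add_node",    "id": "n2", "props": {"label": "review"}, "clock": (200, 0)},
    {"op": "remove_node", "id": "n1", "props": {},                  "clock": (300, 0)},
]

assert set(materialize_as_of(oplog, 250)) == {"n1", "n2"}  # before the removal
assert set(materialize_as_of(oplog, 350)) == {"n2"}        # after it
```

Because the oplog is append-only and immutable, every historical state is recoverable this way; nothing is ever overwritten in place.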

## Proof by Example

Four example scripts demonstrate these properties with real code, real measurements, and assertions that verify correctness.

### `examples/offline_first.py` — Two-peer offline sync

Two devices each write 500 nodes independently (simulating offline operation). A single bidirectional sync merges them to 1,000 nodes on each side. Verifies identical node sets. Measures sync latency.

**What it proves:** Offline writes accumulate without conflict. Sync is a single round-trip. No server involved.

### `examples/partition_heal.py` — Three-peer partition healing

Three peers start synced, then diverge (200 nodes each, all different). After healing via mesh sync (A-B, B-C, A-C), all three converge to the same 600-node graph.

**What it proves:** Network partitions are a non-event. Diverged state merges cleanly. Convergence is independent of partition duration.

### `examples/concurrent_writes.py` — Per-property LWW

Two stores modify the same node concurrently. Store A updates `status`. Store B updates `status` AND adds `location`. After sync, both stores have the LWW winner for `status` and the non-conflicting `location` property. No data loss.

**What it proves:** Conflicts are resolved per-property, not per-node. Non-conflicting concurrent writes are never discarded.

### `examples/ring_topology.py` — Zero-coordination scale

Ten peers in a ring topology. Each writes 100 nodes. Ring sync propagates data around the ring until all 10 peers have all 1,000 nodes. Reports how many rounds were needed.

**What it proves:** Convergence works with arbitrary topologies. No peer is special. No election, no leader, no coordinator.

## Benchmarks

Measured on Apple M4 Max, macOS 15.7, Rust 1.94.0. Full details: [BENCHMARKS.md](BENCHMARKS.md).

### Silk self-benchmarks (Criterion.rs)

| What | 100 nodes | 1,000 nodes | 10,000 nodes |
|------|-----------|-------------|--------------|
| Write + materialize | 129 µs | 1.5 ms | 16.8 ms |
| Sync offer generation | 24 µs | 282 µs | 3.3 ms |
| Full sync (zero overlap) | 111 µs | 1.3 ms | — |
| Incremental sync (10% delta) | — | 611 µs | — |
| Partition heal (500/side) | — | 833 µs | — |
| BFS traversal | — | 564 ns | 580 ns |
| Shortest path | — | 706 ns | 717 ns |

### Comparative benchmarks (vs Loro, pycrdt)

Silk compared against two document CRDTs (Loro 1.10.3, pycrdt 0.12.50) on shared CRDT operations. All three are Rust cores with PyO3 bindings. Results from 2026-03-27, silk-graph v0.1.6. Verified across 3 local + 3 Docker runs.

| Scenario | Silk | Loro | pycrdt |
|----------|------|------|--------|
| Write 1K entities | 4.3 ms (233K ops/s) | 2.4 ms (417K ops/s) | 3.8 ms (263K ops/s) |
| Update 1K fields | 1.75 ms (571K ops/s) | 0.75 ms (1.33M ops/s) | 2.8 ms (357K ops/s) |
| Sync 500 entities | 10.9 ms | 4.7 ms | 7.1 ms |
| Sync bandwidth (500 entities) | 175 KB | 25 KB | 36 KB |
| Structured workload (1000 users + 200 projects) | 12.5 ms | 7.0 ms | 2,711 ms |
| 10-peer ring convergence (500 entities) | 119 ms | 10.5 ms | 175 ms |
| Partition heal (1000 shared + 500 divergent) | 18.6 ms | 3.3 ms | 9.3 ms |
| Merge correctness | 100% | 100% | 100% |

Silk is slower on raw CRDT operations and uses more bandwidth (7x Loro). This is the cost of content-addressed Merkle-DAG entries — each operation carries a BLAKE3 hash, HLC clock, author identity, and causal parent links. These provide integrity verification, causal ordering, immutable audit trail, and author authentication — capabilities the comparison systems do not offer.

For context: 10,000 entities written in **50ms**. A 500-server infrastructure graph synced between two peers in **11ms**. A 1,500-entity partition healed in **18.6ms**. For local-first systems syncing on a timer, these are within practical bounds.

Full methodology, per-scenario analysis, and reproduction instructions: [BENCHMARKS.md](BENCHMARKS.md).

## When Silk Is the Right Tool

Silk is a replicated graph store that validates your schema, works offline, and fits in your process. No server, no coordinator, no external database.

**Good fit:**
- **Local-first applications** — offline-capable apps that sync when connected (note-taking, task management, field data collection)
- **Edge computing** — devices that operate independently and sync periodically (IoT gateways, drones, retail POS)
- **Multi-device sync** — phone, laptop, server — all converge automatically
- **Peer-to-peer systems** — no central server, any node can sync with any other
- **Knowledge graphs with schema** — when you need both structure enforcement and graph traversal
- **Audit trails** — every change is an immutable, content-addressed entry in a Merkle-DAG

**Not the right tool:**
- **High-throughput analytics** — DuckDB, ClickHouse
- **SQL queries** — SQLite, Postgres
- **Document storage** — MongoDB, CouchDB
- **Blob storage** — S3
- **Streaming data** — Kafka, Redpanda

## Architecture

```
Write (add_node, add_edge, update_property, remove_node, remove_edge)
  |
  v
Ontology Validation
  |  reject invalid writes here, before they enter the DAG
  v
Entry { hash: BLAKE3(content), op, clock: HLC, author, parents: [head_hashes] }
  |
  v
OpLog (append-only Merkle-DAG, content-addressed, immutable)
  |
  |--- MaterializedGraph (live view)
  |      |
  |      |-- Nodes: id -> { type, label, subtype, properties }
  |      |-- Edges: id -> { type, source, target, properties }
  |      |-- Indexes: by_type, by_subtype, by_property
  |      |
  |      +-- Query / Algorithms
  |            bfs, shortest_path, impact_analysis, pattern_match,
  |            topological_sort, has_cycle, subgraph
  |
  +--- Sync Protocol
         |
         |-- generate_sync_offer()   -> Bloom filter of known hashes
         |-- receive_sync_offer()    -> entries the peer is missing
         +-- merge_sync_payload()    -> apply remote entries, re-materialize
                                        (per-property LWW, add-wins semantics)
```
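
The `Entry` construction at the top of this pipeline can be sketched in Python. Silk hashes with BLAKE3, which is not in Python's stdlib, so `blake2b` stands in here; the serialization is also an assumption, chosen only to make content addressing deterministic:

```python
import hashlib
import json

# Content-addressed entry construction, sketched with blake2b standing in
# for BLAKE3. Field names follow the diagram above; the exact on-disk
# serialization is an assumption for this illustration.

def make_entry(op, clock, author, parents):
    content = json.dumps(
        {"op": op, "clock": clock, "author": author, "parents": sorted(parents)},
        sort_keys=True,  # canonical form: same content -> same bytes -> same hash
    )
    digest = hashlib.blake2b(content.encode(), digest_size=32).hexdigest()
    return {"hash": digest, "op": op, "clock": clock,
            "author": author, "parents": parents}

e1 = make_entry(["add_node", "n1"], [100, 0], "alice", [])
e2 = make_entry(["add_node", "n1"], [100, 0], "alice", [])
assert e1["hash"] == e2["hash"]   # identical content, identical address

e3 = make_entry(["add_node", "n1"], [101, 0], "alice", [e1["hash"]])
assert e3["hash"] != e1["hash"]   # parent links feed into the hash
```

Because each entry's hash covers its causal parents, the oplog is a Merkle-DAG: tampering with any ancestor changes every descendant's address, which is what makes the audit trail verifiable rather than merely append-only.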

**Convergence invariant:** For any two stores S1 and S2, if S1 and S2 have exchanged sync messages in both directions, then `S1.all_nodes() == S2.all_nodes()` and `S1.all_edges() == S2.all_edges()`. This holds regardless of write order, network topology, or partition duration.
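
A toy demonstration of why this invariant holds: materialization is a pure function of the entry *set*, so once both sides hold the union of entries, they must compute identical state regardless of arrival order. (The entry tuples and fold below are illustrative, not Silk's representation.)

```python
# Toy model of the convergence invariant. Entries are (clock, hash, id, value);
# materialization sorts deterministically and folds. Same set in -> same state out.

def materialize(entries):
    """Deterministic fold: apply writes in (clock, hash) order."""
    nodes = {}
    for _clock, _hash, node_id, value in sorted(entries):
        nodes[node_id] = value
    return nodes

s1 = {(1, "h1", "a", "v1"), (3, "h3", "b", "v2")}  # S1's local writes
s2 = {(2, "h2", "a", "v9"), (4, "h4", "c", "v3")}  # S2's local writes

s1_after = materialize(s1 | s2)  # S1 merges everything S2 sent
s2_after = materialize(s2 | s1)  # S2 merges everything S1 sent
assert s1_after == s2_after      # converged, independent of merge order
```

Set union is commutative, associative, and idempotent, and the fold is deterministic — which is why no coordinator, leader, or partition-duration bound appears anywhere in the guarantee.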