Silk
A Merkle-CRDT graph engine for distributed, conflict-free knowledge graphs.
Silk is an embedded graph database with automatic conflict resolution. Built on Merkle-DAGs and CRDTs, it requires no leader, no consensus protocol, and no coordinator. Any two Silk instances that exchange sync messages are guaranteed to converge to the same graph state. Schema is enforced at write time via an ontology — not at query time.
Quick Start
Python
# Define your schema
=
# Create two independent stores (imagine different machines)
=
=
# Write to store A
# Write to store B (concurrently, no coordination)
# Sync: A sends to B
=
=
# Sync: B sends to A
=
=
# Both stores now have Alice, Bob, Acme, and the WORKS_AT edge
assert is not None
assert is not None
assert is not None
assert is not None
What just happened
The code above created two independent graph stores, wrote data to each, and synced them — both now hold the same graph. No server, no coordinator, no conflict resolution code. Here's what it looks like:
graph TB
subgraph "Before Sync"
direction LR
subgraph "Store A"
A1["alice (person)"]
A2["acme (company)"]
A1 -->|WORKS_AT| A2
end
subgraph "Store B"
B1["bob (person)"]
end
end
subgraph "After Sync — both stores identical"
direction LR
subgraph "Store A '"
A3["alice (person)"]
A4["acme (company)"]
A5["bob (person)"]
A3 -->|WORKS_AT| A4
end
subgraph "Store B '"
B3["alice (person)"]
B4["acme (company)"]
B5["bob (person)"]
B3 -->|WORKS_AT| B4
end
end
Under the hood, every write becomes a content-addressed entry in a Merkle-DAG. Sync exchanges only the entries the other side is missing:
flowchart LR
W["add_node(...)"] --> E["Entry\n{hash, op, clock, author}"]
E --> O["OpLog\n(Merkle-DAG)"]
O --> G["Materialized\nGraph"]
O --> S["Sync Protocol"]
S <-->|"offer ⇄ payload"| P["Remote Peer"]
P --> O2["Peer OpLog"]
O2 --> G2["Peer Graph"]
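To make the flow above concrete, here is a minimal, hypothetical sketch of a content-addressed oplog in plain Python. It is not Silk's implementation. Entry, OpLog, append() and insert() are illustrative names. The stdlib's blake2b stands in for BLAKE3, and canonical JSON stands in for MessagePack. The sketch shows the two properties the diagram relies on. First, an entry's hash is derived from its content, including its parent hashes, which is what makes the log a Merkle-DAG. Second, re-inserting an entry that is already present is a no-op.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    op: dict              # e.g. {"AddNode": {"id": "alice", "type": "person"}}
    clock: tuple          # (physical_ms, logical), HLC-style
    author: str
    parents: tuple = ()   # hashes of the DAG heads this entry builds on

    def digest(self) -> str:
        # Stand-ins: canonical JSON instead of MessagePack, blake2b instead of BLAKE3.
        payload = json.dumps(
            {"op": self.op, "clock": list(self.clock),
             "author": self.author, "parents": list(self.parents)},
            sort_keys=True, separators=(",", ":"),
        ).encode()
        return hashlib.blake2b(payload, digest_size=32).hexdigest()


class OpLog:
    """Append-only, content-addressed entries linked into a Merkle-DAG."""

    def __init__(self):
        self.entries = {}   # hash -> Entry
        self.heads = set()  # entries that no other entry references yet

    def append(self, op, clock, author):
        """Create a local entry whose parents are the current heads."""
        entry = Entry(op, clock, author, tuple(sorted(self.heads)))
        h = entry.digest()
        self.insert(h, entry)
        return h

    def insert(self, h, entry):
        """Store an entry (local or from a peer); returns False if already present."""
        if entry.digest() != h:
            raise ValueError("hash mismatch")  # integrity check on every entry
        if h in self.entries:
            return False                       # idempotent: same content, same hash
        # Simplification: assumes parents are inserted before their children.
        self.entries[h] = entry
        self.heads -= set(entry.parents)
        self.heads.add(h)
        return True


log = OpLog()
h1 = log.append({"AddNode": {"id": "alice", "type": "person"}}, (1000, 0), "A")
h2 = log.append({"AddNode": {"id": "acme", "type": "company"}}, (1001, 0), "A")
assert log.entries[h2].parents == (h1,)      # h2 links back to h1
assert log.insert(h1, log.entries[h1]) is False
```

Sync then reduces to shipping entries by hash. The receiver can recompute each hash to detect tampering or corruption, and duplicates cost nothing.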
Rust
use ;
let ontology = from_json?;
let mut store = new;
store.add_node?;
store.add_edge?;
// Sync with a peer
let offer = store.generate_sync_offer;
let payload = peer.receive_sync_offer?;
store.merge_sync_payload?;
Features
- Ontology-enforced schema — define node types, edge types, and their properties. Silk validates at write time. Invalid entries from sync are quarantined (R-02) — accepted into the oplog for CRDT convergence but invisible in the materialized graph. Unknown properties and subtypes are accepted (D-026: open properties) — the ontology defines the minimum, not the maximum.
- Content-addressed entries — every mutation is a BLAKE3-hashed entry in a Merkle-DAG. Entries are immutable. The DAG is the audit trail.
- Per-property last-writer-wins — two concurrent writes to different properties on the same node both succeed. No data loss from non-conflicting edits.
- Delta-state sync — Bloom filter optimization minimizes data transfer. Only entries the peer doesn't have are sent.
- Graph algorithms — BFS, shortest path, impact analysis, pattern matching, topological sort, cycle detection. Built into the engine, not bolted on.
- Persistent storage — backed by redb (embedded, transactional, pure Rust). In-memory mode also available.
- Real-time subscriptions — register callbacks that fire on every graph mutation (local or merged from sync).
- Observation log — append-only, TTL-pruned time-series store for metrics alongside the graph. Same redb backend.
- Zero runtime dependencies — no Postgres, no Redis, no network required. Silk is a library, not a service.
- Author authentication — ed25519 signatures on every entry. Auto-sign on write, verify on merge. Trust registry for known peers. Strict mode rejects unsigned entries. (D-027)
- Evolvable schema — extend the ontology at runtime with new types, properties, and subtypes via extend_ontology(). Only additive changes — no migrations, no store recreation. (R-03)
- Scalable sync — gossip-based peer selection (R-05). Instead of syncing with all N peers, select ceil(ln(N)+1) random targets per round. Scales from 2 peers to 10,000+.
- Time-travel queries — store.as_of(physical_ms) returns a read-only GraphSnapshot at any historical time. All query and algorithm methods are available. Datomic-style "the database is a value." (R-06)
- Epoch compaction — store.compact() compresses the entire oplog into a single checkpoint entry. Bounds memory and disk growth for long-running systems. (R-08)
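Per-property last-writer-wins can be illustrated with a small, hypothetical merge function. merge_property_writes is not part of Silk's API. The sketch assumes that each write carries a (physical_ms, logical) clock plus an author ID, and that exact clock ties are broken by comparing author IDs. Silk's actual tie-break rule is documented in DESIGN.md and is not reproduced here. Because the winner for each (node, property) pair is picked by a total order, the result does not depend on merge order, and merging the same log twice changes nothing.

```python
def merge_property_writes(*logs):
    """Materialize node properties from write logs using per-property LWW.

    Each write is (node_id, prop, value, clock, author), where clock is a
    (physical_ms, logical) tuple. For every (node_id, prop) pair the write with
    the greatest (clock, author) key wins; author breaks exact clock ties.
    """
    winners = {}  # (node_id, prop) -> ((clock, author), value)
    for log in logs:
        for node_id, prop, value, clock, author in log:
            key, stamp = (node_id, prop), (clock, author)
            if key not in winners or stamp > winners[key][0]:
                winners[key] = (stamp, value)

    graph = {}
    for (node_id, prop), (_, value) in winners.items():
        graph.setdefault(node_id, {})[prop] = value
    return graph


laptop = [
    ("n1", "body", "draft from laptop", (2000, 0), "laptop"),
    ("n1", "priority", "high", (2001, 0), "laptop"),
]
phone = [("n1", "body", "draft from phone", (2005, 0), "phone")]

# Merge order does not matter, and re-merging the same log is a no-op.
assert merge_property_writes(laptop, phone) == merge_property_writes(phone, laptop)
assert merge_property_writes(laptop, phone, phone) == merge_property_writes(laptop, phone)
print(merge_property_writes(laptop, phone))
# {'n1': {'body': 'draft from phone', 'priority': 'high'}}
```

The two writes to body conflict and resolve to the later clock. The write to priority touches a different property, so it survives.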
When to Use Silk
Good fit:
- Local-first applications (offline-capable, sync when connected)
- Edge computing (devices that operate independently, sync periodically)
- Peer-to-peer systems (no central server, any node can sync with any other)
- Knowledge graphs with schema enforcement
- Multi-device sync (phone, laptop, server — all converge)
- Systems that need an audit trail (every change is a Merkle-DAG entry)
- Systems with evolving schemas (extend ontology without migrations — R-03)
- Systems that need historical queries (time-travel to any point in the past — R-06)
Not the right tool:
- High-throughput analytics — use DuckDB or ClickHouse
- SQL queries — use SQLite or Postgres
- Document storage — use MongoDB or CouchDB
- Blob storage — use S3
Trust Model
Silk is designed for trusted peer networks — your own devices, your own team, your own infrastructure. All peers share the same genesis ontology and can extend it monotonically (R-03). Peers are assumed non-malicious.
What Silk provides today:
- Schema enforcement on all code paths (including sync, since v0.1.1)
- Clock overflow protection (saturating arithmetic)
- Clock drift rejection (entries with implausibly far-future clocks are rejected)
- Message size limits on sync payloads
- Hash integrity verification on every entry
- Author authentication — ed25519 signatures via generate_signing_key(), register_trusted_author(), and set_require_signatures(). Enable strict mode to reject unsigned entries. Key revocation is not yet supported.
- Oplog compaction — store.compact() (R-08) compresses the oplog into a single checkpoint. Call it only when all peers have converged.
What Silk does NOT provide (yet):
- Byzantine fault tolerance — a malicious peer with network access can spoof clocks within drift bounds to win LWW conflicts. Signatures + trust policies will mitigate this.
If you're syncing between devices you control, Silk is safe. If you're building an open network where anonymous peers connect, enable strict mode and register trusted authors.
See SECURITY.md for the full threat model.
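The two clock protections listed above can be sketched as follows. This illustration rests on assumptions. MAX_DRIFT_MS, saturating_add() and accept_remote_clock() are hypothetical names, and the 60-second bound is a placeholder, not Silk's configured threshold. The sketch also shows why drift rejection alone is not Byzantine fault tolerance: a peer whose clock stays just inside the bound still passes the check and can still win LWW conflicts.

```python
U64_MAX = 2**64 - 1
MAX_DRIFT_MS = 60_000  # hypothetical bound, not Silk's actual setting


def saturating_add(a: int, b: int) -> int:
    """u64 addition that clamps at U64_MAX instead of wrapping around."""
    return min(a + b, U64_MAX)


def accept_remote_clock(remote_ms: int, local_now_ms: int,
                        max_drift_ms: int = MAX_DRIFT_MS) -> bool:
    """Reject entries whose physical clock is implausibly far in the future."""
    return remote_ms <= saturating_add(local_now_ms, max_drift_ms)


now = 1_700_000_000_000
assert accept_remote_clock(now + 1_000, now)          # small skew: accepted
assert not accept_remote_clock(now + 3_600_000, now)  # one hour ahead: rejected
assert saturating_add(U64_MAX, 1) == U64_MAX          # no wrap to 0
```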
Schema Philosophy: Open Properties (D-026)
Silk's ontology defines the minimum, not the maximum. You declare node types, edge types, required properties, and type constraints. Silk enforces those. But your application can store any additional properties without changing the ontology.
=
=
# Required property "name" is enforced
# OK
# Unknown properties are accepted and stored as-is
=
assert == 30 # stored and queryable
assert ==
# Unknown subtypes are also accepted
assert ==
What stays enforced:
- Node types must be declared in the ontology
- Edge types must be declared (with source/target type constraints)
- Required properties must be present
- Known property types are validated (if name is declared as a string, it must be a string)
What's open:
- Extra properties on any node or edge (stored without type validation)
- Unknown subtypes (type-level required properties still enforced)
This means your application can evolve its data model without touching the ontology or recreating the store. Add fields, add subtypes, store metadata — Silk doesn't block you.
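The enforced/open split can be sketched as a validator plus a materialization step that quarantines entries that fail validation (R-02). Everything here is hypothetical. The ontology dict shape is not Silk's JSON format, subtypes are not modeled, and validate_node() and materialize() are illustrative names. The point is the control flow. Declared rules are checked, undeclared properties pass through, and invalid entries stay in the log but never reach the visible graph.

```python
# Hypothetical ontology shape for illustration; Silk's ontology JSON differs.
ONTOLOGY = {
    "person": {
        "name": {"type": "string", "required": True},
        "email": {"type": "string", "required": False},
    },
}
PY_TYPES = {"string": str, "int": int, "float": float, "bool": bool}


def validate_node(node_type: str, props: dict, ontology: dict = ONTOLOGY) -> None:
    """Raise ValueError if the node breaks a declared rule; extra properties pass."""
    rules = ontology.get(node_type)
    if rules is None:
        raise ValueError(f"unknown node type '{node_type}'")
    for name, rule in rules.items():
        if name not in props:
            if rule["required"]:
                raise ValueError(f"missing required property '{name}'")
            continue
        if not isinstance(props[name], PY_TYPES[rule["type"]]):
            raise ValueError(f"property '{name}' must be {rule['type']}")
    # Properties not declared in the ontology are never inspected: open properties.


def materialize(entries: dict, ontology: dict = ONTOLOGY):
    """Return (visible graph, quarantined hashes) for {hash: (id, type, props)}."""
    graph, quarantined = {}, []
    for h, (node_id, node_type, props) in entries.items():
        try:
            validate_node(node_type, props, ontology)
        except ValueError:
            quarantined.append(h)  # stays in the oplog, hidden from the graph
            continue
        graph[node_id] = {"type": node_type, "properties": props}
    return graph, quarantined


entries = {
    "h1": ("alice", "person", {"name": "Alice", "age": 30}),  # extra "age" is fine
    "h2": ("ghost", "person", {"age": 99}),                   # missing "name"
    "h3": ("x", "spaceship", {"name": "X"}),                  # undeclared type
}
graph, quarantined = materialize(entries)
assert graph["alice"]["properties"]["age"] == 30
assert quarantined == ["h2", "h3"]
```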
Architecture
For the full architectural overview — research foundations (Merkle-CRDTs, Delta-state CRDTs, MAPE-K), design principles, and 28 design decisions (D-001–D-028) plus 8 roadmap items (R-01–R-08) — see DESIGN.md.
Write (add_node, add_edge, update_property)
│
▼
Entry { hash(BLAKE3), op, clock(HLC), author, parents }
│
▼
OpLog (append-only Merkle-DAG, content-addressed)
│
├──► MaterializedGraph (live view: nodes, edges, properties)
│ └── Query: get_node, query_by_type, outgoing_edges, bfs, shortest_path
│
└──► Sync Protocol
├── generate_sync_offer() → Bloom filter of known hashes
├── receive_sync_offer() → Entries the peer is missing
└── merge_sync_payload() → Apply remote entries, re-materialize
Convergence guarantee: Two stores that have exchanged sync messages in both directions will have identical materialized graphs. This is a mathematical property of the Merkle-CRDT construction, not an implementation detail.
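The offer/payload exchange in the diagram can be sketched with a toy Bloom filter. This is a minimal illustration, not Silk's wire format (see PROTOCOL.md for that). BloomFilter, generate_offer() and build_payload() are hypothetical names, and SHA-256 with an index prefix stands in for whatever hash functions Silk actually uses. A Bloom filter never has false negatives, so an entry the peer already holds is never re-sent. A Bloom filter can have false positives, however, which would cause a missing entry to be skipped. A real protocol therefore needs some way to recover those entries. This sketch does not handle that case.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def generate_offer(known_hashes) -> BloomFilter:
    """Summarize the hashes this side already has."""
    offer = BloomFilter()
    for h in known_hashes:
        offer.add(h)
    return offer


def build_payload(my_entries: dict, peer_offer: BloomFilter) -> dict:
    """Send every entry whose hash the peer's offer does not appear to contain."""
    return {h: e for h, e in my_entries.items() if h not in peer_offer}


store_a = {f"a{i}": f"entry-a{i}" for i in range(5)}
store_b = {"a0": "entry-a0", "b0": "entry-b0", "b1": "entry-b1"}
payload = build_payload(store_a, generate_offer(store_b))
assert "a0" not in payload                       # peer already has it
assert set(payload) <= set(store_a) - set(store_b)
```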
Benchmarks
Measured on Apple M4 Max (16 cores, 128 GB RAM), macOS 15.7, Rust 1.94.0, release build. Run cargo bench --no-default-features on your hardware. For the full analysis — what these numbers mean and why they matter — see WHY.md.
Core Operations
| Operation | Time | Throughput |
|---|---|---|
| Entry create (AddNode) | 449 ns | 2.2M ops/sec |
| Entry serialize (MessagePack) | 289 ns | 3.5M ops/sec |
| Entry deserialize | 957 ns | 1.0M ops/sec |
| BLAKE3 hash verify | 247 ns | 4.0M ops/sec |
Graph Write + Materialize
| Operation | 100 nodes | 1,000 nodes | 10,000 nodes |
|---|---|---|---|
| Add nodes (write + materialize) | 129 µs | 1.5 ms | 16.8 ms |
| Rebuild graph from entries | 20 µs | 278 µs | 2.7 ms |
Graph Algorithms
| Algorithm | 1,000 nodes | 10,000 nodes |
|---|---|---|
| BFS traversal | 564 ns | 580 ns |
| Shortest path | 706 ns | 717 ns |
| Impact analysis (reverse BFS) | 108 ns | 105 ns |
| Pattern match (2-type chain) | 555 µs | 8.1 ms |
Edge Density (1,000 nodes, varying edge count)
| Algorithm | 1K edges | 10K edges | 50K edges |
|---|---|---|---|
| BFS | 248 ns | 2.5 µs | 12.3 µs |
| Shortest path | 264 ns | 3.2 µs | 14.8 µs |
Traversal cost scales linearly with edge density: 10x the edges costs roughly 10x the time. This is no surprise, but it is now measured.
Sync Protocol
| Scenario | Time |
|---|---|
| Sync offer (100 nodes) | 24 µs |
| Sync offer (1,000 nodes) | 282 µs |
| Sync offer (10,000 nodes) | 3.3 ms |
| Full transfer (100 nodes, zero overlap) | 111 µs |
| Full transfer (1,000 nodes, zero overlap) | 1.3 ms |
| Incremental sync (900/1000 shared, 10% delta) | 611 µs |
| Partition heal (500 divergent writes per side) | 833 µs |
Python Examples (sync scenarios)
| Scenario | Nodes | Sync time |
|---|---|---|
| Two offline peers converge | 2 x 500 | 5.1 ms |
| Three-peer partition heal | 3 x 200 | 6.6 ms |
| Concurrent property writes | 1 node | 0.06 ms |
| 10-peer ring convergence | 10 x 100 | 51.8 ms (3 rounds) |
Sync by Divergence (1,000 nodes per peer, bidirectional)
| Overlap | Time |
|---|---|
| 1% (nearly disjoint) | 2.3 ms |
| 10% | 8.0 ms |
| 50% | 27.7 ms |
| 90% (nearly converged) | 42.4 ms |
Higher overlap = more Bloom filter cross-checking. The fast path is low-overlap (first sync). Incremental syncs on already-converged peers use the 10% delta path (611 µs, see above).
Run the examples yourself: python examples/offline_first.py. See all eight scenarios in examples/.
Design Decisions
Silk's architecture is driven by 28 design decisions (D-001–D-028) plus 8 roadmap items (R-01–R-08), documented in full in DESIGN.md. Key choices:
| Decision | Choice | Why |
|---|---|---|
| Hash function | BLAKE3 | Fastest cryptographic hash, 128-bit security |
| Serialization | MessagePack | Compact binary, faster than JSON, schema-free |
| Storage | redb | Embedded, transactional, pure Rust, no C dependencies |
| Clock | Hybrid Logical (R-01) | Wall-clock time + logical counter. Real-time LWW ordering. |
| Conflict resolution | Per-property LWW | Non-conflicting concurrent writes both win |
| Sync | Delta-state + Bloom | Minimize transfer: only send what the peer lacks |
| Schema | Open properties (D-026) | Ontology is the floor, not the ceiling — unknown properties accepted |
| Sync validation | Quarantine (R-02) | Invalid entries in oplog but hidden from graph |
| Schema evolution | Monotonic (R-03) | Add types/properties only, never remove |
| Convergence | Formal proof (R-04) | Three theorems proving determinism, idempotence, convergence |
| Peer selection | Gossip (R-05) | Logarithmic fan-out: ceil(ln(N)+1) per round, scales to 10K+ peers |
| Time-travel | as_of() replay (R-06) | Query graph state at any historical time — Datomic-inspired |
| Compaction | Epoch checkpoints (R-08) | Compress oplog to single entry — bounds growth for production |
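The Clock row refers to a hybrid logical clock. Below is a textbook HLC in the style of Kulkarni et al. It is offered as a sketch, on the assumption that Silk's R-01 clock follows the same update rules. HybridLogicalClock, tick() and receive() are illustrative names. Timestamps are (physical_ms, logical) pairs that compare as tuples. This keeps them close to wall-clock time while still ordering events that happen within the same millisecond.

```python
import time


class HybridLogicalClock:
    """Textbook HLC producing (physical_ms, logical) timestamps."""

    def __init__(self, now_ms=lambda: int(time.time() * 1000)):
        self.now_ms = now_ms   # injectable wall clock, handy for tests
        self.physical = 0
        self.logical = 0

    def tick(self):
        """Timestamp for a local write."""
        now = self.now_ms()
        if now > self.physical:
            self.physical, self.logical = now, 0
        else:
            self.logical += 1
        return (self.physical, self.logical)

    def receive(self, remote):
        """Advance past a timestamp carried by a merged remote entry."""
        remote_p, remote_l = remote
        new_p = max(self.physical, remote_p, self.now_ms())
        if new_p == self.physical and new_p == remote_p:
            self.logical = max(self.logical, remote_l) + 1
        elif new_p == self.physical:
            self.logical += 1
        elif new_p == remote_p:
            self.logical = remote_l + 1
        else:
            self.logical = 0
        self.physical = new_p
        return (self.physical, self.logical)


fake_now = [1000]
clock = HybridLogicalClock(now_ms=lambda: fake_now[0])
assert clock.tick() == (1000, 0)
assert clock.tick() == (1000, 1)              # same millisecond: counter bumps
assert clock.receive((1500, 4)) == (1500, 5)  # remote is ahead: adopt and bump
```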
Python API Reference
GraphStore
# Construction
= # new store
= # existing store
# Mutations
# Queries
# dict | None
# dict | None
# list[dict]
# list[dict]
# list[dict]
# list[dict]
# list[dict]
# list[dict]
# Graph algorithms
# Sync
= # bytes
= # bytes
= # int (entries merged)
= # bytes (full state)
# Subscriptions
= # callback(event_dict)
# Signing (D-027)
= # generate keypair, returns hex public key
# load existing key
= # hex public key or None
# trust a peer
# reject unsigned entries on merge
# Quarantine (R-02)
= # list of hex hashes of quarantined entries
# Gossip Peer Selection (R-05)
# add a peer
# remove a peer
= # [{"peer_id", "address", "last_seen_ms"}]
= # ceil(ln(N)+1) random peer IDs
# mark sync completed
Time-Travel (R-06)
# Time-Travel (R-06)
= # read-only GraphSnapshot
# GraphSnapshot has: get_node, all_nodes, all_edges, query_nodes_by_type,
# outgoing_edges, incoming_edges, bfs, shortest_path, impact_analysis,
# pattern_match, topological_sort, has_cycle, neighbors, subgraph
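Conceptually, as_of() is a replay of the oplog up to a point in time. The hypothetical function below is not Silk's as_of(), which returns a GraphSnapshot with the full query API. It assumes that entries are (clock, author, node_id, prop, value) tuples. It replays property writes in clock order and stops at the first entry past the requested physical time. This is enough to show why any historical state can be recovered from an immutable log.

```python
def as_of(entries, physical_ms):
    """Rebuild node properties from entries with clock[0] <= physical_ms.

    Entries are replayed in (clock, author) order, so a later write to the
    same property overwrites an earlier one (last-writer-wins).
    """
    snapshot = {}
    for clock, author, node_id, prop, value in sorted(entries, key=lambda e: (e[0], e[1])):
        if clock[0] > physical_ms:
            break  # sorted by clock, so everything after this is later too
        snapshot.setdefault(node_id, {})[prop] = value
    return snapshot


log = [
    ((2000, 0), "A", "n1", "title", "v2"),
    ((1000, 0), "A", "n1", "title", "v1"),
    ((3000, 0), "B", "n2", "title", "other"),
]
assert as_of(log, 1500) == {"n1": {"title": "v1"}}
assert as_of(log, 2500) == {"n1": {"title": "v2"}}
```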
Query Builder (R-07)
# Fluent queries — chain filters and traversals
=
# Other result methods
= # just node IDs
= # count
= # first or None
# Works on historical snapshots too
=
# Extension point: plug in custom query engines
# Parse and evaluate custom query language
return
Compaction (R-08)
# Compaction (R-08)
# compress oplog → checkpoint, returns hex hash
= # inspect checkpoint without compacting
ObservationLog
=
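The ObservationLog is described as an append-only, TTL-pruned time series. The in-memory sketch below shows only the pruning idea. ObservationLogSketch and its methods are hypothetical and do not mirror Silk's ObservationLog API or its redb storage. The sketch assumes that samples arrive in timestamp order, so pruning only has to look at the front of the queue.

```python
from collections import deque


class ObservationLogSketch:
    """Append-only time series with TTL pruning (hypothetical, in-memory)."""

    def __init__(self, ttl_ms: int):
        self.ttl_ms = ttl_ms
        self.samples = deque()  # (timestamp_ms, source, value), in time order

    def append(self, ts_ms: int, source: str, value: float) -> None:
        self.samples.append((ts_ms, source, value))

    def prune(self, now_ms: int) -> int:
        """Drop samples older than now_ms - ttl_ms; return how many were removed."""
        cutoff = now_ms - self.ttl_ms
        removed = 0
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
            removed += 1
        return removed

    def query(self, source: str, since_ms: int):
        return [(ts, v) for ts, s, v in self.samples if s == source and ts >= since_ms]


obs = ObservationLogSketch(ttl_ms=60_000)
obs.append(0, "cpu", 0.5)
obs.append(30_000, "cpu", 0.7)
obs.append(90_000, "cpu", 0.9)
assert obs.prune(now_ms=100_000) == 2   # cutoff is 40,000 ms
assert obs.query("cpu", since_ms=0) == [(90_000, 0.9)]
```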
Return Value Reference
Methods like get_node() and get_edge() return plain dicts. Here's what's inside:
Node (get_node(), all_nodes(), query_nodes_by_*):
Edge (get_edge(), all_edges(), outgoing_edges(), incoming_edges()):
Subscription callback event (subscribe(callback)):
Error Handling
Silk uses Python's built-in exception types. Error messages are descriptive but must be matched as strings in v0.1 (custom exception classes planned for v0.2).
# Validation error — bad schema, unknown type, missing required property
# "unknown node type 'spaceship'"
# I/O error — can't create or open store file
=
# Sync error — corrupted or incompatible payload
| Exception | When |
|---|---|
| ValueError | Invalid ontology, unknown node/edge type, missing required property, bad sync payload, invalid hash |
| IOError | Store file can't be created/opened, redb I/O failure |
| RuntimeError | Corrupted store (no genesis), snapshot with no entries |
Persistence
In-Memory vs Persistent
| Mode | Constructor | Durability | Use case |
|---|---|---|---|
| In-memory | GraphStore("id", ontology) | Lost on process exit | Tests, ephemeral processing, short-lived computations |
| Persistent | GraphStore("id", ontology, path="store.redb") | Durable (redb ACID) | Production, anything that must survive restarts |
Crash Recovery
Persistent stores use redb, which provides ACID transactions. Each write (add_node, add_edge, update_property) is committed to disk in its own transaction before the method returns.
If the process crashes:
- Completed writes are durable — they survived the crash
- In-flight writes are rolled back by redb's transaction recovery on next open
- No manual recovery needed — GraphStore.open(path) replays the entry log and rebuilds the materialized graph
# Persistent store — survives crashes
=
# At this point, n1 is on disk. Kill -9 the process — it's safe.
# Reopen after crash
=
assert is not None # still there
Scalability
Silk keeps the full graph in memory (OpLog + MaterializedGraph). Practical limits depend on available RAM.
| Graph size | Memory (approx) | Write throughput | Full sync |
|---|---|---|---|
| 1K nodes | ~5 MB | 670K nodes/sec | 1.3 ms |
| 10K nodes | ~50 MB | 595K nodes/sec | ~13 ms |
| 100K nodes | ~500 MB | ~500K nodes/sec | ~130 ms (est.) |
Query Performance at Scale
| Method | Complexity | Safe at 100K+ |
|---|---|---|
| get_node(id) | O(1) hash lookup | Yes |
| get_edge(id) | O(1) hash lookup | Yes |
| query_nodes_by_type(t) | O(n) type index scan | Yes |
| outgoing_edges(id) | O(degree) | Yes |
| bfs(start) | O(reachable subgraph) | Yes; visits only what's connected |
| shortest_path(a, b) | O(reachable subgraph) | Yes |
| all_nodes() | O(n); loads everything into a Python list | Avoid for large graphs |
| pattern_match(types) | O(n * branching^depth), capped at max_results | Yes; default limit 1000 |
Recommendations
- < 100K nodes: Silk handles this comfortably on modern hardware (< 500 MB)
- 100K–1M nodes: Works, but monitor memory. Prefer targeted queries (get_node, query_nodes_by_type) over all_nodes()
- Over 1M nodes: Consider sharding across multiple stores with application-level routing
Silk is designed for knowledge graphs (thousands to hundreds of thousands of richly-connected entities), not for big-data workloads (millions of rows with simple schemas). If you need the latter, use DuckDB or ClickHouse.
Peer Scaling
For fleets with many peers, use gossip-based sync (R-05):
- store.register_peer() to register known peers
- store.select_sync_targets() each tick, which returns ceil(ln(N)+1) targets
- 10 peers → 4 targets/tick, 1,000 → 8, 10,000 → 11
- Full convergence in O(log N) rounds
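The fan-out rule can be checked with a few lines of Python. fanout() and select_sync_targets() below are hypothetical helpers, not Silk's store.select_sync_targets(). They implement ceil(ln(N)+1) with the natural logarithm and cap the result at the number of known peers.

```python
import math
import random


def fanout(num_peers: int) -> int:
    """ceil(ln(N) + 1) targets per round, never more than the peers available."""
    if num_peers <= 0:
        return 0
    return min(num_peers, math.ceil(math.log(num_peers) + 1))


def select_sync_targets(peer_ids, rng=random):
    """Pick fanout(N) distinct peers uniformly at random."""
    peers = list(peer_ids)
    return rng.sample(peers, fanout(len(peers)))


assert [fanout(n) for n in (10, 1_000, 10_000)] == [4, 8, 11]
targets = select_sync_targets([f"peer-{i}" for i in range(1_000)], random.Random(7))
assert len(set(targets)) == 8
```

Because the fan-out grows logarithmically, going from 1,000 to 10,000 peers adds only three targets per round.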
Bounded Growth via Compaction
For long-running systems, use store.compact() (R-08) to bound oplog size:
- Compresses entire history into a single checkpoint entry
- All live nodes, edges, and ontology extensions preserved
- Tombstoned entities excluded (clean slate)
- Oplog goes from N entries to 1
Safety: only call compact() when all known peers have synced to current state. A compacted store can bootstrap new peers via snapshot().
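Compaction folds the replayable history into a single entry that describes the current state. In the hypothetical compact() below, later writes overwrite earlier properties and removed nodes are dropped, mirroring the bullets above. This is not Silk's implementation, and the entry layout is invented for illustration.

```python
def compact(entries):
    """Fold (clock, op, node_id, props) entries into one checkpoint dict.

    op is "add" (create or update a node's properties) or "remove" (tombstone).
    Live nodes keep their latest properties; tombstoned nodes are dropped.
    """
    live = {}
    for clock, op, node_id, props in sorted(entries, key=lambda e: e[0]):
        if op == "add":
            live.setdefault(node_id, {}).update(props)
        elif op == "remove":
            live.pop(node_id, None)
    last_clock = max((e[0] for e in entries), default=(0, 0))
    return {"op": "checkpoint", "clock": last_clock, "nodes": live}


history = [
    ((1, 0), "add", "n1", {"title": "a"}),
    ((2, 0), "add", "n2", {"title": "b"}),
    ((3, 0), "add", "n1", {"title": "a2"}),
    ((4, 0), "remove", "n2", {}),
]
checkpoint = compact(history)
assert checkpoint["nodes"] == {"n1": {"title": "a2"}}   # 4 entries became 1
```

Once the individual entries are gone, a peer that has not yet seen them can no longer receive them as deltas. This is why compaction should wait until all known peers have synced.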
Tutorial: Build a Distributed Note-Taking App
A complete walkthrough showing how to use Silk for a real project — a note-taking app where notes sync across devices without a server.
1. Define the Schema
=
2. Create a Store (Persistent)
# Each device gets its own store, backed by a local file
=
3. Add Data
# Create a notebook
# Add notes
# Organize: notebook contains notes
# Tag notes
4. Query the Graph
# Get all notes in a notebook
=
=
=
# Find notes by tag — traverse TAGGED edges
=
=
# Use BFS to find all nodes connected to a notebook within 2 hops
=
5. Sync Between Devices
# Phone creates its own store
=
# Phone adds a note while offline
# Later, when connected — sync both ways
=
=
=
=
# Both devices now have all 3 notes
assert ==
6. Handle Conflicts
# Both devices edit the same note at the same time
# Also, laptop updates a different property (non-conflicting change)
# Sync — per-property LWW resolves the conflict
# "body" goes to whichever write happened later (wall-clock time)
# "priority" is non-conflicting — preserved on both sides
7. Subscribe to Changes
=
# ... any write or merge triggers the callback
This pattern — schema, local store, sync, conflict resolution — works for any domain: task managers, CRMs, inventory systems, collaborative editors, IoT dashboards.
Building from Source
# Rust tests (without Python bindings)
# Python development build
# Python tests
# Benchmarks
Documentation
| Document | What it covers |
|---|---|
| README.md | Quick start, features, API reference, tutorial |
| WHY.md | Why Silk exists, what makes it different, benchmark analysis |
| DESIGN.md | Research foundations, 28 design decisions (D-001–D-028) plus 8 roadmap items (R-01–R-08), architecture |
| PROOF.md | Convergence proof — three theorems, six invariants, quarantine + ontology addenda |
| ROADMAP.md | Eight problems in order — dependency graph and implementation status |
| PROTOCOL.md | Sync wire format specification — for implementing peers in other languages |
| CHANGELOG.md | Release history |
| SECURITY.md | Threat model, known limitations, vulnerability reporting |
| QUERY_EXTENSIONS.md | How to extend the query model — QueryEngine protocol, examples, rationale |
| CONTRIBUTING.md | Development setup, PR guidelines |
| examples/ | Runnable Python scenarios (offline sync, partition heal, conflicts, ring topology, signing, time-travel, queries, compaction) |
License
Licensed under the Functional Source License, Version 1.0, Apache 2.0 Change License (FSL-1.0-Apache-2.0).
What this means:
- Free to use, modify, and distribute for any purpose that doesn't compete with silk-graph
- After 2 years from each release, the code converts to Apache License 2.0 (fully permissive)
- Internal use, learning, research, and non-competing commercial use are unrestricted
See LICENSE.md for full terms.