# Datasphere P2P Sharing Spec
**Status:** Draft v3
**Author:** Planning session 2026-01-07
## Overview
Enable Datasphere instances to share knowledge graphs peer-to-peer over the internet, creating a "superlocal mesh" of shared context without central servers.
### Goals
1. **Decentralized** - No central server, direct peer connections with relay fallback
2. **Internet-native** - Works across NAT, firewalls, different networks
3. **Privacy-first** - Project-level sharing control, private by default
4. **Simple** - Publisher/subscriber model, one-way sync, no conflicts
5. **Isolated** - Each peer's data in separate DB, source always clear
### Non-Goals (v1)
- Real-time streaming
- Fine-grained (node-level) sharing control
- Bi-directional conflict resolution
- Update/deletion propagation (subscribers get snapshots)
- Access control beyond peer lists
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Datasphere Instance │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Local Store (all nodes) │ │
│ │ ┌─────────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Private projects │ │ Shared projects │ │ │
│ │ │ (never synced) │ │ (filtered for publish) │─┼──┼──▶ Peers
│ │ └─────────────────────┘ └─────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Peer Stores (read-only snapshots) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Peer A │ │ Peer B │ │ Peer C │ │ │
│ │ │ nodes.lance│ │ nodes.lance│ │ nodes.lance│ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Query Layer │ │
│ │ (fan-out) │ │
│ └─────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────┤
│ Transport (iroh-net) │
│ QUIC + NAT traversal + relay fallback │
└─────────────────────────────────────────────────────────────────┘
```
## Storage Layout
```
~/.datasphere/
├── db/ # Local database (all nodes)
│ ├── nodes.lance/
│ └── processed.lance/
├── peers/ # Peer replicas (read-only)
│ ├── <node_id_1>/
│ │ ├── nodes.lance/
│ │ └── meta.json
│ └── <node_id_2>/
│ ├── nodes.lance/
│ └── meta.json
├── identity.key # Ed25519 keypair (iroh NodeId)
├── share.json # Which projects to publish
├── peers.json # Peer configuration
└── queue.jsonl # Existing job queue
```
## Configuration Files
### identity.key
Generated by iroh-net. Contains the node's cryptographic identity. The **NodeId** (public key) is the stable address used to connect to this instance from anywhere on the internet.
### share.json
Controls which projects are published to peers. Everything else is private.
```json
{
"shared_projects": [
"/Users/drazen/playground/ai-omnibus",
"/Users/drazen/oss/my-library"
]
}
```
**Behavior:**
- Paths are canonicalized on add (symlinks resolved, absolute paths)
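- Nodes are matched by `source` field prefix, compared at path-component boundaries so that `/oss/my-library` does not also match `/oss/my-library-private`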
- Only nodes from shared projects are sent to peers
- Private project nodes never leave the machine
- Default: empty list (share nothing)
**Privacy note:** The `source` field (containing project paths) is visible to peers. This reveals your filesystem structure for shared projects. If this is a concern, use a dedicated directory structure for shareable projects.
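The prefix rule in the Behavior list can be sketched as below. This is a minimal illustration, assuming `source` holds a filesystem path; `is_shared` is a hypothetical helper name, not part of the existing codebase. It relies on `Path::starts_with`, which compares whole path components rather than raw string prefixes.

```rust
use std::path::Path;

/// Returns true when `source` lies inside one of the shared project roots.
/// `Path::starts_with` matches whole components, so a root of
/// `/oss/my-library` does not match `/oss/my-library-private`.
fn is_shared(source: &str, shared_projects: &[&str]) -> bool {
    let source = Path::new(source);
    shared_projects
        .iter()
        .any(|root| source.starts_with(Path::new(root)))
}
```

With an empty `shared_projects` list the function returns `false` for every node, which matches the "share nothing" default.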
### peers.json
```json
{
"node_id": "abc123...",
"peers": {
"def456...": {
"name": "alice",
"subscribe": true,
"publish": true,
"last_sync": "2026-01-07T10:30:00Z"
},
"ghi789...": {
"name": "bob",
"subscribe": true,
"publish": false,
"last_sync": null
}
}
}
```
**Fields:**
- `node_id`: This instance's iroh NodeId
- `subscribe`: Pull their shared nodes into `peers/<id>/`
- `publish`: Allow them to pull our shared nodes
### peers/<node_id>/meta.json
```json
{
"node_id": "def456...",
"name": "alice",
"last_sync": "2026-01-07T10:30:00Z",
"node_count": 1523,
"sync_cursor": "2026-01-07T10:25:00Z"
}
```
## Transport: iroh-net
[iroh-net](https://iroh.computer/) provides:
- **QUIC connections** - Fast, multiplexed, encrypted
- **NAT traversal** - Hole punching for direct connections
- **Relay fallback** - When direct connection fails, traffic routes through relay
- **Stable identity** - NodeId works regardless of IP changes
### Why iroh-net
| Requirement | How iroh-net meets it |
|-------------|-----------------------|
| Internet-wide connectivity | Built-in NAT traversal + relay |
| No central server | Relay is optional fallback, not required |
| Stable addressing | NodeId (public key) is the address |
| Rust-native | First-class Rust support |
| Encrypted | QUIC provides TLS 1.3 |
### Connection Flow
```
Subscriber iroh relay Publisher
│ │ │
│── connect(NodeId) ────────▶│ │
│ │◀── register ─────────────────│
│ │ │
│◀─────────── hole punch / relay ─────────────────────────▶│
│ │ │
│◀═══════════ QUIC connection (direct or relayed) ════════▶│
```
### Relay Considerations
The iroh project provides a public relay network. For production/team use, consider:
- Self-hosted relay for reliability and privacy
- See [iroh relay docs](https://docs.rs/iroh-net) for setup
- Graceful degradation: warn users when relay-only (no direct connection)
## Sync Protocol
### Overview
Pull-based sync over QUIC streams. Subscriber requests nodes newer than their cursor. Publisher filters by shared projects and streams matching nodes.
### Sync Semantics
**Snapshot model:** Subscribers receive a point-in-time snapshot of the publisher's shared nodes.
- **New nodes**: Synced via incremental cursor
- **Updated nodes**: NOT propagated (subscriber keeps original)
- **Deleted nodes**: NOT propagated (subscriber keeps copy)
- **Full re-sync**: Reset cursor to `null` to re-fetch everything
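This is intentional simplicity for v1: subscribers keep each node as it was when first synced. A full re-sync re-fetches every shared node, but because ids already present are skipped (see Node Deduplication), it fills gaps rather than refreshing nodes that are already in the replica.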
### Message Format
Length-prefixed MessagePack over QUIC stream. Each message:
```
┌──────────────┬─────────────────────────────┐
│ Length (4B) │ MessagePack payload │
│ big-endian │ │
└──────────────┴─────────────────────────────┘
```
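The framing above can be sketched with plain `std::io` traits. This is an illustration only: the payload is treated as opaque bytes (MessagePack encoding is out of scope here), and `write_frame`, `read_frame`, and the `MAX_FRAME_LEN` cap are hypothetical names and values that the spec does not define.

```rust
use std::io::{self, Read, Write};

/// Upper bound on one frame. Hypothetical value, chosen only so that a
/// corrupt or hostile length prefix cannot trigger a huge allocation.
const MAX_FRAME_LEN: u32 = 16 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian length, then the payload bytes.
fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = payload.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one frame and returns its payload bytes.
fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

The same two functions would work over any blocking `Read`/`Write` pair; an async QUIC stream would need the equivalent async reads and writes.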
### Messages
**SYNC_REQUEST**
```rust
SyncRequest {
cursor: Option<DateTime<Utc>>, // None = full sync
}
```
**SYNC_RESPONSE**
```rust
SyncResponse {
nodes: Vec<Node>,
new_cursor: DateTime<Utc>,
has_more: bool,
}
```
**ERROR**
```rust
Error {
code: ErrorCode, // NotAuthorized, RateLimited, etc.
message: String,
}
```
### Sync Flow
1. Subscriber opens QUIC stream to publisher's NodeId
2. iroh-net handles NAT traversal / relay
3. Subscriber sends SYNC_REQUEST with cursor
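4. Publisher queries, for each entry in `share.json`: `SELECT * FROM nodes WHERE (source = '<shared_project>' OR source LIKE '<shared_project>/%') AND timestamp > cursor ORDER BY timestamp`. The trailing `/` keeps one project from matching a sibling directory that shares its name prefix; when the cursor is `None`, the timestamp condition is dropped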
5. Publisher streams SYNC_RESPONSE (paginated, batch size ~100 nodes)
6. Subscriber writes nodes to `peers/<node_id>/nodes.lance` (insert by `id`; ids already present are skipped)
7. Subscriber updates cursor in `meta.json`
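The publisher side of steps 4 and 5 can be sketched in memory as follows. Everything here is an assumption made for illustration: `Node`, `Page`, and `next_page` are hypothetical, timestamps are plain `u64` milliseconds instead of `DateTime<Utc>`, and `new_cursor` is optional so an empty page can echo the caller's cursor. The sketch also stretches a batch past `batch_size` when several nodes share the boundary timestamp, since a strict `timestamp > cursor` filter would otherwise skip the remainder on the next request.

```rust
use std::path::Path;

#[derive(Debug, Clone, PartialEq)]
struct Node {
    id: String,
    source: String,
    timestamp_ms: u64, // stand-in for the spec's DateTime<Utc>
}

#[derive(Debug)]
struct Page {
    nodes: Vec<Node>,
    new_cursor: Option<u64>,
    has_more: bool,
}

/// Returns the next batch of shared nodes strictly newer than `cursor`.
fn next_page(all: &[Node], shared_roots: &[&str], cursor: Option<u64>, batch_size: usize) -> Page {
    // Keep only nodes under a shared root and newer than the cursor.
    let mut matching: Vec<&Node> = all
        .iter()
        .filter(|n| shared_roots.iter().any(|root| Path::new(&n.source).starts_with(*root)))
        .filter(|n| cursor.map_or(true, |c| n.timestamp_ms > c))
        .collect();
    matching.sort_by_key(|n| n.timestamp_ms);

    // Cut at batch_size, then extend so equal timestamps stay in one page.
    let mut end = matching.len().min(batch_size.max(1));
    while end > 0
        && end < matching.len()
        && matching[end].timestamp_ms == matching[end - 1].timestamp_ms
    {
        end += 1;
    }

    let nodes: Vec<Node> = matching[..end].iter().map(|n| (*n).clone()).collect();
    let new_cursor = nodes.last().map(|n| n.timestamp_ms).or(cursor);
    let has_more = end < matching.len();
    Page { nodes, new_cursor, has_more }
}
```

A subscriber would call this repeatedly, passing each page's `new_cursor` back in, until `has_more` is `false`.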
### Node Deduplication
Subscriber inserts by `Node.id` (insert-if-absent rather than a true upsert, since existing rows are never overwritten):
- If node with same `id` exists, skip (keep existing - snapshot semantics)
- If new `id`, insert
This handles re-syncs and cursor resets gracefully.
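The skip-if-present rule can be sketched with a `HashMap` standing in for `peers/<node_id>/nodes.lance`. `PeerStore`, `apply_batch`, and the two-field `Node` are hypothetical and exist only for this illustration; the real store is LanceDB.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Node {
    id: String,
    content: String,
}

/// In-memory stand-in for one peer replica.
struct PeerStore {
    nodes: HashMap<String, Node>,
}

impl PeerStore {
    fn new() -> Self {
        PeerStore { nodes: HashMap::new() }
    }

    /// Inserts nodes whose id is not yet present and returns how many were
    /// added. Nodes with a known id are skipped, so the first version wins.
    fn apply_batch(&mut self, batch: Vec<Node>) -> usize {
        let mut inserted = 0;
        for node in batch {
            if let Entry::Vacant(slot) = self.nodes.entry(node.id.clone()) {
                slot.insert(node);
                inserted += 1;
            }
        }
        inserted
    }
}
```

Applying the same batch twice leaves the store unchanged the second time, which is what makes cursor resets safe.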
### Authorization
Publisher checks if requester's NodeId is in `peers.json` with `publish: true`. Otherwise returns error:
```
Error {
code: NotAuthorized,
message: "Peer {node_id} not authorized. Use: ds peer add {node_id} --publish"
}
```
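The check itself is small. A minimal sketch, assuming `peers.json` has already been parsed into a map keyed by NodeId string; `PeerConfig`, `SyncError`, and `authorize` are hypothetical names, and only the `NotAuthorized` code is modelled.

```rust
use std::collections::HashMap;

/// Mirrors one entry under "peers" in peers.json.
#[allow(dead_code)]
struct PeerConfig {
    name: String,
    subscribe: bool,
    publish: bool,
}

#[derive(Debug, PartialEq)]
enum ErrorCode {
    NotAuthorized,
}

#[derive(Debug, PartialEq)]
struct SyncError {
    code: ErrorCode,
    message: String,
}

/// Allows the request only if the requester is a known peer with publish = true.
/// Unknown peers and peers with publish = false get the same error.
fn authorize(peers: &HashMap<String, PeerConfig>, requester: &str) -> Result<(), SyncError> {
    match peers.get(requester) {
        Some(peer) if peer.publish => Ok(()),
        _ => Err(SyncError {
            code: ErrorCode::NotAuthorized,
            message: format!(
                "Peer {} not authorized. Use: ds peer add {} --publish",
                requester, requester
            ),
        }),
    }
}
```

Returning the same error for unknown and non-publish peers avoids revealing which NodeIds are configured on the publisher.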
## Query Layer
### Multi-DB Fan-out
```rust
// `Result` is assumed to be the crate's error alias; `local_store` and
// `peer_stores` are assumed to be in scope, and `score` to be a float.
pub async fn query_all(query: &str, limit: usize) -> Result<Vec<QueryResult>> {
    let mut results = Vec::new();
    // Query local
    let local = local_store.query(query, limit).await?;
    for (node, score) in local {
        results.push(QueryResult {
            node,
            score,
            source: Source::Local,
        });
    }
    // Query each peer replica; each contributes up to `limit` candidates
    for (peer_id, peer_store) in peer_stores.iter() {
        let peer_results = peer_store.query(query, limit).await?;
        for (node, score) in peer_results {
            results.push(QueryResult {
                node,
                score,
                source: Source::Peer(peer_id.clone()),
            });
        }
    }
    // Sort by descending score, take top N.
    // total_cmp never panics on NaN, unlike partial_cmp(..).unwrap().
    results.sort_by(|a, b| b.score.total_cmp(&a.score));
    results.truncate(limit);
    Ok(results)
}
```
### CLI Output
```bash
$ ds query "rust error handling"
[local] (0.89) How to use thiserror for custom errors...
[alice] (0.85) Error handling patterns in async Rust...
[bob] (0.82) Result vs Option - when to use which...
```
### MCP Output
```json
{
"results": [
{
"id": "uuid...",
"content": "How to use thiserror...",
"score": 0.89,
"source": "local",
"project": "/Users/drazen/playground/ai-omnibus"
},
{
"id": "uuid...",
"content": "Error handling patterns...",
"score": 0.85,
"source": "alice",
"project": "/home/alice/rust-patterns"
}
]
}
```
## CLI Commands
### Identity
```bash
ds peer init # Generate identity, initialize config
ds peer id # Print this node's ID (for sharing)
```
### Share Configuration
```bash
ds peer share <project_path> # Add project to shared list (canonicalizes path)
ds peer unshare <project_path> # Remove project from shared list
ds peer shared # List shared projects
```
### Peer Management
```bash
ds peer add <node_id> [--name <name>]
# Add peer, default subscribe=true, publish=false
ds peer add <node_id> --publish
# Add peer and allow them to pull from us
ds peer remove <node_id> # Remove peer, delete their local replica
ds peer list # Show all peers with sync status
```
### Sync
```bash
ds peer sync # Sync with all subscribed peers
ds peer sync <node_id> # Sync with specific peer
ds peer sync --full # Full re-sync (reset all cursors)
```
### Integration with Daemon
The existing `ds start` daemon is extended:
- Accepts incoming sync connections (if any peers have `publish: true`)
- Optionally syncs on startup: `ds start --sync`
- Optionally syncs periodically: `ds start --sync-interval 30m`
- Uses file locking to prevent concurrent writes (daemon holds lock, CLI waits)
## Concurrency
To prevent conflicts between daemon and CLI:
1. **Write lock on peer stores**: Only one process writes to `peers/` at a time
2. **Lock file**: `~/.datasphere/peers/.lock`
3. **Behavior**:
- `ds start` acquires lock on startup, releases on shutdown
- `ds peer sync` acquires lock, syncs, releases
- If lock held, wait with timeout (30s) then fail with message
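The wait-then-fail behavior can be sketched with the standard library alone. Note the swapped-in technique: this uses an exclusive `create_new` lock file rather than the advisory file locks that `fs2` provides, so a crashed process would leave a stale lock behind. `LockGuard` and `acquire_lock` are hypothetical names, and the 50 ms poll interval is arbitrary.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Removes the lock file when dropped, releasing the lock.
struct LockGuard {
    path: PathBuf,
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Polls until the lock file can be created exclusively, or fails with
/// `TimedOut` once `timeout` has elapsed.
fn acquire_lock(path: &Path, timeout: Duration) -> io::Result<LockGuard> {
    let deadline = Instant::now() + timeout;
    loop {
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(_) => return Ok(LockGuard { path: path.to_path_buf() }),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
                if Instant::now() >= deadline {
                    return Err(io::Error::new(
                        io::ErrorKind::TimedOut,
                        "peer store is locked by another process",
                    ));
                }
                thread::sleep(Duration::from_millis(50));
            }
            Err(e) => return Err(e),
        }
    }
}
```

`ds peer sync` would call something like `acquire_lock(path, Duration::from_secs(30))` and hold the guard for the duration of the sync. An OS-level advisory lock has the advantage of being released automatically when the holding process exits.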
## Build Configuration
P2P support adds significant dependencies (iroh-net, quinn, etc.). Available as optional feature:
```bash
# Default build (no P2P)
cargo build --release
# With P2P support
cargo build --release --features p2p
```
Commands `ds peer *` require the `p2p` feature. Without it, they print:
```
P2P features not enabled. Rebuild with: cargo build --features p2p
```
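One way to wire this up is conditional compilation on the feature flag. A minimal sketch, assuming a single entry point for all `ds peer *` subcommands; `run_peer_command` is a hypothetical name and the enabled branch is left as a placeholder.

```rust
/// Built only when the crate is compiled with `--features p2p`.
#[cfg(feature = "p2p")]
fn run_peer_command(_args: &[String]) -> Result<(), String> {
    // Placeholder: dispatch to init / share / add / sync handlers here.
    Ok(())
}

/// Fallback for default builds: every `ds peer *` command reports the
/// missing feature instead of running.
#[cfg(not(feature = "p2p"))]
fn run_peer_command(_args: &[String]) -> Result<(), String> {
    Err("P2P features not enabled. Rebuild with: cargo build --features p2p".to_string())
}
```

Because both functions share a signature, the CLI dispatcher can call `run_peer_command` unconditionally and the P2P dependencies stay out of default builds.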
## Example Session
```bash
# Alice's machine
$ ds peer init
Generated NodeId: alice123...
$ ds peer share ~/oss/rust-patterns
Added to shared projects: /home/alice/oss/rust-patterns
$ ds start
Daemon started. Accepting peer connections.
# Bob's machine
$ ds peer init
Generated NodeId: bob456...
$ ds peer add alice123... --name alice
Added peer 'alice'
$ ds peer add alice123... --publish
Updated: alice can now pull from us
# Bob shares a project back
$ ds peer share ~/code/error-handling-guide
Added to shared projects: /home/bob/code/error-handling-guide
$ ds peer sync
Syncing with alice... 847 nodes received.
$ ds query "thiserror derive"
[alice] (0.92) Using #[derive(Error)] with thiserror...
[local] (0.78) My notes on error handling...
# Alice syncs back (after Bob enabled publish)
$ ds peer add bob456... --name bob
$ ds peer sync
Syncing with bob... 234 nodes received.
```
## Implementation Phases
### Phase 1: Foundation
- [ ] Add iroh-net as optional dependency (`features = ["p2p"]`)
- [ ] Implement identity generation (`ds peer init`)
- [ ] Create config files (share.json, peers.json)
- [ ] CLI: `ds peer id`, `ds peer share/unshare/shared`
- [ ] Path canonicalization for share.json
### Phase 2: Multi-DB Query
- [ ] Refactor Store to support peer stores in `peers/<id>/`
- [ ] Implement fan-out query across local + peers
- [ ] Update CLI output to show source
- [ ] Update MCP response to include source
- [ ] `ds peer import <path>` for manual sneakernet testing
- [ ] Per-peer stats in `ds stats`
### Phase 3: Sync Protocol
- [ ] Implement sync server (accept connections in daemon)
- [ ] Implement sync client (`ds peer sync`)
- [ ] Project filtering at publish time (prefix match on `source`)
- [ ] Cursor-based incremental sync with upsert
- [ ] Error handling for offline/unreachable peers
- [ ] `--full` flag for cursor reset
### Phase 4: Peer Management
- [ ] `ds peer add/remove/list` commands
- [ ] Authorization (check peers.json on connect)
- [ ] Helpful error messages for auth failures
- [ ] File locking for concurrent access
### Phase 5: Polish
- [ ] `--sync` and `--sync-interval` flags for daemon
- [ ] Progress indicators for long syncs
- [ ] Retry logic for failed syncs
- [ ] Relay status warnings (direct vs relayed)
- [ ] Documentation
## Decisions Log
| Decision | Rationale |
|----------|-----------|
| Snapshot semantics (no update/delete propagation) | Simplicity for v1. Full re-sync available if needed. |
| Project-level sharing (not node-level) | Right granularity. What would node-level even mean? |
| iroh-net transport | NAT traversal + relay, Rust-native, maintained |
| Feature flag for P2P | Keeps base binary small, opt-in complexity |
| Upsert by node ID | Handles re-syncs gracefully, keeps first version |
| Paths visible to peers | Accepted tradeoff. Use dedicated share dirs if sensitive. |
## Open Questions
1. **Stale peer data** - If you unsubscribe from a peer, delete their `peers/<id>/` immediately or keep as archive? *Lean: delete immediately, clean separation.*
2. **NodeId exchange** - How do users share NodeIds in practice? Copy-paste is fine for v1.
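3. **iroh-net version** - Target latest stable (check API compatibility before implementation). Recent iroh releases appear to fold the former `iroh-net` crate into the main `iroh` crate, so confirm the crate name as well as the version.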
## Security Considerations
- **Encrypted in transit** - QUIC provides TLS 1.3
- **Identity-based auth** - Only configured peers can connect
- **Private by default** - Must explicitly share projects
- **Path exposure** - Shared project paths visible to peers
- **No data at rest encryption** - LanceDB files are unencrypted (same as current)
## Dependencies
| Crate | Purpose | Feature |
|-------|---------|---------|
| `iroh-net` | QUIC transport, NAT traversal, relay | `p2p` |
| `rmp-serde` | MessagePack serialization | `p2p` |
| `fs2` | File locking | `p2p` |
| (existing) | LanceDB, chrono, uuid, serde | core |
## References
- [iroh-net docs](https://docs.rs/iroh-net)
- [iroh examples](https://github.com/n0-computer/iroh/tree/main/iroh-net/examples)
- [Syncthing BEP](https://docs.syncthing.net/specs/bep-v1.html) - inspiration for sync model