grapevine 1.0.0

# Grapevine Protocol Specification

## Overview

Grapevine implements a **push-based epidemic gossip protocol** for reliable message dissemination in distributed peer-to-peer networks. The protocol combines two complementary mechanisms:

1. **Epidemic Broadcast** - Fast, probabilistic message propagation
2. **Anti-Entropy** - Periodic synchronization for eventual consistency

This hybrid approach provides:

- **High throughput** - Messages spread rapidly through the network
- **Reliability** - Anti-entropy ensures no messages are permanently lost
- **Scalability** - Probabilistic forwarding prevents message storms
- **Fault tolerance** - No single point of failure, tolerates node failures

## Protocol Guarantees

### Eventual Consistency

All nodes eventually receive all messages, even in the presence of:

- Message loss due to probabilistic forwarding
- Temporary network partitions
- Node failures and restarts
- Race conditions in message arrival

### Deduplication

Messages are delivered exactly once per node through:

- Unique message identifiers (origin + sequence + timestamp)
- Per-message deduplication cache with configurable TTL
- Forward count tracking to prevent re-forwarding loops

### Bounded Message Propagation

Messages do not propagate indefinitely through:

- TTL mechanism (default: 10 hops)
- Per-node forward count limits (default: max 5 forwards)
- Time-based message eviction (default: 5 minutes)

### DoS Protection

The protocol resists denial-of-service attacks via:

- Per-peer token bucket rate limiting (100 capacity, 50 tokens/sec)
- Maximum message size limits (default: 1MB)
- Automatic peer health tracking and disconnection

## Protocol Phases

### 1. Bootstrap Phase

When a node starts:

1. Connects to configured bootstrap peers
2. Sends `PeerListRequest` to each bootstrap peer
3. Receives `PeerListResponse` with known peer addresses
4. Connects to discovered peers until reaching `max_peers` limit
5. Begins participating in gossip once connected to at least one peer

### 2. Active Gossip Phase

During normal operation:

1. **Message Broadcast**: Application broadcasts messages via epidemic protocol
2. **Heartbeats**: Nodes send periodic heartbeats to maintain peer health
3. **Anti-Entropy**: Periodic digest exchange ensures consistency
4. **Peer Maintenance**: Automatic health monitoring and peer replacement

### 3. Shutdown Phase

When a node shuts down gracefully:

1. Sends `Goodbye` messages to all connected peers
2. Stops accepting new messages
3. Waits for in-flight messages to complete (timeout: 500ms)
4. Closes all peer connections
5. Cleans up resources

## Message Types (Payload Variants)

```rust
   /// User-defined application data
   Application(Bytes),

   /// Peer discovery request
   PeerDiscovery,

   /// Peer list announcement
   PeerAnnouncement { peers: Vec<SocketAddr> },

   /// Heartbeat/keep-alive
   Heartbeat { from: SocketAddr },

   /// Request for peer list
   PeerListRequest,

   /// Response with peer list
   PeerListResponse { peers: Vec<SocketAddr> },

   /// Anti-entropy digest (message IDs this node knows about)
   AntiEntropyDigest { message_ids: Vec<MessageId> },

   /// Request for specific messages by ID
   MessageRequest { ids: Vec<MessageId> },

   /// Response containing requested messages
   MessageResponse { messages: Vec<Message> },

   /// Graceful shutdown notification
   Goodbye { reason: String },

   /// Direct message to a specific peer (not gossiped)
   DirectMessage { recipient: SocketAddr, data: Bytes },
```

## Wire Format

Messages use length-prefixed framing:

```txt
┌────────────┬──────────────────────┐
│ Length (4) │  Bincode Payload     │
│   bytes    │    (Length bytes)    │
└────────────┴──────────────────────┘
```

Length is big-endian `u32`.

## Message Deduplication

Each message has a unique ID:

```rust
struct MessageId {
    origin: SocketAddr,
    sequence: u64,
    timestamp: u64,
}
```

Nodes track seen messages in a `DashMap<MessageId, MessageEntry>` where `MessageEntry` contains:

- `timestamp`: When the message was first seen
- `forward_count`: How many times this node has forwarded the message
- `ttl`: Current TTL value

Messages are evicted after 5 minutes (configurable via `message_dedup_ttl`).

## TTL Mechanism

- Default TTL: 10
- Decremented on each hop
- Message not forwarded when TTL = 0
- Prevents infinite loops

## Epidemic Broadcast

1. Node creates message with TTL (default: 10)
2. Sends to `fanout` random peers (default: 3)
3. Receiving peers:
   - Check if already seen (deduplication)
   - Store in `seen_messages` cache
   - Make probabilistic forwarding decision (default: 70% probability)
   - Check forward count limit (default: max 5 forwards)
4. If forwarding:
   - Decrement TTL
   - Increment forward count
   - Re-gossip to random peers
5. Process repeats until TTL = 0 or max forwards reached

Configuration:

- `epidemic.forward_probability`: Probability of forwarding (default: 0.7)
- `epidemic.max_forwards`: Maximum forwards per node (default: 5)

## Anti-Entropy

Implemented to ensure eventual consistency:

1. Periodically (default: every 30s) exchange message digests with peers
2. Each node sends `AntiEntropyDigest` containing all known `MessageId`s
3. Receiving peer compares digest with local `seen_messages`
4. If missing messages detected:
   - Send `MessageRequest` with missing `MessageId`s
   - Peer responds with `MessageResponse` containing full messages
5. Requested messages are processed normally (deduplicated, forwarded if needed)

Configuration:

- `anti_entropy.enabled`: Enable/disable anti-entropy (default: true)
- `anti_entropy.interval`: Sync interval (default: 30s)
- `anti_entropy.fanout`: Number of peers to sync with per round (default: 3)

This mechanism ensures that even if epidemic broadcast misses some nodes due to probabilistic forwarding, all nodes eventually receive all messages.

## Peer Health and Lifecycle

### Peer State Machine

Each peer connection transitions through states:

```txt
Connecting → Connected → Stale → Disconnected
     ↓            ↓         ↓
     └────────────┴─────────┴─→ (reconnect or replace)
```

- **Connecting**: Initial connection establishment
- **Connected**: Active, responding peer
- **Stale**: No messages received within `peer_timeout` (default: 30s)
- **Disconnected**: Connection lost or peer failed

### Health Scoring

Peer health is calculated from:

- **Success/failure ratio** - Recent message delivery success rate
- **Connection age** - Longer connections receive bonus score
- **Consecutive failures** - Penalty increases exponentially

Peers with 5 consecutive failures are automatically disconnected.

### Peer Maintenance

Background task runs every 10 seconds:

1. Check `last_seen` timestamp for all peers
2. Mark stale peers (no activity for `peer_timeout`)
3. Disconnect unhealthy peers (health score below threshold)
4. Attempt reconnection or discover new peers if below `max_peers`

### Peer Selection

When selecting peers for gossip:

- Random selection from connected peers (uniform distribution)
- No preference for high-health peers (ensures network coverage)
- Fanout ensures message reaches diverse subset of network

## Security Considerations

### Rate Limiting

Per-peer token bucket algorithm:

- **Capacity**: 100 tokens (burst allowance)
- **Refill rate**: 50 tokens/second (sustained rate)
- **Cost**: 1 token per message
- Peers exceeding rate are throttled (not disconnected)

Rate limiting is configurable via `rate_limit` config:

```rust
RateLimitConfig {
    enabled: true,
    capacity: 100,
    refill_rate: 50,
}
```

### Message Size Limits

Configurable maximum message size (default: 1MB):

- Prevents memory exhaustion attacks
- Enforced at codec layer before deserialization
- Messages exceeding limit are dropped with error log