daa-prime-dht 0.2.1

Kademlia-based distributed hash table implementation for Prime distributed ML framework
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
# DAA Prime DHT

[![Crates.io](https://img.shields.io/crates/v/daa-prime-dht.svg)](https://crates.io/crates/daa-prime-dht)
[![Documentation](https://docs.rs/daa-prime-dht/badge.svg)](https://docs.rs/daa-prime-dht)
[![License](https://img.shields.io/crates/l/daa-prime-dht.svg)](https://github.com/yourusername/daa/blob/main/LICENSE)

High-performance Kademlia-based distributed hash table implementation for the Prime distributed machine learning framework. Provides decentralized storage and discovery for ML models, gradients, and metadata.

## Overview

DAA Prime DHT implements a robust distributed hash table based on the Kademlia protocol, specifically optimized for machine learning workloads. It provides:

- **Distributed Storage**: Scalable key-value storage across network nodes
- **Efficient Routing**: Kademlia routing table for O(log N) lookups  
- **Fault Tolerance**: Built-in replication and failure recovery
- **ML Optimization**: Specialized for storing ML models and gradients
- **libp2p Integration**: Seamless networking with modern P2P stack

## Features

- 🔄 **Kademlia Protocol**: Industry-standard DHT with proven scalability
- 🚀 **High Performance**: Optimized for ML data patterns and sizes
- 🛡️ **Fault Tolerant**: Configurable replication and self-healing
- 🔍 **Efficient Discovery**: Fast peer and content discovery
- 📊 **Property Testing**: Comprehensive test coverage with QuickCheck/PropTest
- 🌐 **Network Agnostic**: Works with any libp2p transport

## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
daa-prime-dht = "0.2.1"
daa-prime-core = "0.2.1"
libp2p = "0.53"
```

## Quick Start

### Basic DHT Operations

```rust
use daa_prime_dht::{Dht, DhtConfig};
use libp2p::PeerId;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create DHT with default configuration
    let peer_id = PeerId::random();
    let config = DhtConfig::default();
    let dht = Dht::new(peer_id, config);
    
    // Store a value
    let key = b"model_weights_v1".to_vec();
    let value = b"serialized_model_data".to_vec();
    dht.put(key.clone(), value.clone()).await?;
    
    // Retrieve the value
    if let Some(retrieved) = dht.get(key).await? {
        println!("Retrieved: {:?}", String::from_utf8_lossy(&retrieved));
    }
    
    Ok(())
}
```

### Custom Configuration

```rust
use daa_prime_dht::DhtConfig;
use std::time::Duration;

let config = DhtConfig {
    k_bucket_size: 20,           // Nodes per bucket
    alpha: 3,                    // Parallel query parameter
    replication_factor: 3,       // Number of replicas
    refresh_interval: Duration::from_secs(3600), // Bucket refresh
    ttl: Duration::from_secs(86400),             // Data expiration
};
```

## Core Concepts

### Kademlia Routing

The DHT uses Kademlia's XOR metric for routing:

```rust
use daa_prime_dht::routing::RoutingTable;
use libp2p::PeerId;

let peer_id = PeerId::random();
let routing_table = RoutingTable::new(peer_id, 20);

// Find closest nodes to a target
let target = PeerId::random();
let closest_nodes = routing_table.find_closest(&target, 3);
```

### Storage Management

```rust
use daa_prime_dht::storage::Storage;

let mut storage = Storage::new();

// Store with automatic expiration
storage.put(b"key".to_vec(), b"value".to_vec());

// Retrieve if still valid
if let Some(value) = storage.get(&b"key".to_vec()) {
    println!("Found: {:?}", value);
}
```

### Peer Discovery

```rust
use daa_prime_dht::discovery::Discovery;

let discovery = Discovery::new();

// Start discovery process
discovery.start_discovery().await?;

// Get discovered peers
let peers = discovery.get_peers().await;
```

## Advanced Usage

### ML Model Storage

```rust
use daa_prime_dht::Dht;
use daa_prime_core::ModelMetadata;

// Store ML model with metadata
async fn store_model(
    dht: &Dht,
    model_id: &str,
    model_data: Vec<u8>,
    metadata: ModelMetadata,
) -> Result<(), Box<dyn std::error::Error>> {
    // Store model data
    let model_key = format!("model:{}", model_id).into_bytes();
    dht.put(model_key, model_data).await?;
    
    // Store metadata separately
    let meta_key = format!("meta:{}", model_id).into_bytes();
    let meta_data = serde_json::to_vec(&metadata)?;
    dht.put(meta_key, meta_data).await?;
    
    Ok(())
}

// Retrieve model with metadata
async fn load_model(
    dht: &Dht,
    model_id: &str,
) -> Result<Option<(Vec<u8>, ModelMetadata)>, Box<dyn std::error::Error>> {
    let model_key = format!("model:{}", model_id).into_bytes();
    let meta_key = format!("meta:{}", model_id).into_bytes();
    
    let model_data = dht.get(model_key).await?;
    let meta_data = dht.get(meta_key).await?;
    
    match (model_data, meta_data) {
        (Some(model), Some(meta)) => {
            let metadata: ModelMetadata = serde_json::from_slice(&meta)?;
            Ok(Some((model, metadata)))
        },
        _ => Ok(None),
    }
}
```

### Gradient Aggregation

```rust
use daa_prime_dht::Dht;
use daa_prime_core::{GradientUpdate, NodeId};

// Store gradient updates from multiple nodes
async fn store_gradients(
    dht: &Dht,
    round: u64,
    updates: Vec<GradientUpdate>,
) -> Result<(), Box<dyn std::error::Error>> {
    for update in updates {
        let key = format!("gradient:{}:{}", round, update.node_id.0).into_bytes();
        let data = serde_json::to_vec(&update)?;
        dht.put(key, data).await?;
    }
    Ok(())
}

// Retrieve all gradients for a training round
async fn collect_gradients(
    dht: &Dht,
    round: u64,
    node_ids: Vec<NodeId>,
) -> Result<Vec<GradientUpdate>, Box<dyn std::error::Error>> {
    let mut gradients = Vec::new();
    
    for node_id in node_ids {
        let key = format!("gradient:{}:{}", round, node_id.0).into_bytes();
        if let Some(data) = dht.get(key).await? {
            let update: GradientUpdate = serde_json::from_slice(&data)?;
            gradients.push(update);
        }
    }
    
    Ok(gradients)
}
```

### Custom Storage Backends

```rust
use daa_prime_dht::storage::{Storage, StorageBackend};
use async_trait::async_trait;

// Implement custom storage backend
struct RedisStorage {
    client: redis::Client,
}

#[async_trait]
impl StorageBackend for RedisStorage {
    async fn put(&self, key: Vec<u8>, value: Vec<u8>) -> Result<(), Error> {
        // Store in Redis
        let mut conn = self.client.get_async_connection().await?;
        redis::cmd("SET")
            .arg(key)
            .arg(value)
            .query_async(&mut conn)
            .await?;
        Ok(())
    }
    
    async fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, Error> {
        let mut conn = self.client.get_async_connection().await?;
        let result: Option<Vec<u8>> = redis::cmd("GET")
            .arg(key)
            .query_async(&mut conn)
            .await?;
        Ok(result)
    }
}
```

## Configuration Options

### DHT Parameters

```rust
use daa_prime_dht::DhtConfig;
use std::time::Duration;

let config = DhtConfig {
    // Routing table parameters
    k_bucket_size: 20,           // Maximum nodes per k-bucket
    alpha: 3,                    // Concurrency parameter for lookups
    
    // Storage parameters  
    replication_factor: 3,       // Number of nodes to store each value
    ttl: Duration::from_secs(86400), // Time-to-live for stored values
    
    // Maintenance parameters
    refresh_interval: Duration::from_secs(3600), // Bucket refresh frequency
};
```

### Performance Tuning

```rust
// High-performance configuration for large networks
let high_perf_config = DhtConfig {
    k_bucket_size: 32,           // Larger buckets for better connectivity
    alpha: 8,                    // Higher concurrency for faster lookups
    replication_factor: 5,       // More replicas for reliability
    refresh_interval: Duration::from_secs(1800), // More frequent refresh
    ttl: Duration::from_secs(172800), // Longer TTL for ML models
};

// Low-latency configuration for small networks
let low_latency_config = DhtConfig {
    k_bucket_size: 10,           // Smaller buckets for faster updates
    alpha: 2,                    // Lower concurrency to reduce overhead
    replication_factor: 2,       // Fewer replicas for speed
    refresh_interval: Duration::from_secs(7200), // Less frequent refresh
    ttl: Duration::from_secs(43200), // Shorter TTL for rapid iteration
};
```

## Testing

The crate includes comprehensive property-based testing:

```rust
use daa_prime_dht::{Dht, DhtConfig};
use proptest::prelude::*;
use quickcheck_macros::quickcheck;

proptest! {
    #[test]
    fn test_dht_config_validation(
        k_bucket in 5usize..100usize,
        alpha in 1usize..10usize,
        replication in 1usize..20usize,
    ) {
        let config = DhtConfig {
            k_bucket_size: k_bucket,
            alpha,
            replication_factor: replication,
            refresh_interval: Duration::from_secs(3600),
            ttl: Duration::from_secs(86400),
        };
        
        // Validate configuration constraints
        assert!(config.alpha <= config.k_bucket_size);
        assert!(config.replication_factor >= 1);
    }
}

#[quickcheck]
async fn test_put_get_consistency(key: Vec<u8>, value: Vec<u8>) -> bool {
    if key.is_empty() || value.is_empty() {
        return true;
    }
    
    let peer_id = PeerId::random();
    let dht = Dht::new(peer_id, DhtConfig::default());
    
    // Put should succeed and get should return same value
    dht.put(key.clone(), value.clone()).await.is_ok() &&
    dht.get(key).await.unwrap() == Some(value)
}
```

## Performance Benchmarks

```rust
#[cfg(test)]
mod benchmarks {
    use criterion::{black_box, criterion_group, criterion_main, Criterion};
    use daa_prime_dht::{Dht, DhtConfig};
    
    fn bench_put_operations(c: &mut Criterion) {
        let rt = tokio::runtime::Runtime::new().unwrap();
        let peer_id = PeerId::random();
        let dht = Dht::new(peer_id, DhtConfig::default());
        
        c.bench_function("dht_put_1kb", |b| {
            b.to_async(&rt).iter(|| async {
                let key = black_box(format!("key_{}", fastrand::u64(..)).into_bytes());
                let value = black_box(vec![0u8; 1024]);
                dht.put(key, value).await.unwrap();
            });
        });
    }
    
    criterion_group!(benches, bench_put_operations);
    criterion_main!(benches);
}
```

## Monitoring and Metrics

```rust
use daa_prime_dht::Dht;

// Monitor DHT health
async fn monitor_dht_health(dht: &Dht) {
    let routing_table = dht.routing_table.read().await;
    let storage = dht.storage.read().await;
    
    println!("DHT Health Report:");
    println!("  Active peers: {}", routing_table.len());
    println!("  Stored keys: {}", storage.len());
    println!("  Bucket distribution: {:?}", routing_table.bucket_sizes());
}
```

## Integration Examples

### With DAA Orchestrator

```rust
use daa_prime_dht::Dht;
// use daa_orchestrator::Orchestrator;

async fn setup_orchestrated_dht() -> Result<(), Box<dyn std::error::Error>> {
    let peer_id = PeerId::random();
    let dht = Dht::new(peer_id, DhtConfig::default());
    
    // Register DHT with orchestrator
    // let orchestrator = Orchestrator::new().await?;
    // orchestrator.register_dht(dht).await?;
    
    Ok(())
}
```

### With Prime Trainer

```rust
use daa_prime_dht::Dht;
use daa_prime_trainer::TrainerNode;

async fn trainer_with_dht() -> Result<(), Box<dyn std::error::Error>> {
    let peer_id = PeerId::random();
    let dht = Dht::new(peer_id, DhtConfig::default());
    
    // Trainer can use DHT for model sharing
    let trainer = TrainerNode::new("trainer-001".to_string()).await?;
    
    // Store training checkpoints in DHT
    let checkpoint_key = b"checkpoint_epoch_10".to_vec();
    let checkpoint_data = b"model_state_dict".to_vec();
    dht.put(checkpoint_key, checkpoint_data).await?;
    
    Ok(())
}
```

## Troubleshooting

### Common Issues

1. **Connection Failures**
   ```rust
   // Ensure proper libp2p transport configuration
   use libp2p::{tcp, noise, yamux, Transport};
   
   let transport = tcp::tokio::Transport::default()
       .upgrade(noise::Config::new(&keypair)?)
       .multiplex(yamux::Config::default())
       .boxed();
   ```

2. **High Memory Usage**
   ```rust
   // Implement periodic cleanup
   async fn cleanup_expired_data(dht: &Dht) {
       let mut storage = dht.storage.write().await;
       storage.cleanup_expired();
   }
   ```

3. **Slow Lookups**
   ```rust
   // Increase alpha parameter for higher concurrency
   let config = DhtConfig {
       alpha: 8, // Higher concurrency
       ..Default::default()
   };
   ```

## Roadmap

- [ ] IPFS compatibility layer
- [ ] Advanced replication strategies
- [ ] Built-in data compression
- [ ] Metrics and observability
- [ ] Byzantine fault tolerance
- [ ] Integration with DAA consensus

## Contributing

Contributions are welcome! Please see our [Contributing Guide](../../CONTRIBUTING.md) for details.

## License

This project is licensed under the MIT License - see the [LICENSE](../../LICENSE) file for details.

## Related Crates

- [`daa-prime-core`]../prime-core: Core types and protocol definitions
- [`daa-prime-trainer`]../prime-trainer: Training node implementation
- [`daa-prime-coordinator`]../prime-coordinator: Coordination and governance  
- [`daa-prime-cli`]../prime-cli: Command-line interface

## References

- [Kademlia Paper]https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf - Original Kademlia protocol
- [libp2p Specification]https://github.com/libp2p/specs - Modern P2P networking
- [DHT Security]https://docs.libp2p.io/concepts/security/ - Security considerations

## Support

- 📖 [Documentation]https://docs.rs/daa-prime-dht  
- 🐛 [Issue Tracker]https://github.com/yourusername/daa/issues
- 💬 [Discussions]https://github.com/yourusername/daa/discussions