shardex 0.1.0

A high-performance memory-mapped vector search engine with ACID transactions and incremental updates
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
# Shardex Performance Tuning Guide

This guide provides comprehensive information on optimizing Shardex performance for different workloads and environments.

## Performance Fundamentals

Shardex performance depends on several key factors:

1. **Memory Usage**: How efficiently memory is used for vectors and metadata
2. **Disk I/O**: How quickly data can be written to and read from disk
3. **Vector Operations**: How efficiently similarity calculations are performed
4. **Index Structure**: How well the shard organization supports your query patterns

## Configuration Parameters

### Vector Dimensions

The number of dimensions directly affects memory usage and computation time:

```rust
// Higher dimensions = more memory and computation
let config = ShardexConfig::new()
    .vector_size(128)   // ~512 bytes per vector
    .vector_size(384)   // ~1536 bytes per vector  
    .vector_size(768)   // ~3072 bytes per vector
    .vector_size(1536); // ~6144 bytes per vector
```

**Recommendations:**
- Use the smallest dimension that maintains acceptable accuracy
- Consider dimensionality reduction techniques (PCA, t-SNE) for large vectors
- Monitor memory usage as dimensions increase

### Shard Size Optimization

Shard size affects both memory usage and search performance:

```rust
let config = ShardexConfig::new()
    .shard_size(1000)    // Small shards: less memory per search
    .shard_size(10000)   // Default: balanced performance
    .shard_size(50000)   // Large shards: fewer splits, more memory
    .shard_size(100000); // Very large: maximum throughput
```

**Trade-offs:**
- **Small shards (1K-5K vectors)**:
  - ✅ Lower memory per search operation
  - ✅ Faster individual shard searches
  - ❌ More shards to manage (higher overhead)
  - ❌ More frequent splits

- **Large shards (50K-100K vectors)**:
  - ✅ Fewer shards to manage
  - ✅ Better batch insertion performance
  - ❌ Higher memory per search
  - ❌ Slower individual shard searches

### Batch Write Configuration

Optimize write performance through batching:

```rust
let config = ShardexConfig::new()
    .batch_write_interval_ms(10)   // Very responsive (higher CPU)
    .batch_write_interval_ms(100)  // Default: balanced
    .batch_write_interval_ms(500); // High throughput (higher latency)
```

**Guidelines:**
- **10-50ms**: Interactive applications requiring low latency
- **100-200ms**: General purpose applications  
- **500-1000ms**: Batch processing with high throughput requirements

### Slop Factor Tuning

Balance search speed vs. accuracy:

```rust
// During search
let results = index.search(&query, k, Some(1)).await?;  // Fastest
let results = index.search(&query, k, Some(3)).await?;  // Balanced  
let results = index.search(&query, k, Some(10)).await?; // Most accurate
```

**Performance Impact:**
- **Slop 1**: Searches only the closest shard (fastest, lowest accuracy)
- **Slop 3**: Searches 3 closest shards (balanced)
- **Slop 5+**: Searches many shards (slower, higher accuracy)

### Bloom Filter Optimization

Configure bloom filters for deletion performance:

```rust
let config = ShardexConfig::new()
    .bloom_filter_size(512)   // Memory optimized
    .bloom_filter_size(1024)  // Default  
    .bloom_filter_size(4096); // Deletion optimized
```

**Recommendations:**
- **Small datasets (<10K docs)**: 512-1024 bits
- **Medium datasets (10K-100K docs)**: 1024-2048 bits
- **Large datasets (>100K docs)**: 2048-4096 bits
- **Heavy deletion workloads**: 4096+ bits

## Workload-Specific Optimizations

### High-Throughput Indexing

For maximum indexing performance:

```rust
let config = ShardexConfig::new()
    .vector_size(384)                    // Match your embedding model
    .shard_size(50000)                   // Large shards
    .shardex_segment_size(5000)          // More centroids per segment
    .wal_segment_size(4 * 1024 * 1024)  // 4MB WAL segments
    .batch_write_interval_ms(250)        // Longer batching window
    .bloom_filter_size(2048);            // Adequate for large datasets

// Batch your inserts
let batch_size = 5000;
let postings = generate_postings(batch_size);
index.add_postings(postings).await?;
```

### Low-Latency Search

For responsive search applications:

```rust
let config = ShardexConfig::new()
    .vector_size(256)                    // Smaller vectors if possible
    .shard_size(10000)                   // Smaller shards
    .shardex_segment_size(1000)          // Faster shard selection
    .batch_write_interval_ms(50)         // Responsive writes
    .default_slop_factor(2);             // Narrow search

// Search with low slop factor
let results = index.search(&query, 10, Some(1)).await?;
```

### Memory-Constrained Environments

For systems with limited memory:

```rust
let config = ShardexConfig::new()
    .vector_size(128)                    // Smallest acceptable dimension
    .shard_size(5000)                    // Small shards
    .shardex_segment_size(500)           // Fewer centroids in memory
    .wal_segment_size(256 * 1024)        // 256KB WAL segments
    .batch_write_interval_ms(200)        // Less frequent batching
    .bloom_filter_size(512);             // Minimal bloom filters
```

### Large-Scale Datasets

For million+ document datasets:

```rust
let config = ShardexConfig::new()
    .vector_size(768)                    // Full-precision vectors
    .shard_size(100000)                  // Very large shards
    .shardex_segment_size(10000)         // Many centroids per segment
    .wal_segment_size(8 * 1024 * 1024)  // 8MB WAL segments
    .batch_write_interval_ms(500)        // High-throughput batching
    .default_slop_factor(5)              // Accurate search
    .bloom_filter_size(4096);            // Large bloom filters
```

## Performance Monitoring

### Key Metrics to Track

```rust
// Get comprehensive statistics
let stats = index.detailed_stats().await?;

// Monitor these key metrics:
println!("Throughput: {:.0} ops/sec", stats.write_throughput);
println!("Search P95: {:?}", stats.search_latency_p95);
println!("Memory: {:.1}MB", stats.memory_usage as f64 / 1024.0 / 1024.0);
println!("Shard utilization: {:.1}%", stats.average_shard_utilization * 100.0);
println!("Bloom hit rate: {:.1}%", stats.bloom_filter_hit_rate * 100.0);
```

### Performance Alerts

Set up monitoring for these conditions:

```rust
// Check for performance issues
let stats = index.detailed_stats().await?;

// High memory usage
if stats.memory_usage > 2 * 1024 * 1024 * 1024 { // 2GB
    println!("⚠ High memory usage detected");
}

// Poor shard utilization  
if stats.average_shard_utilization < 0.3 {
    println!("⚠ Low shard utilization - consider smaller shard_size");
}

// Too many shards
if stats.total_shards > 1000 {
    println!("⚠ Many shards detected - consider larger shard_size");
}

// High deletion ratio
let deletion_ratio = stats.deleted_postings as f64 / stats.total_postings as f64;
if deletion_ratio > 0.4 {
    println!("⚠ High deletion ratio - consider index rebuild");
}
```

## Benchmarking Your Setup

### Create a Performance Test

```rust
use std::time::Instant;
use shardex::{Shardex, ShardexConfig, ShardexImpl, Posting, DocumentId};

async fn benchmark_indexing(
    config: ShardexConfig,
    num_docs: usize,
    batch_size: usize,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut index = ShardexImpl::create(config).await?;
    
    let start_time = Instant::now();
    let mut total_docs = 0;
    
    while total_docs < num_docs {
        let current_batch = (num_docs - total_docs).min(batch_size);
        let postings = generate_random_postings(current_batch, 384);
        
        index.add_postings(postings).await?;
        index.flush().await?;
        
        total_docs += current_batch;
        
        // Progress reporting
        if total_docs % (batch_size * 10) == 0 {
            let elapsed = start_time.elapsed();
            let rate = total_docs as f64 / elapsed.as_secs_f64();
            println!("Indexed {} docs, rate: {:.0} docs/sec", total_docs, rate);
        }
    }
    
    let total_time = start_time.elapsed();
    let final_rate = num_docs as f64 / total_time.as_secs_f64();
    
    println!("Final indexing rate: {:.0} docs/sec", final_rate);
    
    Ok(())
}

async fn benchmark_search(
    index: &ShardexImpl,
    num_queries: usize,
    k: usize,
) -> Result<(), Box<dyn std::error::Error>> {
    let query_vector = generate_random_vector(384);
    let mut search_times = Vec::new();
    
    // Warmup
    for _ in 0..10 {
        index.search(&query_vector, k, None).await?;
    }
    
    // Actual benchmark
    for _ in 0..num_queries {
        let start = Instant::now();
        let _results = index.search(&query_vector, k, None).await?;
        search_times.push(start.elapsed());
    }
    
    // Calculate statistics
    search_times.sort();
    let p50 = search_times[search_times.len() / 2];
    let p95 = search_times[search_times.len() * 95 / 100];
    let p99 = search_times[search_times.len() * 99 / 100];
    
    println!("Search performance (k={}):", k);
    println!("  P50: {:.2}ms", p50.as_secs_f64() * 1000.0);
    println!("  P95: {:.2}ms", p95.as_secs_f64() * 1000.0);
    println!("  P99: {:.2}ms", p99.as_secs_f64() * 1000.0);
    
    Ok(())
}
```

## Hardware Considerations

### CPU Optimization

- **SIMD Support**: Ensure your CPU supports AVX2 or AVX-512 for faster vector operations
- **Multi-core**: Shardex uses Rayon for parallel operations - more cores help
- **Cache Size**: Larger L3 cache improves performance for large shard searches

### Memory Recommendations

Calculate memory requirements:

```
Estimated Memory = (num_vectors × vector_dimension × 4 bytes) × 1.5
```

Example for 1M vectors at 384 dimensions:
```
Memory = 1,000,000 × 384 × 4 bytes × 1.5 = ~2.3GB
```

### Storage Optimization

#### SSD vs HDD
- **SSD Recommended**: 10-100x faster for random I/O operations
- **NVMe Best**: Lowest latency for WAL operations
- **HDD Acceptable**: For read-heavy workloads with sufficient RAM

#### Disk Space Planning
```
Disk Usage = (vector storage) + (posting storage) + (index overhead) + (WAL)
           ≈ (num_vectors × vector_dimension × 4) × 1.8
```

## Common Performance Issues

### Problem: Slow Searches

**Symptoms**: High search latency, timeout errors
**Solutions**:
1. Reduce slop factor
2. Use smaller shard sizes
3. Reduce vector dimensions if possible
4. Add more RAM for better caching

### Problem: High Memory Usage

**Symptoms**: Out of memory errors, system swapping
**Solutions**:
1. Reduce shard size
2. Use smaller vector dimensions
3. Implement batch processing with smaller batches
4. Increase system RAM

### Problem: Slow Indexing

**Symptoms**: Low throughput, high CPU usage during indexing
**Solutions**:
1. Increase batch size
2. Increase batch write interval
3. Use larger shards
4. Optimize vector generation pipeline

### Problem: Frequent Shard Splits

**Symptoms**: Many small shards, degraded performance over time
**Solutions**:
1. Increase shard size
2. Pre-allocate capacity if dataset size is known
3. Monitor shard utilization metrics

## Production Deployment Tips

### Resource Allocation
```bash
# Recommended minimum resources
CPU: 4 cores (8+ preferred)
RAM: 4GB + (dataset_size × 1.5)
Storage: SSD with 100GB+ free space
```

### Operating System Tuning
```bash
# Increase memory map limits
echo 'vm.max_map_count = 262144' >> /etc/sysctl.conf

# Optimize for memory-mapped files
echo 'vm.swappiness = 1' >> /etc/sysctl.conf

# Increase file descriptor limits
echo '* soft nofile 65536' >> /etc/security/limits.conf
echo '* hard nofile 65536' >> /etc/security/limits.conf
```

### Monitoring in Production
```rust
// Set up regular monitoring
tokio::spawn(async move {
    loop {
        let stats = index.stats().await?;
        
        // Log key metrics
        log::info!("Index stats: {} docs, {:.1}MB, {} shards", 
            stats.total_postings,
            stats.memory_usage as f64 / 1024.0 / 1024.0,
            stats.total_shards
        );
        
        tokio::time::sleep(Duration::from_secs(60)).await;
    }
});
```

## Performance Comparison

### Typical Performance Ranges

| Configuration | Indexing Rate | Search P95 | Memory/1M docs |
|---------------|---------------|------------|-----------------|
| Memory-optimized | 5K docs/sec | 15ms | 1.2GB |
| Balanced | 8K docs/sec | 8ms | 1.8GB |
| Speed-optimized | 15K docs/sec | 3ms | 2.5GB |
| Large-scale | 12K docs/sec | 5ms | 2.0GB |

### Scaling Characteristics

- **Linear scaling**: Performance scales roughly linearly with data size up to memory limits
- **Search performance**: Degrades logarithmically with dataset size
- **Memory usage**: Scales linearly with dataset size and vector dimensions

## Advanced Optimization Techniques

### Custom Distance Metrics
For specialized use cases, implement custom distance functions:

```rust
use shardex::DistanceMetric;

// Use specialized distance metric for your data
let results = index.search_with_metric(
    &query_vector, 
    10, 
    DistanceMetric::Euclidean,  // or Cosine, DotProduct
    Some(3)
).await?;
```

### Batch Operations
Optimize for bulk operations:

```rust
// Process in optimal batch sizes
const OPTIMAL_BATCH_SIZE: usize = 5000;

for chunk in documents.chunks(OPTIMAL_BATCH_SIZE) {
    let postings: Vec<_> = chunk.iter()
        .map(|doc| create_posting(doc))
        .collect();
    
    index.add_postings(postings).await?;
    
    // Flush periodically, not after every batch
    if chunk_count % 10 == 0 {
        index.flush().await?;
    }
}
```

### Monitoring and Alerting
Set up comprehensive monitoring:

```rust
// Custom performance monitor
struct PerformanceMonitor {
    search_times: VecDeque<Duration>,
    index_times: VecDeque<Duration>,
    last_stats: Option<DetailedIndexStats>,
}

impl PerformanceMonitor {
    fn check_performance(&mut self, current_stats: &DetailedIndexStats) {
        // Detect performance regressions
        if let Some(ref last) = self.last_stats {
            if current_stats.search_latency_p95 > last.search_latency_p95 * 1.5 {
                log::warn!("Search performance degradation detected");
            }
        }
        
        self.last_stats = Some(current_stats.clone());
    }
}
```