# Performance Guide

Performance characteristics and tuning guide for V-Queue.

## Performance Characteristics

### Write Performance

V-Queue is optimized for high-throughput sequential writes.

**Typical Performance**:
- **Sequential Writes**: 50,000 - 200,000 messages/second
- **Throughput**: 50 - 500 MB/second (depends on message size)
- **Latency**: < 1ms per message (excluding fsync)

**Factors**:
- Storage device speed (SSD >> HDD)
- Message size (smaller = more messages/sec)
- fsync policy (durability vs throughput trade-off)
- CPU and memory availability
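
The fsync trade-off listed above can be measured on a given disk with a small stand-alone experiment. The sketch below is not V-Queue code: it appends fixed-size records to a temporary file and calls `os.fsync` every `sync_every` records. `timed_append` and `sync_every` are names invented for this example and do not correspond to any V-Queue setting.

```python
import os
import tempfile
import time


def timed_append(record: bytes, count: int, sync_every: int) -> float:
    """Append `count` records to a temp file, fsyncing every `sync_every` records.

    Returns elapsed seconds. `sync_every` is a hypothetical knob for this
    experiment, not a V-Queue option.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for i in range(1, count + 1):
                f.write(record)
                if i % sync_every == 0:
                    f.flush()             # push Python's buffer to the OS
                    os.fsync(f.fileno())  # ask the OS to persist to the device
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    record = b"x" * 125  # arbitrary small record
    count = 500
    for sync_every in (1, 50, 500):
        elapsed = timed_append(record, count, sync_every)
        print(f"fsync every {sync_every:>3} records: {count / elapsed:,.0f} records/s")
```

On most devices the rate rises sharply as `sync_every` grows, at the cost of losing up to `sync_every` unsynced records if the machine loses power.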

### Read Performance

Sequential reading is also highly optimized.

**Typical Performance**:
- **Sequential Reads**: 100,000 - 500,000 messages/second
- **Throughput**: 100 - 1000 MB/second
- **Latency**: < 1ms per message

**Factors**:
- Storage device speed
- Batch size (larger batches = better throughput)
- Number of concurrent consumers
- Message size

### Storage Overhead

**Per Message Overhead**: 25 bytes (header)

**Examples**:
- 100-byte message = 125 bytes on disk (25-byte header + 100-byte body), 25% overhead
- 1 KB (1,024-byte) message = 1,049 bytes on disk, 2.4% overhead

**Recommendation**: Prefer larger messages where practical; the relative overhead of the fixed 25-byte header falls as message size grows.
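
The overhead figures follow directly from the fixed 25-byte header. A minimal sketch of the arithmetic (the function names are illustrative, not part of V-Queue):

```python
HEADER_BYTES = 25  # per-message header size stated in this guide


def on_disk_bytes(body_bytes: int) -> int:
    """Bytes written to disk for one message: header plus body."""
    return HEADER_BYTES + body_bytes


def overhead_percent(body_bytes: int) -> float:
    """Header size as a percentage of the message body."""
    return 100.0 * HEADER_BYTES / body_bytes


for size in (100, 1024, 10 * 1024):
    print(f"{size:>6} B body -> {on_disk_bytes(size):>6} B on disk, "
          f"{overhead_percent(size):.1f}% overhead")
```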

## Benchmarking

### Write Benchmark

Simple write benchmark:

```rust
use v_queue::queue::{Queue, Mode};
use v_queue::record::MsgType;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut queue = Queue::new("./bench-queues", "bench-queue", Mode::ReadWrite)?;
    
    let message = b"Hello World - test message for benchmarking";
    let count = 100_000;
    
    let start = Instant::now();
    
    for _ in 0..count {
        queue.push(message, MsgType::String)?;
    }
    
    let duration = start.elapsed();
    let msg_per_sec = count as f64 / duration.as_secs_f64();
    let mb_per_sec = (count * message.len()) as f64 / duration.as_secs_f64() / 1_048_576.0;
    
    println!("Wrote {} messages in {:?}", count, duration);
    println!("Throughput: {:.2} msg/sec", msg_per_sec);
    println!("Throughput: {:.2} MB/sec", mb_per_sec);
    
    Ok(())
}
```

**Example output** (NVMe SSD; actual figures vary with hardware):
```
Wrote 100000 messages in 1.234s
Throughput: 81037.28 msg/sec
Throughput: 3.32 MB/sec
```

### HTTP API Benchmark

Using the `wrk` tool:

```bash
# Install wrk
sudo apt install wrk

# Benchmark consume endpoint
wrk -t4 -c100 -d30s \
  --header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" \
  "http://localhost:9093/api/v1/queues/events/consumers/bench/messages?timeout=0&max_messages=100"
```

### Apache Bench

```bash
# Install ab
sudo apt install apache2-utils

# Benchmark
ab -n 10000 -c 10 \
  -H "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" \
  "http://localhost:9093/api/v1/queues/events/consumers/bench/messages?timeout=0"
```

## Optimization Strategies

### 1. Storage Selection

**NVMe SSD** (Best):
- Highest performance
- Low latency
- Suitable for high-throughput workloads

**SATA SSD**:
- Good performance
- Affordable
- Suitable for most workloads

**HDD** (Not Recommended):
- Poor random I/O performance
- Acceptable only for low-throughput workloads
- Consider for archival/backup

**Recommendation**: Use SSD for data directory.

### 2. File System Selection

**ext4** (Recommended):
- Good balance of performance and reliability
- Mature and stable
- Good fsync performance

**XFS**:
- Excellent for large files
- Good for high-concurrency workloads

**btrfs**:
- Advanced features (snapshots, compression)
- May have fsync performance issues

**Recommendation**: Use ext4 or XFS.

### 3. Mount Options

Optimize mount options for performance:

```bash
# /etc/fstab
/dev/nvme0n1 /var/lib/vqueue ext4 noatime,nodiratime,data=writeback 0 2
```

**Options**:
- `noatime` - Don't update access time (reduces writes)
- `nodiratime` - Don't update directory access time (already implied by `noatime` on Linux)
- `data=writeback` - Faster but less safe (use with caution)

**For production** (safer):
```bash
/dev/nvme0n1 /var/lib/vqueue ext4 noatime,nodiratime,data=ordered 0 2
```

### 4. Batch Size Tuning

Larger batches improve throughput but increase latency.

**Small Batches** (`max_messages=10`):
- Low latency
- More frequent commits
- More HTTP requests

**Large Batches** (`max_messages=1000`):
- High throughput
- Fewer HTTP requests
- Higher memory usage

**Recommendation**: 
- Real-time processing: 10-50
- Batch processing: 100-1000
- Bulk processing: 1000-5000
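
To see why batch size matters, it helps to model each consume request as a fixed round-trip cost plus a per-message cost. The sketch below uses assumed costs (2 ms per request, 10 µs per message) chosen only for illustration; measure your own before drawing conclusions.

```python
import math


def requests_needed(total_messages: int, max_messages: int) -> int:
    """Number of consume requests needed if every response is a full batch."""
    return math.ceil(total_messages / max_messages)


def drain_seconds(total_messages: int, max_messages: int,
                  per_request_s: float, per_message_s: float) -> float:
    """Time to consume everything under a fixed-cost-per-request model."""
    n = requests_needed(total_messages, max_messages)
    return n * per_request_s + total_messages * per_message_s


# Assumed costs, for illustration only: 2 ms per HTTP round trip, 10 µs per message.
for batch in (10, 100, 1000):
    t = drain_seconds(1_000_000, batch, per_request_s=0.002, per_message_s=0.00001)
    print(f"max_messages={batch:>4}: {requests_needed(1_000_000, batch):>6} requests, {t:.1f} s")
```

Under these assumptions the per-request cost dominates at small batch sizes and becomes negligible once batches reach a few hundred messages.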

### 5. Timeout Tuning

Balance between latency and efficiency.

**Short Timeout** (`timeout_ms=0-5000`):
- Lower latency
- More "empty" responses
- Higher CPU usage

**Long Timeout** (`timeout_ms=30000-60000`):
- Efficient (waits for messages)
- Higher latency
- Lower CPU usage

**Recommendation**:
- Real-time: 5-10 seconds
- Background processing: 30 seconds
- Batch jobs: 0 (poll immediately)
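
The cost of "empty" responses can be estimated from the timeout alone. The sketch below assumes each empty long-poll lasts the full timeout (in seconds) and that the client optionally sleeps between requests; `pause_s` is a hypothetical client-side delay, not a V-Queue parameter.

```python
def idle_requests_per_hour(timeout_s: float, pause_s: float = 0.0) -> float:
    """Requests an idle consumer sends per hour.

    Each empty long-poll is assumed to last `timeout_s`, followed by an optional
    client-side `pause_s` sleep (needed when the timeout is 0 to avoid a busy loop).
    """
    cycle = timeout_s + pause_s
    if cycle <= 0:
        raise ValueError("timeout 0 with no pause would busy-loop")
    return 3600.0 / cycle


for timeout in (5, 30, 60):
    print(f"timeout={timeout:>2}s: {idle_requests_per_hour(timeout):.0f} empty requests/hour")
```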

### 6. Message Size Optimization

**Small Messages** (< 100 bytes):
- Higher relative overhead (25-byte header)
- More messages/second
- Lower throughput (MB/s)

**Large Messages** (> 1KB):
- Lower overhead percentage
- Fewer messages/second
- Higher throughput (MB/s)

**Recommendation**: 
- Batch small events into larger messages if possible
- Use compression for large messages
- Target 1KB - 10KB per message for best balance
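
One way to batch small events is to pack them into a single message body with length prefixes and, optionally, compress the result. The framing below (4-byte big-endian length per event, `zlib` compression) is an assumption made for this sketch; V-Queue does not prescribe a body format, and producers and consumers must agree on whatever scheme is used.

```python
import struct
import zlib

HEADER_BYTES = 25  # per-message header size stated in this guide


def pack_events(events: list[bytes], compress: bool = False) -> bytes:
    """Concatenate events, each prefixed with a 4-byte big-endian length."""
    payload = b"".join(struct.pack(">I", len(e)) + e for e in events)
    return zlib.compress(payload) if compress else payload


def unpack_events(message: bytes, compressed: bool = False) -> list[bytes]:
    """Inverse of pack_events."""
    payload = zlib.decompress(message) if compressed else message
    events, pos = [], 0
    while pos < len(payload):
        (length,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        events.append(payload[pos:pos + length])
        pos += length
    return events


events = [b'{"user": %d, "action": "click"}' % i for i in range(50)]
separate = sum(HEADER_BYTES + len(e) for e in events)
batched = HEADER_BYTES + len(pack_events(events))
compressed = HEADER_BYTES + len(pack_events(events, compress=True))
print(f"50 separate messages: {separate} B on disk")
print(f"1 batched message:    {batched} B on disk")
print(f"1 compressed batch:   {compressed} B on disk")
```

The trade-off is that a batch is consumed and committed as a unit, so a consumer cannot acknowledge individual events inside it.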

### 7. Concurrent Consumers

Multiple consumers can read in parallel.

**Benefits**:
- Parallel processing
- Higher total throughput
- Better resource utilization

**Considerations**:
- Each consumer tracks own offset
- No coordination between consumers
- All consumers see all messages

**Example** - 3 consumers:

```python
# Consumer 1
messages = consumer.consume("events", "consumer-1")

# Consumer 2 (different consumer, same queue)
messages = consumer.consume("events", "consumer-2")

# Consumer 3
messages = consumer.consume("events", "consumer-3")
```

Each processes all messages independently.
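
The offset semantics described above can be illustrated with a toy in-memory model. `ToyQueue` is a stand-in written for this guide, not V-Queue's implementation or client API; it only shows why consumers with separate offsets each see the full stream.

```python
class ToyQueue:
    """In-memory stand-in for a queue with independent per-consumer offsets."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.offsets: dict[str, int] = {}

    def push(self, message: str) -> None:
        self.log.append(message)

    def consume(self, consumer: str, max_messages: int = 100) -> list[str]:
        """Return messages after the consumer's committed offset."""
        start = self.offsets.get(consumer, 0)
        return self.log[start:start + max_messages]

    def commit(self, consumer: str, count: int) -> None:
        """Advance only this consumer's offset."""
        self.offsets[consumer] = self.offsets.get(consumer, 0) + count


queue = ToyQueue()
for i in range(5):
    queue.push(f"m{i}")

batch = queue.consume("consumer-1", max_messages=3)
queue.commit("consumer-1", len(batch))

print(queue.consume("consumer-1"))  # remaining messages for consumer-1
print(queue.consume("consumer-2"))  # consumer-2 still starts from the beginning
```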

### 8. Disable Authentication (Development Only)

For maximum performance in development:

```bash
v-queue-server --no-auth
```

Removes authentication overhead (~5-10% performance gain).

**Warning**: Only use in isolated development environments.

### 9. Logging Level

Reduce logging overhead in production:

```toml
log_level = "warn"  # or "error"
```

Debug logging can significantly impact performance.

### 10. Resource Limits

Increase OS limits for high-throughput scenarios:

```bash
# /etc/security/limits.conf
vqueue soft nofile 65536
vqueue hard nofile 65536

# /etc/sysctl.conf
fs.file-max = 2097152
net.core.somaxconn = 1024
```

Apply:
```bash
sudo sysctl -p
```
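
To confirm the limits took effect, they can be read back from a process. The sketch below uses Python's standard `resource` module (Unix only) and reads sysctl values from `/proc/sys` (Linux only). It reports the limits of the process running the script, so run it as the same user and in the same way as the server; `read_sysctl` is a helper written for this example.

```python
import resource


def read_sysctl(name: str):
    """Return a sysctl value as a string, or None if it cannot be read."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

for key in ("fs.file-max", "net.core.somaxconn"):
    print(f"{key} = {read_sysctl(key)}")
```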

## Monitoring Performance

### System Metrics

Monitor these metrics:

**Disk I/O**:
```bash
iostat -x 1
```

Look for:
- `%util` - Disk utilization (< 80% ideal)
- `await` - Average wait time (< 10ms for SSD)
- `r/s`, `w/s` - Reads/writes per second

**CPU Usage**:
```bash
top -p "$(pgrep -d, v-queue-server)"
```

**Memory Usage**:
```bash
ps aux | grep '[v]-queue-server'
```

**Network**:
```bash
iftop
```

### Application Metrics

Track in your application:

```python
import time

# Measure consume latency (perf_counter is monotonic, unlike time.time)
start = time.perf_counter()
messages = consumer.consume("events", "my-consumer", timeout=30)
consume_latency = time.perf_counter() - start

# Measure processing time
start = time.perf_counter()
for msg in messages:
    process_message(msg)
process_time = time.perf_counter() - start

# Measure commit latency
start = time.perf_counter()
consumer.commit("events", "my-consumer")
commit_latency = time.perf_counter() - start

total = consume_latency + process_time + commit_latency
print(f"Consume: {consume_latency:.3f}s")
print(f"Process: {process_time:.3f}s")
print(f"Commit: {commit_latency:.3f}s")
print(f"Total: {total:.3f}s")
print(f"Throughput: {len(messages) / total:.2f} msg/s")
```

### Queue Growth Monitoring

Monitor queue directory size:

```bash
# Watch queue directory size
watch -n 5 'du -sh /var/lib/vqueue/queues/*'

# Track specific queue growth
watch -n 1 'ls -lh /var/lib/vqueue/queues/events-0/events_queue'
```
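
The same check can be scripted to estimate how long the disk will last at the current growth rate. This is a minimal sketch using only the standard library; the five-second sampling interval is arbitrary, and the helper names are invented for this example.

```python
import os
import shutil
import time


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed while scanning
    return total


def seconds_until_full(free_bytes: int, growth_bytes_per_s: float):
    """Seconds until free space runs out, or None if the directory is not growing."""
    if growth_bytes_per_s <= 0:
        return None
    return free_bytes / growth_bytes_per_s


if __name__ == "__main__":
    queue_dir = "/var/lib/vqueue/queues"  # path used in the commands above
    if os.path.isdir(queue_dir):
        interval_s = 5
        before = dir_size_bytes(queue_dir)
        time.sleep(interval_s)
        rate = (dir_size_bytes(queue_dir) - before) / interval_s
        remaining = seconds_until_full(shutil.disk_usage(queue_dir).free, rate)
        print(f"growth: {rate:.0f} B/s, time until full: {remaining}")
```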

### Prometheus Metrics (Future Enhancement)

Prometheus metrics are not yet implemented. Metrics worth exposing:

- Messages produced/consumed per second
- Consumer lag (messages behind)
- Request latency histogram
- Error rates
- Queue sizes

## Performance Issues and Solutions

### Issue: Slow Writes

**Symptoms**: Low messages/second write rate

**Causes**:
- Slow disk (HDD)
- Filesystem issues
- Resource contention

**Solutions**:
- Upgrade to SSD
- Check `iostat` for disk bottleneck
- Optimize filesystem mount options
- Reduce fsync frequency (if acceptable)

### Issue: Slow Reads

**Symptoms**: High latency on consume requests

**Causes**:
- Small batch sizes
- Disk I/O bottleneck
- Many concurrent consumers

**Solutions**:
- Increase `max_messages` parameter
- Use SSD storage
- Optimize batch processing
- Reduce consumer count if not needed

### Issue: High CPU Usage

**Symptoms**: v-queue-server using excessive CPU

**Causes**:
- Many concurrent requests
- Small timeouts causing polling
- Debug logging enabled

**Solutions**:
- Reduce log level to `warn` or `error`
- Increase timeout values
- Scale horizontally (multiple queues)

### Issue: High Memory Usage

**Symptoms**: Increasing memory consumption

**Causes**:
- Large batch sizes
- Leaked file handles
- Many consumers

**Solutions**:
- Reduce `max_messages`
- Check file descriptor count: `lsof -p $(pgrep v-queue-server) | wc -l`
- Monitor with `valgrind` (development)

### Issue: Queue Growing Unbounded

**Symptoms**: Disk usage continuously increasing

**Causes**:
- Consumers not keeping up with producers
- Consumers not committing
- Dead/stalled consumers

**Solutions**:
- Speed up slow consumers (every consumer reads all messages, so adding consumers does not split the load)
- Increase batch sizes
- Implement queue retention policy
- Monitor consumer lag
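
Whether a lagging consumer will ever catch up depends on the gap between its consume rate and the produce rate. The sketch below assumes you already track the lag and both rates in your own application counters; the function name is illustrative.

```python
def lag_drain_seconds(lag_messages: int, produce_rate: float, consume_rate: float):
    """Seconds until a consumer catches up, or None if it never will.

    Rates are in messages/second. The consumer only gains ground when it
    consumes faster than producers write.
    """
    net = consume_rate - produce_rate
    if net <= 0:
        return None
    return lag_messages / net


print(lag_drain_seconds(1_000_000, produce_rate=1_000, consume_rate=800))
print(lag_drain_seconds(1_000_000, produce_rate=1_000, consume_rate=1_500))
```

A result of `None` means the backlog grows without bound and the consumer itself must be made faster.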

## Capacity Planning

### Estimating Storage Requirements

```
Daily Storage = messages/day × avg_message_size × (1 + overhead)

overhead = 25 bytes / avg_message_size
```

**Example**:
- 10 million messages/day
- 500 bytes average message size
- Overhead: 25/500 = 5%

```
Daily Storage = 10,000,000 × 500 × 1.05
             = 5.25 GB/day
             ≈ 158 GB/month (30 days)
```
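
The storage formula is easy to script for other volumes. A minimal sketch, using decimal gigabytes (1 GB = 10^9 bytes) and an illustrative function name:

```python
HEADER_BYTES = 25  # per-message header size stated in this guide


def daily_storage_bytes(messages_per_day: int, avg_message_bytes: int) -> int:
    """Bytes written per day, including the per-message header."""
    return messages_per_day * (avg_message_bytes + HEADER_BYTES)


per_day = daily_storage_bytes(10_000_000, 500)
print(f"{per_day / 1e9:.2f} GB/day, {30 * per_day / 1e9:.1f} GB per 30 days")
```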

### Estimating Throughput Requirements

```
Peak throughput = peak_messages/second × avg_message_size
```

**Example**:
- 1,000 messages/second peak
- 1KB average message

```
Peak throughput = 1,000 × 1KB ≈ 1 MB/s
```

This is easily handled by any SSD.
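
To compare a workload against the write figures quoted earlier in this guide, include the header in the calculation and express the result as a share of the available throughput. The 50 MB/s baseline below is the low end of the typical write range given above; substitute a figure measured on your own hardware.

```python
def peak_write_mb_per_s(peak_messages_per_s: float, avg_message_bytes: int,
                        header_bytes: int = 25) -> float:
    """Peak on-disk write rate in MB/s (1 MB = 1,000,000 bytes), header included."""
    return peak_messages_per_s * (avg_message_bytes + header_bytes) / 1_000_000


rate = peak_write_mb_per_s(1_000, 1024)
low_end = 50.0  # low end of the write throughput range quoted in this guide
print(f"peak write rate: {rate:.2f} MB/s ({100 * rate / low_end:.1f}% of {low_end:.0f} MB/s)")
```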

### Sizing Recommendations

**Small Deployment** (< 1M messages/day):
- CPU: 2 cores
- RAM: 2 GB
- Disk: 100 GB SSD
- Network: 100 Mbps

**Medium Deployment** (1M - 100M messages/day):
- CPU: 4 cores
- RAM: 4 GB
- Disk: 500 GB SSD
- Network: 1 Gbps

**Large Deployment** (> 100M messages/day):
- CPU: 8+ cores
- RAM: 8+ GB
- Disk: 1+ TB NVMe SSD
- Network: 10 Gbps

## Best Practices Summary

1. **Use SSD storage** for best performance
2. **Tune batch sizes** for your workload (100-1000 typical)
3. **Use appropriate timeouts** (30 seconds for background processing)
4. **Monitor disk I/O**, the most likely bottleneck
5. **Reduce logging** in production (`warn` or `error` level)
6. **Disable authentication** only in development
7. **Plan storage capacity** based on message volume
8. **Use ext4 or XFS** filesystem
9. **Optimize mount options** (`noatime`, `nodiratime`)
10. **Monitor queue growth** to prevent disk exhaustion

## Next Steps

- [Configuration Guide](04-configuration.md)
- [API Reference](05-api-reference.md)
- [Troubleshooting](09-troubleshooting.md)