# Performance Guide
Performance characteristics and tuning guide for V-Queue.
## Performance Characteristics
### Write Performance
V-Queue is optimized for high-throughput sequential writes.
**Typical Performance**:
- **Sequential Writes**: 50,000 - 200,000 messages/second
- **Throughput**: 50 - 500 MB/second (depends on message size)
- **Latency**: < 1ms per message (excluding fsync)
**Factors**:
- Storage device speed (SSD >> HDD)
- Message size (smaller = more messages/sec)
- fsync policy (durability vs throughput trade-off)
- CPU and memory availability
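The fsync trade-off can be felt with a small experiment. The following is a minimal sketch that appends to an ordinary temporary file, not to V-Queue itself; the function name `append_rate` and the `fsync_every` parameter are illustrative assumptions, and absolute numbers will vary widely by device and filesystem.

```python
import os
import tempfile
import time


def append_rate(count: int, payload: bytes, fsync_every: int) -> float:
    """Append `count` payloads to a temp file and return appends per second.

    fsync_every=0 never calls fsync; fsync_every=n syncs after every n appends.
    """
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for i in range(1, count + 1):
            f.write(payload)
            if fsync_every and i % fsync_every == 0:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to persist to the device
        f.flush()
        elapsed = time.perf_counter() - start
    return count / elapsed


payload = b"x" * 100
for policy in (0, 100, 1):
    rate = append_rate(500, payload, policy)
    print(f"fsync_every={policy:>3}: {rate:,.0f} appends/sec")
```

On most devices the `fsync_every=1` run is far slower than the others, which is the durability versus throughput trade-off named in the list above.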
### Read Performance
Sequential reading is also highly optimized.
**Typical Performance**:
- **Sequential Reads**: 100,000 - 500,000 messages/second
- **Throughput**: 100 - 1000 MB/second
- **Latency**: < 1ms per message
**Factors**:
- Storage device speed
- Batch size (larger batches = better throughput)
- Number of concurrent consumers
- Message size
### Storage Overhead
**Per Message Overhead**: 25 bytes (header)
**Example**:
- 100-byte message = 125 bytes on disk (25 byte header + 100 byte body)
- Overhead: 25%
- 1KB message = 1049 bytes on disk
- Overhead: 2.4%
**Recommendation**: Prefer larger messages where practical; their relative overhead is lower.
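The arithmetic above can be reproduced for any message size. This is a small sketch that assumes only the 25-byte header stated in this section; the helper names are illustrative.

```python
HEADER_BYTES = 25  # per-message header size stated above


def on_disk_size(body_bytes: int) -> int:
    """Bytes one message occupies on disk: header plus body."""
    return HEADER_BYTES + body_bytes


def overhead_percent(body_bytes: int) -> float:
    """Header size as a percentage of the message body."""
    return 100.0 * HEADER_BYTES / body_bytes


for size in (100, 1024, 10 * 1024):
    print(f"{size:>6} B body -> {on_disk_size(size):>6} B on disk, "
          f"{overhead_percent(size):.1f}% overhead")
```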
## Benchmarking
### Write Benchmark
Simple write benchmark:
```rust
use v_queue::queue::{Queue, Mode};
use v_queue::record::MsgType;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut queue = Queue::new("./bench-queues", "bench-queue", Mode::ReadWrite)?;
    let message = b"Hello World - test message for benchmarking";
    let count = 100_000;

    // Time only the push loop, not queue setup.
    let start = Instant::now();
    for _ in 0..count {
        queue.push(message, MsgType::String)?;
    }
    let duration = start.elapsed();

    // Message rate and payload throughput (headers excluded, 1 MB = 1,048,576 bytes).
    let msg_per_sec = count as f64 / duration.as_secs_f64();
    let mb_per_sec = (count * message.len()) as f64 / duration.as_secs_f64() / 1_048_576.0;

    println!("Wrote {} messages in {:?}", count, duration);
    println!("Throughput: {:.2} msg/sec", msg_per_sec);
    println!("Throughput: {:.2} MB/sec", mb_per_sec);
    Ok(())
}
```
**Expected output** (NVMe SSD):
```
Wrote 100000 messages in 1.234s
Throughput: 81037.28 msg/sec
Throughput: 3.32 MB/sec
```
### HTTP API Benchmark
Using the `wrk` tool:
```bash
# Install wrk
sudo apt install wrk
# Benchmark consume endpoint
wrk -t4 -c100 -d30s \
--header "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" \
"http://localhost:9093/api/v1/queues/events/consumers/bench/messages?timeout=0&max_messages=100"
```
### Apache Bench
```bash
# Install ab
sudo apt install apache2-utils
# Benchmark
ab -n 10000 -c 10 \
-H "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" \
"http://localhost:9093/api/v1/queues/events/consumers/bench/messages?timeout=0"
```
## Optimization Strategies
### 1. Storage Selection
**NVMe SSD** (Best):
- Highest performance
- Low latency
- Suitable for high-throughput workloads
**SATA SSD**:
- Good performance
- Affordable
- Suitable for most workloads
**HDD** (Not Recommended):
- Poor random I/O performance
- Acceptable only for low-throughput workloads
- Consider for archival/backup
**Recommendation**: Use an SSD for the data directory.
### 2. File System Selection
**ext4** (Recommended):
- Good balance of performance and reliability
- Mature and stable
- Good fsync performance
**XFS**:
- Excellent for large files
- Good for high-concurrency workloads
**btrfs**:
- Advanced features (snapshots, compression)
- May have fsync performance issues
**Recommendation**: Use ext4 or XFS.
### 3. Mount Options
Optimize mount options for performance:
```bash
# /etc/fstab
/dev/nvme0n1 /var/lib/vqueue ext4 noatime,nodiratime,data=writeback 0 2
```
**Options**:
- `noatime` - Don't update access time (reduces writes)
- `nodiratime` - Don't update directory access time (already implied by `noatime` on Linux, so listing it is harmless but redundant)
- `data=writeback` - Faster but less safe (use with caution)
**For production** (safer):
```bash
/dev/nvme0n1 /var/lib/vqueue ext4 noatime,nodiratime,data=ordered 0 2
```
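To confirm which options are actually in effect, the mount table can be inspected. The sketch below parses text in the `/proc/mounts` format; the function name `mount_options` is an illustrative assumption, and the sample line simply mirrors the fstab entry above.

```python
def mount_options(mounts_text: str, mount_point: str) -> set:
    """Return the option set for `mount_point` from /proc/mounts-style text."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fs_type options dump pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))  # a later entry overrides an earlier one
    return options


sample = "/dev/nvme0n1 /var/lib/vqueue ext4 rw,noatime,data=ordered 0 0\n"
opts = mount_options(sample, "/var/lib/vqueue")
print("noatime active:", "noatime" in opts)
```

On a Linux host, passing `open("/proc/mounts").read()` instead of `sample` checks the live system.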
### 4. Batch Size Tuning
Larger batches improve throughput but increase latency.
**Small Batches** (`max_messages=10`):
- Low latency
- More frequent commits
- More HTTP requests
**Large Batches** (`max_messages=1000`):
- High throughput
- Fewer HTTP requests
- Higher memory usage
**Recommendation**:
- Real-time processing: 10-50
- Batch processing: 100-1000
- Bulk processing: 1000-5000
### 5. Timeout Tuning
Balance between latency and efficiency.
**Short Timeout** (`timeout_ms=0-5000`):
- Lower latency
- More "empty" responses
- Higher CPU usage
**Long Timeout** (`timeout_ms=30000-60000`):
- Efficient (waits for messages)
- Higher latency
- Lower CPU usage
**Recommendation**:
- Real-time: 5-10 seconds
- Background processing: 30 seconds
- Batch jobs: 0 (poll immediately)
### 6. Message Size Optimization
**Small Messages** (< 100 bytes):
- Higher overhead (25 byte header)
- More messages/second
- Lower throughput (MB/s)
**Large Messages** (> 1KB):
- Lower overhead percentage
- Fewer messages/second
- Higher throughput (MB/s)
**Recommendation**:
- Batch small events into larger messages if possible
- Use compression for large messages
- Target 1KB - 10KB per message for best balance
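One way to batch and compress small events on the producer side is sketched below. It uses only the Python standard library; `pack_events`, `unpack_events`, and the 4096-byte target are illustrative assumptions, not part of V-Queue. Each yielded payload would be pushed as a single message.

```python
import json
import zlib


def pack_events(events, target_bytes=4096):
    """Group JSON-serializable events into compressed payloads.

    Each payload holds as many events as fit in roughly `target_bytes`
    of uncompressed JSON; an oversized event gets a payload of its own.
    """
    batch, size = [], 0
    for event in events:
        encoded_len = len(json.dumps(event))
        if batch and size + encoded_len > target_bytes:
            yield zlib.compress(json.dumps(batch).encode("utf-8"))
            batch, size = [], 0
        batch.append(event)
        size += encoded_len
    if batch:
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def unpack_events(payload):
    """Inverse of pack_events for one payload."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


events = [{"id": i, "type": "click"} for i in range(1000)]
payloads = list(pack_events(events))
print(f"{len(events)} events -> {len(payloads)} messages")
```

Consumers then call `unpack_events` on each message body to recover the individual events in order.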
### 7. Concurrent Consumers
Multiple consumers can read in parallel.
**Benefits**:
- Parallel processing
- Higher total throughput
- Better resource utilization
**Considerations**:
- Each consumer tracks its own offset
- No coordination between consumers
- All consumers see all messages
**Example** - 3 consumers:
```python
# Consumer 1
messages = consumer.consume("events", "consumer-1")
# Consumer 2 (different consumer, same queue)
messages = consumer.consume("events", "consumer-2")
# Consumer 3
messages = consumer.consume("events", "consumer-3")
```
Each consumer processes all messages independently.
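The per-consumer offset behaviour can be illustrated with a toy in-memory model. `QueueModel` is a hypothetical stand-in written for this example, not the V-Queue client, and it advances the offset on consume rather than on a separate commit.

```python
class QueueModel:
    """In-memory stand-in for a queue with per-consumer offsets."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer name -> index of the next unread message

    def push(self, message):
        self.messages.append(message)

    def consume(self, consumer, max_messages=100):
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)  # simplification: auto-commit
        return batch


queue = QueueModel()
for i in range(5):
    queue.push(f"event-{i}")

first = queue.consume("consumer-1")
second = queue.consume("consumer-2")
print(first == second)  # both consumers receive the same five messages
```

Because every consumer name receives the full stream, adding consumers multiplies read work rather than dividing it.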
### 8. Disable Authentication (Development Only)
For maximum performance in development:
```bash
v-queue-server --no-auth
```
This removes authentication overhead (roughly a 5-10% performance gain).
**Warning**: Only use in isolated development environments.
### 9. Logging Level
Reduce logging overhead in production:
```toml
log_level = "warn" # or "error"
```
Debug logging can significantly impact performance.
### 10. Resource Limits
Increase OS limits for high-throughput scenarios:
```bash
# /etc/security/limits.conf
vqueue soft nofile 65536
vqueue hard nofile 65536
# /etc/sysctl.conf
fs.file-max = 2097152
net.core.somaxconn = 1024
```
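Apply the `sysctl` settings (changes to `limits.conf` take effect at the next login session of the `vqueue` user):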
```bash
sudo sysctl -p
```
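To check which file descriptor limit a process actually sees, a short script run as the service user can query it. This is a sketch using Python's standard `resource` module (Unix only); the 65536 threshold is taken from the `limits.conf` example above.

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

if soft != resource.RLIM_INFINITY and soft < 65536:
    print("soft limit is below the 65536 suggested above")
```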
## Monitoring Performance
### System Metrics
Monitor these metrics:
**Disk I/O**:
```bash
iostat -x 1
```
Look for:
- `%util` - Disk utilization (< 80% ideal)
- `await` - Average wait time (< 10ms for SSD); newer `sysstat` releases report this as `r_await` and `w_await`
- `r/s`, `w/s` - Reads/writes per second
**CPU Usage**:
```bash
top -p $(pgrep v-queue-server)
```
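**Memory Usage**:
```bash
# Resident (RSS) and virtual (VSZ) memory of the server process, in KiB
ps -o pid,rss,vsz,cmd -p $(pgrep v-queue-server)
```
**Network**:
```bash
iftop
```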
### Application Metrics
Track in your application:
```python
import time
# Measure consume latency
start = time.time()
messages = consumer.consume("events", "my-consumer", timeout=30)
consume_latency = time.time() - start
# Measure processing time
start = time.time()
for msg in messages:
    process_message(msg)
process_time = time.time() - start
# Measure commit latency
start = time.time()
consumer.commit("events", "my-consumer")
commit_latency = time.time() - start
print(f"Consume: {consume_latency:.3f}s")
print(f"Process: {process_time:.3f}s")
print(f"Commit: {commit_latency:.3f}s")
print(f"Total: {consume_latency + process_time + commit_latency:.3f}s")
print(f"Throughput: {len(messages)/(consume_latency + process_time + commit_latency):.2f} msg/s")
```
### Queue Growth Monitoring
Monitor queue directory size:
```bash
# Watch queue directory size
watch -n 5 'du -sh /var/lib/vqueue/queues/*'
# Track specific queue growth
watch -n 1 'ls -lh /var/lib/vqueue/queues/events-0/events_queue'
```
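The same check can be scripted to estimate how long the disk will last. The sketch below is illustrative only: `dir_size_bytes` and `seconds_until_full` are hypothetical helper names, and the demonstration runs against a throwaway temporary directory rather than a real queue directory.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Total size of all regular files under `path`, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished between listing and stat
    return total


def seconds_until_full(free_bytes, bytes_per_second):
    """Time left at the current growth rate; None if the queue is not growing."""
    if bytes_per_second <= 0:
        return None
    return free_bytes / bytes_per_second


# Demonstration: take two size samples around a 500-byte append.
with tempfile.TemporaryDirectory() as queue_dir:
    data_file = os.path.join(queue_dir, "segment")
    with open(data_file, "wb") as f:
        f.write(b"x" * 1000)
    before = dir_size_bytes(queue_dir)
    with open(data_file, "ab") as f:
        f.write(b"x" * 500)
    after = dir_size_bytes(queue_dir)

rate = (after - before) / 5.0  # pretend the samples were taken 5 seconds apart
print(f"growth: {rate:.0f} B/s, full in {seconds_until_full(10_000, rate):.0f} s")
```

In practice the two samples would come from the real queue directory at a fixed interval, with `free_bytes` taken from `shutil.disk_usage(path).free`.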
### Prometheus Metrics (Future Enhancement)
Currently not implemented, but consider adding:
- Messages produced/consumed per second
- Consumer lag (messages behind)
- Request latency histogram
- Error rates
- Queue sizes
## Performance Issues and Solutions
### Issue: Slow Writes
**Symptoms**: Low messages/second write rate
**Causes**:
- Slow disk (HDD)
- Filesystem issues
- Resource contention
**Solutions**:
- Upgrade to SSD
- Check `iostat` for disk bottleneck
- Optimize filesystem mount options
- Reduce fsync frequency (if acceptable)
### Issue: Slow Reads
**Symptoms**: High latency on consume requests
**Causes**:
- Small batch sizes
- Disk I/O bottleneck
- Many concurrent consumers
**Solutions**:
- Increase `max_messages` parameter
- Use SSD storage
- Optimize batch processing
- Reduce consumer count if not needed
### Issue: High CPU Usage
**Symptoms**: v-queue-server using excessive CPU
**Causes**:
- Many concurrent requests
- Small timeouts causing polling
- Debug logging enabled
**Solutions**:
- Reduce log level to `warn` or `error`
- Increase timeout values
- Scale horizontally (multiple queues)
### Issue: High Memory Usage
**Symptoms**: Increasing memory consumption
**Causes**:
- Large batch sizes
- Memory leak (file handles)
- Many consumers
**Solutions**:
- Reduce `max_messages`
- Check file descriptor count: `lsof -p $(pgrep v-queue-server) | wc -l`
- Monitor with `valgrind` (development)
### Issue: Queue Growing Unbounded
**Symptoms**: Disk usage continuously increasing
**Causes**:
- Consumers not keeping up with producers
- Consumers not committing
- Dead/stalled consumers
**Solutions**:
- Add more consumers
- Increase batch sizes
- Implement queue retention policy
- Monitor consumer lag
## Capacity Planning
### Estimating Storage Requirements
```
Daily Storage = messages/day × avg_message_size × (1 + overhead)
overhead = 25 bytes / avg_message_size
```
**Example**:
- 10 million messages/day
- 500 bytes average message size
- Overhead: 25/500 = 5%
```
Daily Storage = 10,000,000 × 500 × 1.05
= 5.25 GB/day
≈ 158 GB/month
```
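The estimate can be wrapped in a small helper for other volumes. This sketch assumes the 25-byte header from this guide and a 30-day month; the function name is illustrative.

```python
HEADER_BYTES = 25  # per-message header size from the Storage Overhead section


def daily_storage_bytes(messages_per_day: int, avg_message_bytes: int) -> int:
    """messages/day x avg_size x (1 + 25/avg_size), simplified to body + header."""
    return messages_per_day * (avg_message_bytes + HEADER_BYTES)


daily = daily_storage_bytes(10_000_000, 500)
print(f"{daily / 1e9:.2f} GB/day, {daily * 30 / 1e9:.1f} GB per 30 days")
```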
### Estimating Throughput Requirements
```
Peak throughput = peak_messages/second × avg_message_size
```
**Example**:
- 1,000 messages/second peak
- 1KB average message
```
Peak throughput = 1,000 × 1KB = 1 MB/s
```
This is easily handled by any SSD.
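For a quick comparison against a device budget, the calculation can include the per-message header. This is a sketch; the 50 MB/s budget is an assumption taken from the low end of the write throughput range quoted earlier in this guide, and 1 KB is treated as 1024 bytes.

```python
def peak_throughput_mb(peak_msgs_per_sec, avg_message_bytes, header_bytes=25):
    """Peak on-disk write rate in MB/s (1 MB = 1,000,000 bytes), header included."""
    return peak_msgs_per_sec * (avg_message_bytes + header_bytes) / 1_000_000


mb = peak_throughput_mb(1_000, 1024)
budget_mb = 50  # assumed budget: low end of the write range quoted in this guide
print(f"{mb:.2f} MB/s peak, {100 * mb / budget_mb:.1f}% of a {budget_mb} MB/s budget")
```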
### Sizing Recommendations
**Small Deployment** (< 1M messages/day):
- CPU: 2 cores
- RAM: 2 GB
- Disk: 100 GB SSD
- Network: 100 Mbps
**Medium Deployment** (1M - 100M messages/day):
- CPU: 4 cores
- RAM: 4 GB
- Disk: 500 GB SSD
- Network: 1 Gbps
**Large Deployment** (> 100M messages/day):
- CPU: 8+ cores
- RAM: 8+ GB
- Disk: 1+ TB NVMe SSD
- Network: 10 Gbps
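The tiers above can be encoded as a lookup for provisioning scripts. This is only a sketch: the function name is hypothetical, and treating exactly 100M messages/day as "medium" is an assumption, since the ranges above leave that boundary open.

```python
SIZING = {
    "small":  {"cpu_cores": 2, "ram_gb": 2, "disk": "100 GB SSD",     "network": "100 Mbps"},
    "medium": {"cpu_cores": 4, "ram_gb": 4, "disk": "500 GB SSD",     "network": "1 Gbps"},
    "large":  {"cpu_cores": 8, "ram_gb": 8, "disk": "1+ TB NVMe SSD", "network": "10 Gbps"},
}


def deployment_tier(messages_per_day: int) -> str:
    """Map a daily message volume to one of the tiers listed above."""
    if messages_per_day < 1_000_000:
        return "small"
    if messages_per_day <= 100_000_000:  # boundary handling is an assumption
        return "medium"
    return "large"


tier = deployment_tier(10_000_000)
print(tier, SIZING[tier])
```

The values for the large tier are minimums, matching the "8+" and "1+ TB" wording above.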
## Best Practices Summary
1. **Use SSD storage** for best performance
2. **Tune batch sizes** for your workload (100-1000 typical)
3. **Use appropriate timeouts** (30 seconds for background processing)
4. **Monitor disk I/O** as the primary bottleneck
5. **Reduce logging** in production (`warn` or `error` level)
6. **Disable authentication** only in development
7. **Plan storage capacity** based on message volume
8. **Use ext4 or XFS** filesystem
9. **Optimize mount options** (`noatime`, `nodiratime`)
10. **Monitor queue growth** to prevent disk exhaustion
## Next Steps
- [Configuration Guide](04-configuration.md)
- [API Reference](05-api-reference.md)
- [Troubleshooting](09-troubleshooting.md)