# VectorLite Benchmarking Guide

This document describes the benchmarking suite for embedding generation performance in VectorLite, a high-performance, in-memory vector database optimized for AI agent workloads.

## Overview

The benchmarking system provides detailed performance analysis for:
- Single embedding generation latency
- Batch processing throughput
- Memory usage patterns
- Text length impact on performance
- Model initialization time
- Consistency validation

## Quick Start

### Run All Benchmarks
```bash
# Run the comprehensive benchmark suite.
# Criterion writes HTML reports to target/criterion/ automatically.
cargo bench --bench embedding_benchmarks
```

### Run Specific Benchmarks
```bash
# Single embedding generation only
cargo bench --bench embedding_benchmarks single_embedding_generation

# Batch processing only
cargo bench --bench embedding_benchmarks batch_embedding_generation

# Text length impact analysis
cargo bench --bench embedding_benchmarks text_length_impact
```

### Quick Performance Analysis
```bash
# Run the performance analysis tool (use --release so results reflect an optimized build)
cargo run --release --example performance_analysis
```

### Automated Benchmark Suite
```bash
# Run the complete benchmark suite with reports
./scripts/run_benchmarks.sh
```

## Benchmark Categories

### 1. Single Embedding Generation
**Purpose**: Measure latency for individual text processing
- **Test Data**: 10 different text samples of varying lengths
- **Metrics**: Average, min, max, and total processing time
- **Sample Size**: 100 iterations per test
- **Use Case**: Real-time applications, single queries
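
As a reference, a minimal Criterion sketch of this benchmark is shown below. `Embedder::new()` and `embed()` are placeholders for VectorLite's actual embedding API, not its real type names; the real benchmark lives in `benches/embedding_benchmarks.rs`.

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

fn bench_single_embedding(c: &mut Criterion) {
    let embedder = Embedder::new().expect("model failed to load"); // hypothetical constructor
    let samples = [
        "AI",
        "Vector databases keep embeddings in memory for fast similarity search.",
        // ... further samples of varying length, ten in total
    ];

    let mut group = c.benchmark_group("single_embedding_generation");
    group.sample_size(100); // sample size of 100, matching the description above
    for (i, text) in samples.iter().enumerate() {
        group.bench_with_input(BenchmarkId::from_parameter(i), text, |b, &t| {
            b.iter(|| embedder.embed(black_box(t)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_single_embedding);
criterion_main!(benches);
```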

### 2. Batch Embedding Generation
**Purpose**: Evaluate throughput for multiple texts processed together
- **Batch Sizes**: 1, 5, 10, 20, 50 texts
- **Metrics**: Throughput (embeddings/second), total processing time
- **Sample Size**: 50 iterations per test
- **Use Case**: Bulk processing, data pipelines
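
A sketch of the batch variant, using Criterion's `Throughput::Elements` so the report shows embeddings per second directly; `embed_batch()` is again a placeholder for VectorLite's batch call.

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

fn bench_batch_embedding(c: &mut Criterion) {
    let embedder = Embedder::new().expect("model failed to load");
    let text = "A representative sentence used to build each batch.";

    let mut group = c.benchmark_group("batch_embedding_generation");
    group.sample_size(50); // 50 samples per batch size, as described above
    for batch_size in [1usize, 5, 10, 20, 50] {
        let batch = vec![text; batch_size];
        // Tell Criterion how many items each iteration handles so it reports
        // throughput in embeddings per second in addition to raw time.
        group.throughput(Throughput::Elements(batch_size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(batch_size), &batch, |b, batch| {
            b.iter(|| embedder.embed_batch(black_box(batch))) // hypothetical batch call
        });
    }
    group.finish();
}

criterion_group!(benches, bench_batch_embedding);
criterion_main!(benches);
```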

### 3. Text Length Impact Analysis
**Purpose**: Understand how input text length affects performance
- **Text Categories**:
  - Short: "AI" (2 characters)
  - Medium: ~100 characters
  - Long: ~500 characters
  - Very Long: ~1000+ characters
- **Metrics**: Processing time vs. text length correlation
- **Use Case**: Optimizing for different input types
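
One way to parameterize the length sweep is to generate synthetic inputs of the target sizes and tag each case with `Throughput::Bytes`, so the report relates processing time to input size. As before, `Embedder` is a placeholder.

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

fn bench_text_length(c: &mut Criterion) {
    let embedder = Embedder::new().expect("model failed to load");
    let mut group = c.benchmark_group("text_length_impact");
    for len in [2usize, 100, 500, 1000] {
        let text = "a".repeat(len); // synthetic input of the target length
        group.throughput(Throughput::Bytes(len as u64)); // lets the report show bytes/second
        group.bench_with_input(BenchmarkId::from_parameter(len), &text, |b, t| {
            b.iter(|| embedder.embed(black_box(t.as_str())))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_text_length);
criterion_main!(benches);
```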

### 4. Memory Usage Analysis
**Purpose**: Track memory consumption during embedding generation
- **Tests**: Single and batch operations
- **Metrics**: Memory allocation patterns, peak usage
- **Use Case**: Memory optimization, resource planning
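
One self-contained way to observe allocation patterns in Rust is a counting global allocator that records peak heap growth around an embedding call. The sketch below uses the same hypothetical `Embedder`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

// Wraps the system allocator and tracks current and peak heap usage.
struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let now = ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
        PEAK.fetch_max(now, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let embedder = Embedder::new().expect("model failed to load");
    let peak_before = PEAK.load(Ordering::Relaxed);
    let _embedding = embedder.embed("Track allocations made while embedding this text.");
    let peak_after = PEAK.load(Ordering::Relaxed);
    println!("peak heap growth during embed: {} bytes", peak_after - peak_before);
}
```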

### 5. Initialization Performance
**Purpose**: Measure model loading and setup time
- **Metrics**: Cold start time, model loading overhead
- **Use Case**: Application startup optimization
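
A minimal cold-start measurement only needs `std::time::Instant` around construction and the first embedding call (hypothetical `Embedder` again).

```rust
use std::time::Instant;
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

fn main() {
    // Cold start: model loading plus embedder construction.
    let start = Instant::now();
    let embedder = Embedder::new().expect("model failed to load");
    println!("initialization: {:?}", start.elapsed());

    // The first embedding often carries one-time warm-up cost, so time it separately.
    let start = Instant::now();
    let _ = embedder.embed("warm-up text");
    println!("first embedding: {:?}", start.elapsed());
}
```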

### 6. Consistency Validation
**Purpose**: Ensure reproducible results
- **Tests**: Multiple runs with identical input
- **Metrics**: Result consistency, deterministic behavior
- **Use Case**: Quality assurance, debugging
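
A sketch of such a check, assuming `embed()` returns a `Vec<f32>`; exact bitwise equality only holds if generation is fully deterministic, so the comparison allows a tiny tolerance.

```rust
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

fn main() {
    let embedder = Embedder::new().expect("model failed to load");
    let text = "Determinism check input";

    // Embed the same text repeatedly and confirm every run matches the first.
    let reference = embedder.embed(text); // assumed to return Vec<f32>
    for run in 1..=10 {
        let candidate = embedder.embed(text);
        let max_diff = reference
            .iter()
            .zip(candidate.iter())
            .map(|(a, b)| (a - b).abs())
            .fold(0.0f32, f32::max);
        assert!(max_diff < 1e-6, "run {run}: embeddings diverged by {max_diff}");
    }
    println!("all runs produced matching embeddings");
}
```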

## Understanding Results

### Performance Metrics

- **Average Time**: Mean processing time per embedding
- **Min/Max Time**: Best and worst case performance
- **Throughput**: Embeddings processed per second
- **Memory Usage**: Peak memory consumption
- **Consistency**: Result reproducibility

### Performance Targets

| Metric | Target | Good | Excellent |
|--------|--------|------|-----------|
| Single Embedding | < 500ms | < 200ms | < 100ms |
| Batch Throughput | > 10/sec | > 50/sec | > 100/sec |
| Initialization | < 5s | < 2s | < 1s |
| Memory Usage | < 1GB | < 500MB | < 200MB |

### Interpreting Results

1. **High Single Embedding Latency**: Consider batch processing or model optimization
2. **Low Batch Throughput**: Check for memory bottlenecks or inefficient batching
3. **High Memory Usage**: Look for memory leaks or inefficient data structures
4. **Inconsistent Results**: Check for non-deterministic operations or race conditions

## Optimization Strategies

### Based on Benchmark Results

1. **For High Latency**:
   - Use batch processing for multiple texts
   - Implement model caching
   - Consider smaller/faster models
   - Optimize tokenization pipeline

2. **For Low Throughput**:
   - Increase batch sizes
   - Implement parallel processing (see the sketch after this list)
   - Use GPU acceleration
   - Optimize memory allocation

3. **For High Memory Usage**:
   - Implement streaming processing
   - Use memory-mapped files
   - Optimize data structures
   - Implement garbage collection
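
The parallel-processing suggestion above can be prototyped with rayon. This is an illustration rather than part of the current suite, and it assumes the hypothetical `Embedder` is `Sync` (otherwise give each worker thread its own instance).

```rust
use rayon::prelude::*;
use vectorlite::Embedder; // hypothetical import: substitute the real embedding type

// Splits the workload across rayon's thread pool. Requires `Embedder: Sync`
// so a shared reference can be used from multiple worker threads.
fn embed_all_parallel(embedder: &Embedder, texts: &[String]) -> Vec<Vec<f32>> {
    texts
        .par_iter()                 // parallel iterator over the inputs
        .map(|t| embedder.embed(t)) // each text is embedded on a worker thread
        .collect()                  // results are returned in input order
}
```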

## Advanced Usage

### Custom Benchmark Configuration

```rust
// In benches/embedding_benchmarks.rs
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn custom_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("custom_test");
    group.measurement_time(Duration::from_secs(60)); // measure for 60s per benchmark
    group.sample_size(200);                          // collect 200 samples instead of the default 100
    // ... your benchmark code
    group.finish();
}

criterion_group!(benches, custom_benchmark);
criterion_main!(benches);
```

### Performance Regression Testing

```bash
# Save baseline results
cargo bench --bench embedding_benchmarks -- --save-baseline main

# Compare against baseline
cargo bench --bench embedding_benchmarks -- --baseline main
```

### Continuous Benchmarking

```bash
# Automated benchmarking in CI/CD: compare each run against a saved baseline
cargo bench --bench embedding_benchmarks -- --baseline main
# Machine-readable JSON results land in target/criterion/*/new/ for post-processing
```

## Troubleshooting

### Common Issues

1. **Out of Memory**: Reduce batch sizes or sample sizes
2. **Slow Initialization**: Check model loading path and dependencies
3. **Inconsistent Results**: Ensure deterministic random seeds
4. **High Variance**: Increase sample size or check for system interference

### Debug Mode

```bash
# Run with debug information
RUST_LOG=debug cargo bench --bench embedding_benchmarks
```

## Results Location

- **HTML Reports**: `target/criterion/`
- **JSON Data**: `target/criterion/*/new/`
- **Baseline Data**: `target/criterion/*/base/`
- **Custom Reports**: `benchmark_results/`
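
To post-process these files in CI or custom tooling, the headline numbers can be read from `estimates.json` with `serde_json` (an assumed dev-dependency). The path below is an example; check the field layout against the Criterion version in use.

```rust
use std::fs;
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example path: group "single_embedding_generation", benchmark ID "0".
    // Adjust to the benchmark you care about.
    let raw = fs::read_to_string(
        "target/criterion/single_embedding_generation/0/new/estimates.json",
    )?;
    let estimates: Value = serde_json::from_str(&raw)?;
    // Criterion stores times in nanoseconds; mean.point_estimate is the headline number.
    let mean_ns = estimates["mean"]["point_estimate"]
        .as_f64()
        .ok_or("missing mean.point_estimate")?;
    println!("mean: {:.3} ms", mean_ns / 1.0e6);
    Ok(())
}
```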

## Contributing

When adding new benchmarks:

1. Follow the existing naming conventions
2. Include comprehensive documentation
3. Use appropriate sample sizes
4. Test on multiple configurations
5. Update this guide with new metrics

## Future Enhancements

- [ ] GPU acceleration benchmarks
- [ ] Multi-model comparison
- [ ] Memory profiling integration
- [ ] Real-time performance monitoring
- [ ] Automated performance regression detection
- [ ] Cross-platform performance comparison