# VoiRS Spatial Audio - Performance Guide

**Version:** 0.1.0-alpha.2
**Last Updated:** 2025-11-28

## Overview

This guide provides performance characteristics, optimization strategies, and best practices for using voirs-spatial in real-time applications.

## Performance Targets

### Latency Requirements

| Application Type | Target Latency | Status |
|-----------------|----------------|--------|
| VR/AR           | <20ms          | ✅ Met |
| Gaming          | <30ms          | ✅ Met |
| General Audio   | <50ms          | ✅ Met |
| Broadcasting    | <100ms         | ✅ Met |

### CPU Usage Targets

- **Spatial Processing**: <25% of one CPU core
- **Multi-source (8 sources)**: <50% of one CPU core
- **Maximum concurrent sources**: 32 (configurable)

## Benchmark Results

### Core Operations

Benchmarks run on: macOS (Darwin 24.6.0), CPU-only mode

#### Distance Calculations
```
100 positions:    ~2-5 μs
1,000 positions:  ~20-50 μs
10,000 positions: ~200-500 μs
```

**Optimization**: Distance calculations cost roughly 20-50 ns per position and scale linearly, so even thousands of positions fit comfortably within a real-time audio callback.

#### Position Vector Operations
```
magnitude:     <1 ns per operation
normalized:    <2 ns per operation
dot product:   <1 ns per operation
cross product: <2 ns per operation
lerp:          <2 ns per operation
```

**Optimization**: All vector operations are inlined and cost at most a few nanoseconds each, so they are negligible next to per-buffer audio processing.

#### Audio Buffer Operations
```
512 samples vec→array:   ~1-2 μs
1024 samples vec→array:  ~2-4 μs
2048 samples vec→array:  ~4-8 μs
4096 samples vec→array:  ~8-16 μs

512 samples scaling:     ~500 ns
1024 samples scaling:    ~1 μs
2048 samples scaling:    ~2 μs
4096 samples scaling:    ~4 μs
```

**Optimization**: Buffer operations are optimized with SIMD when available.

### Memory Allocation

```
Vector allocation (1024 samples):     ~200 ns
Array allocation (1024 samples):      ~400 ns
Position vector (100 positions):      ~2 μs
```

**Optimization**: Use buffer pools for frequently allocated sizes to reduce allocation overhead.

## Optimization Strategies

### 1. Buffer Size Selection

**Recommended buffer sizes** (samples at 48kHz):

| Buffer Size | Latency | Use Case |
|-------------|---------|----------|
| 128         | 2.7ms   | VR/AR (lowest latency) |
| 256         | 5.3ms   | Gaming |
| 512         | 10.7ms  | General real-time |
| 1024        | 21.3ms  | Broadcasting |
| 2048        | 42.7ms  | Offline processing |

**Trade-off**: Smaller buffers = lower latency but higher CPU usage due to more frequent processing.
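The latency column follows directly from the buffer size and the sample rate. As a minimal sketch (the function names are illustrative and not part of voirs-spatial), the table values and the matching callback rate can be reproduced like this:

```rust
/// Duration of one buffer in milliseconds: frames / sample_rate * 1000.
fn buffer_latency_ms(frames: u32, sample_rate_hz: u32) -> f64 {
    f64::from(frames) / f64::from(sample_rate_hz) * 1000.0
}

/// How many times per second the processing callback must run.
/// Halving the buffer size doubles this rate, which is where the
/// extra CPU overhead of small buffers comes from.
fn callbacks_per_second(frames: u32, sample_rate_hz: u32) -> f64 {
    f64::from(sample_rate_hz) / f64::from(frames)
}
```

For example, `buffer_latency_ms(128, 48_000)` gives about 2.7 ms, at the price of 375 callbacks per second.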

### 2. Effect Selection

Effects have different CPU costs:

| Effect | Relative Cost | Notes |
|--------|--------------|-------|
| Distance Attenuation | 1x (baseline) | Very cheap, simple multiplication |
| HRTF | 10-20x | Convolution-based, most expensive |
| Reverb | 5-10x | Room simulation |
| Doppler | 2-3x | Requires resampling |
| Air Absorption | 1-2x | Frequency-dependent filtering |

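**Optimization**: Only enable the effects you actually need. For example, skip the Doppler effect when the distance between source and listener does not change, since there is no relative motion to produce a frequency shift.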
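To illustrate why Doppler can be skipped without relative motion, here is a small sketch of the textbook Doppler ratio for a stationary listener. The constant, the function names, and the tolerance check are assumptions made for this example; they are not taken from the voirs-spatial API.

```rust
/// Speed of sound in dry air at roughly 20 °C, in metres per second.
const SPEED_OF_SOUND_M_S: f64 = 343.0;

/// Ratio of observed to emitted frequency for a stationary listener.
/// `radial_velocity_m_s` is positive when the source moves away.
fn doppler_ratio(radial_velocity_m_s: f64) -> f64 {
    SPEED_OF_SOUND_M_S / (SPEED_OF_SOUND_M_S + radial_velocity_m_s)
}

/// Hypothetical helper: report whether the shift exceeds `tolerance`,
/// i.e. whether enabling the Doppler effect would change anything.
fn doppler_needed(radial_velocity_m_s: f64, tolerance: f64) -> bool {
    (doppler_ratio(radial_velocity_m_s) - 1.0).abs() > tolerance
}
```

With a radial velocity of zero the ratio is exactly 1.0, so the 2-3x cost of resampling buys nothing.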

### 3. Source Management

**Best Practices**:

```rust
// ✅ Good: Reuse request IDs for continuous sources
let request = SpatialRequest {
    id: "player_footsteps".to_string(), // Consistent ID
    audio: footstep_audio,
    // ... other fields
};

// ❌ Avoid: Creating new IDs for each frame
let request = SpatialRequest {
    id: format!("footstep_{}", frame_number), // Creates new source each time
    // ...
};
```

**Optimization**: The processor maintains state per source ID. Reusing IDs enables optimizations like crossfading and caching.
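The effect of ID reuse on per-source state can be sketched with a plain map keyed by ID. `SourceState` and `SourceTable` are hypothetical types invented for this illustration; the processor's real internal state is not described in this guide.

```rust
use std::collections::HashMap;

/// Hypothetical per-source state kept between buffers.
#[derive(Default)]
struct SourceState {
    frames_processed: u64,
    last_gain: f32,
}

/// Hypothetical table of states, one entry per distinct source ID.
#[derive(Default)]
struct SourceTable {
    states: HashMap<String, SourceState>,
}

impl SourceTable {
    /// Update the state for `id`, creating it on first use.
    fn touch(&mut self, id: &str, gain: f32) -> &SourceState {
        if !self.states.contains_key(id) {
            // Only a new ID allocates a key and a fresh state.
            self.states.insert(id.to_owned(), SourceState::default());
        }
        let state = self.states.get_mut(id).expect("entry exists");
        state.frames_processed += 1;
        state.last_gain = gain;
        state
    }
}
```

Calling `touch("player_footsteps", ..)` every frame keeps a single entry alive, whereas a fresh `footstep_{frame}` ID per frame adds one entry per frame and never benefits from the stored state.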

### 4. Position Updates

**Smooth movement** with LERP:

```rust
// Interpolate position for smooth movement.
// Clamp the factor to 1.0 so a long frame cannot overshoot the target.
let t = (delta_time * speed).min(1.0);
let current_pos = last_pos.lerp(&target_pos, t);
```

**Optimization**: Smooth position changes prevent audio artifacts and reduce processing spikes.
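For readers who want to see the interpolation itself, the following is a self-contained sketch using a stand-in `Vec3` type (not the crate's position type). It clamps the interpolation factor so that a long frame time cannot push the source past its target.

```rust
/// Stand-in 3D vector for this example only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    /// Linear interpolation from `self` to `target`; `t` is clamped to [0, 1].
    fn lerp(self, target: Vec3, t: f32) -> Vec3 {
        let t = t.clamp(0.0, 1.0);
        Vec3 {
            x: self.x + (target.x - self.x) * t,
            y: self.y + (target.y - self.y) * t,
            z: self.z + (target.z - self.z) * t,
        }
    }
}

/// Move `current` towards `target` by a fraction of the remaining distance.
fn step_towards(current: Vec3, target: Vec3, delta_time: f32, speed: f32) -> Vec3 {
    current.lerp(target, delta_time * speed)
}
```

With `delta_time = 0.016` and `speed = 5.0` the source covers 8% of the remaining distance per frame; a stalled frame with `delta_time = 0.5` lands exactly on the target instead of overshooting.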

### 5. Batch Processing

Until a dedicated batch API is available, process multiple sources in a loop and collect the results:

```rust
// Process each source in turn and collect the results
let mut results = Vec::new();
for source in sources {
    let result = processor.process_request(source).await?;
    results.push(result);
}
```

**Future Optimization**: A batch processing API that handles multiple sources in a single call is planned for v0.2.0+.

### 6. SIMD Operations

The crate automatically uses SIMD when available:

- **AVX2** on modern x86_64 CPUs
- **NEON** on ARM64 (Apple Silicon, mobile)
- **Automatic fallback** to scalar operations

**No action required** - SIMD is automatically detected and used via SciRS2-Core.

### 7. GPU Acceleration

Enable GPU processing for maximum performance:

```toml
[dependencies]
voirs-spatial = { version = "0.1.0-alpha.2", features = ["gpu"] }
```

**GPU Performance** (when available):
- HRTF convolution: 5-10x faster
- Batch processing: 10-20x faster for many sources
- Ambisonics encoding: 3-5x faster

**Note**: GPU features require CUDA (NVIDIA) or Metal (Apple) runtime.

## Memory Optimization

### Buffer Pooling

The crate includes built-in buffer pools:

```rust
use voirs_spatial::memory::{MemoryConfig, MemoryManager};

let memory_config = MemoryConfig {
    buffer_pool_size: 100,        // Number of pooled buffers
    array2d_pool_size: 50,         // Number of pooled 2D arrays
    cache_size_mb: 128,            // HRTF cache size
    enable_memory_tracking: true,  // Track allocations
};

let processor = SpatialProcessor::with_memory_config(
    spatial_config,
    memory_config
).await?;
```

**Memory Savings**: Buffer pooling can reduce allocation overhead by 50-90% in high-throughput scenarios.
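The idea behind pooling can be shown with a minimal, single-threaded sketch. This is not the crate's `MemoryManager`; `BufferPool` and its methods are hypothetical names used only to illustrate how a released allocation is handed out again instead of being freed.

```rust
/// Minimal pool of fixed-length f32 buffers (illustrative only).
struct BufferPool {
    buffer_len: usize,
    max_pooled: usize,
    free: Vec<Vec<f32>>,
}

impl BufferPool {
    fn new(buffer_len: usize, max_pooled: usize) -> Self {
        Self {
            buffer_len,
            max_pooled,
            free: Vec::with_capacity(max_pooled),
        }
    }

    /// Take a zeroed buffer, reusing a pooled allocation when one exists.
    fn acquire(&mut self) -> Vec<f32> {
        match self.free.pop() {
            Some(mut buf) => {
                // clear + resize zeroes the data without reallocating,
                // because the capacity is already large enough.
                buf.clear();
                buf.resize(self.buffer_len, 0.0);
                buf
            }
            None => vec![0.0; self.buffer_len],
        }
    }

    /// Return a buffer to the pool; it is dropped if the pool is full.
    fn release(&mut self, buf: Vec<f32>) {
        if self.free.len() < self.max_pooled && buf.capacity() >= self.buffer_len {
            self.free.push(buf);
        }
    }

    /// Number of buffers currently waiting for reuse.
    fn pooled(&self) -> usize {
        self.free.len()
    }
}
```

In a processing loop, each `acquire` after the first few iterations is served from the pool, so the allocator is no longer touched on the hot path.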

### HRTF Cache Management

HRTF data is cached per angle:

```rust
let cache_policy = CachePolicy {
    max_entries: 1000,        // Maximum cached positions
    ttl_seconds: 300,         // Time to live for entries
    enable_lru: true,         // Least Recently Used eviction
};
```

**Trade-off**: More cache = better performance but higher memory usage. Default settings are optimized for typical use.
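To make the cache trade-off concrete, here is a small sketch of an angle-keyed cache with Least Recently Used eviction. It is an illustration under stated assumptions, not the crate's implementation: the quantization step, the `AngleKey` type, and `LruCache` are all hypothetical, TTL expiry is left out, and `max_entries` is assumed to be at least 1.

```rust
use std::collections::HashMap;

/// Cache key: (azimuth, elevation) quantized to a fixed angular step.
type AngleKey = (i32, i32);

/// Map a direction in degrees onto the quantization grid.
fn quantize(azimuth_deg: f32, elevation_deg: f32, step_deg: f32) -> AngleKey {
    (
        (azimuth_deg / step_deg).round() as i32,
        (elevation_deg / step_deg).round() as i32,
    )
}

/// Bounded cache that evicts the least recently used entry when full.
struct LruCache<V> {
    max_entries: usize,
    tick: u64,
    entries: HashMap<AngleKey, (V, u64)>,
}

impl<V> LruCache<V> {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, tick: 0, entries: HashMap::new() }
    }

    /// Look up a value and mark it as recently used.
    fn get(&mut self, key: AngleKey) -> Option<&V> {
        self.tick += 1;
        let entry = self.entries.get_mut(&key)?;
        entry.1 = self.tick;
        Some(&entry.0)
    }

    /// Insert a value, evicting the oldest entry if the cache is full.
    fn insert(&mut self, key: AngleKey, value: V) {
        self.tick += 1;
        if !self.entries.contains_key(&key) && self.entries.len() >= self.max_entries {
            // O(n) scan for the smallest timestamp; fine for a sketch.
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, entry)| entry.1)
                .map(|(k, _)| *k);
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        self.entries.insert(key, (value, self.tick));
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A coarser quantization step means more lookups hit the same key, so a smaller `max_entries` suffices; a finer step improves spatial resolution but needs more entries to reach the same hit rate.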

## Real-Time Performance Tips

### 1. Pre-allocate Buffers

```rust
// Pre-allocate audio buffers (must be `mut` so the loop can refill them)
let mut audio_buffer = vec![0.0f32; buffer_size];

// Reuse buffers in processing loop
loop {
    // Fill buffer with new audio data
    fill_audio_buffer(&mut audio_buffer);

    // Process without allocation
    let result = processor.process_request(request).await?;
}
```

### 2. Use Async Efficiently

```rust
// ✅ Good: Concurrent processing; join! yields one result per request
let (result1, result2, result3) = tokio::join!(
    processor.process_request(request1),
    processor.process_request(request2),
    processor.process_request(request3),
);

// ❌ Avoid: Sequential when not needed
processor.process_request(request1).await?;
processor.process_request(request2).await?;
processor.process_request(request3).await?;
```

### 3. Monitor Performance

```rust
let start = std::time::Instant::now();
let result = processor.process_request(request).await?;
let duration = start.elapsed();

if duration.as_millis() > 10 {
    println!("⚠️ Processing took {}ms (target: <10ms)", duration.as_millis());
}
```
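A single timing check is a start; for ongoing monitoring it can help to aggregate measurements against the real-time deadline, which is the duration of one buffer. The sketch below is a hypothetical helper (`LatencyMonitor` is not part of voirs-spatial) that tracks the worst case, the number of overruns, and the average load as a fraction of one core.

```rust
use std::time::Duration;

/// Aggregates per-buffer processing times against a real-time deadline.
struct LatencyMonitor {
    deadline: Duration,
    samples: u64,
    total: Duration,
    worst: Duration,
    overruns: u64,
}

impl LatencyMonitor {
    /// The deadline is the duration of one buffer: frames / sample_rate.
    fn new(frames: u32, sample_rate_hz: u32) -> Self {
        let seconds = f64::from(frames) / f64::from(sample_rate_hz);
        Self {
            deadline: Duration::from_secs_f64(seconds),
            samples: 0,
            total: Duration::ZERO,
            worst: Duration::ZERO,
            overruns: 0,
        }
    }

    /// Record the measured processing time of one buffer.
    fn record(&mut self, elapsed: Duration) {
        self.samples += 1;
        self.total += elapsed;
        self.worst = self.worst.max(elapsed);
        if elapsed > self.deadline {
            self.overruns += 1;
        }
    }

    /// Average share of the buffer period spent processing
    /// (0.25 corresponds to 25% of one CPU core).
    fn load(&self) -> f64 {
        if self.samples == 0 {
            return 0.0;
        }
        self.total.as_secs_f64() / (self.deadline.as_secs_f64() * self.samples as f64)
    }
}
```

Feeding `start.elapsed()` into `record` after each call makes it possible to compare `load()` with the CPU usage targets above and to alert on any non-zero `overruns` count.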

### 4. Platform-Specific Optimizations

#### macOS / iOS (Metal)
```rust
// Enable Metal acceleration
let config = SpatialConfig {
    use_gpu: true,
    // ...
};
```

#### Windows / Linux (CUDA)
```bash
# Ensure CUDA runtime is available
export CUDA_PATH=/usr/local/cuda
cargo build --features cuda
```

#### WebAssembly
```bash
# Build for WASM (CPU-only)
cargo build --target wasm32-unknown-unknown --no-default-features
```

## Performance Monitoring

### Integration Test Results

The crate includes comprehensive integration tests:

```
✅ 10/10 integration tests passing
✅ 340/340 unit tests passing
✅ Zero compilation warnings
```

**Test Coverage**:
- Basic spatial processing pipeline
- Multiple position handling
- All effects combination
- Distance attenuation validation
- Moving source tracking
- Stereo output generation
- Request validation
- Concurrent processing
- Performance benchmarks

### Continuous Monitoring

Add performance tests to your CI:

```bash
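# Install the comparison tool once
cargo install cargo-benchcmp

# Run benchmarks in CI and save the output
# (cargo-benchcmp expects libtest-style output; with Criterion,
#  append `-- --output-format bencher` to the command)
cargo bench --bench minimal --no-default-features | tee current.txt

# Compare against a previously saved baseline
cargo benchcmp baseline.txt current.txt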
```

## Troubleshooting Performance Issues

### Symptom: High CPU Usage

**Possible Causes**:
1. Too many active sources
2. Very small buffer sizes
3. All effects enabled unnecessarily
4. No buffer pooling

**Solutions**:
```rust
// Limit concurrent sources
let config = SpatialConfigBuilder::new()
    .max_sources(16)  // Reduce from default 32
    .build()?;

// Increase buffer size
let config = SpatialConfigBuilder::new()
    .buffer_size(512)  // Up from 128
    .build()?;

// Disable unnecessary effects
let effects = vec![
    SpatialEffect::DistanceAttenuation,  // Keep essential only
];
```

### Symptom: High Latency

**Possible Causes**:
1. Large buffer sizes
2. Too many effects
3. Synchronous processing

**Solutions**:
```rust
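// Reduce buffer size
let config = SpatialConfigBuilder::new()
    .buffer_size(128)  // Lowest practical size
    .build()?;

// Parallelize processing: spawn one task per source, then await them all
use tokio::task;
let handles: Vec<_> = sources
    .into_iter()
    .map(|s| task::spawn(async move { process(s).await }))
    .collect();
for handle in handles {
    // Awaiting the handle is required to observe the result;
    // it returns a JoinError if the task panicked.
    let result = handle.await?;
}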
```

### Symptom: Memory Growth

**Possible Causes**:
1. HRTF cache not evicting
2. Buffer pool not reusing
3. Source accumulation

**Solutions**:
```rust
// Enable aggressive cache eviction
let cache_policy = CachePolicy {
    ttl_seconds: 60,  // Shorter TTL
    max_entries: 500, // Smaller cache
    enable_lru: true,
};

// Manually clear completed sources
processor.remove_inactive_sources()?;
```

## Future Optimizations

### Planned Improvements (v0.2.0+)

1. **Batch Processing API**: Process multiple sources in a single call
2. **Worker Thread Pool**: Dedicated threads for HRTF convolution
3. **Streaming HRTF**: Load HRTF data on-demand
4. **Adaptive Quality**: Automatically adjust quality based on CPU load
5. **Metal/Vulkan Support**: Additional GPU backends

### Research Areas

1. **Neural HRTF**: AI-based HRTF synthesis for lower latency
2. **Spatial Compression**: Compress spatial audio streams
3. **Predictive Positioning**: Predict source movement to hide latency

## References

- [VoiRS Spatial TODO.md](TODO.md) - Feature roadmap
- [SciRS2 Performance](~/work/scirs/PERFORMANCE.md) - Core optimization guide
- [Criterion.rs](https://github.com/bheisler/criterion.rs) - Benchmarking framework
- [Real-Time Audio Programming](http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing) - Best practices

## Conclusion

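VoiRS Spatial Audio is designed for real-time performance across desktop, mobile, and embedded platforms. The figures in this guide were measured on a single macOS machine in CPU-only mode, so benchmark on your own target hardware before relying on them. By following this guide and leveraging the built-in optimizations, you can achieve low-latency, high-quality spatial audio in your applications.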

**Key Takeaways**:
- ✅ Choose appropriate buffer sizes for your latency requirements
- ✅ Only enable effects you actually need
- ✅ Reuse buffer allocations in hot loops
- ✅ Leverage SIMD and GPU acceleration when available
- ✅ Monitor performance with built-in benchmarks

For questions or performance reports, please file an issue at: https://github.com/cool-japan/voirs

---

*Last benchmark run: 2025-11-28*
*Platform: macOS Darwin 24.6.0*
*Crate version: 0.1.0-alpha.2*