voirs-spatial 0.1.0-rc.1

3D spatial audio and HRTF processing for VoiRS
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
# VoiRS Spatial Audio - Performance Guide

**Version:** 0.1.0-alpha.2
**Last Updated:** 2025-11-28

## Overview

This guide provides performance characteristics, optimization strategies, and best practices for using voirs-spatial in real-time applications.

## Performance Targets

### Latency Requirements

| Application Type | Target Latency | Status |
|-----------------|----------------|--------|
| VR/AR           | <20ms          | ✅ Met |
| Gaming          | <30ms          | ✅ Met |
| General Audio   | <50ms          | ✅ Met |
| Broadcasting    | <100ms         | ✅ Met |

### CPU Usage Targets

- **Spatial Processing**: <25% of one CPU core
- **Multi-source (8 sources)**: <50% of one CPU core
- **Maximum concurrent sources**: 32 (configurable)

## Benchmark Results

### Core Operations

Benchmarks run on: macOS (Darwin 24.6.0), CPU-only mode

#### Distance Calculations
```
100 positions:    ~2-5 μs
1,000 positions:  ~20-50 μs
10,000 positions: ~200-500 μs
```

**Optimization**: Distance calculations are highly optimized and suitable for real-time use with thousands of sources.

#### Position Vector Operations
```
magnitude:     <1 ns per operation
normalized:    <2 ns per operation
dot product:   <1 ns per operation
cross product: <2 ns per operation
lerp:          <2 ns per operation
```

**Optimization**: All vector operations are inlined and extremely fast.

#### Audio Buffer Operations
```
512 samples vec→array:   ~1-2 μs
1024 samples vec→array:  ~2-4 μs
2048 samples vec→array:  ~4-8 μs
4096 samples vec→array:  ~8-16 μs

512 samples scaling:     ~500 ns
1024 samples scaling:    ~1 μs
2048 samples scaling:    ~2 μs
4096 samples scaling:    ~4 μs
```

**Optimization**: Buffer operations are optimized with SIMD when available.

### Memory Allocation

```
Vector allocation (1024 samples):     ~200 ns
Array allocation (1024 samples):      ~400 ns
Position vector (100 positions):      ~2 μs
```

**Optimization**: Use buffer pools for frequently allocated sizes to reduce allocation overhead.

## Optimization Strategies

### 1. Buffer Size Selection

**Recommended buffer sizes** (samples at 48kHz):

| Buffer Size | Latency | Use Case |
|-------------|---------|----------|
| 128         | 2.7ms   | VR/AR (lowest latency) |
| 256         | 5.3ms   | Gaming |
| 512         | 10.7ms  | General real-time |
| 1024        | 21.3ms  | Broadcasting |
| 2048        | 42.7ms  | Offline processing |

**Trade-off**: Smaller buffers = lower latency but higher CPU usage due to more frequent processing.

### 2. Effect Selection

Effects have different CPU costs:

| Effect | Relative Cost | Notes |
|--------|--------------|-------|
| Distance Attenuation | 1x (baseline) | Very cheap, simple multiplication |
| HRTF | 10-20x | Convolution-based, most expensive |
| Reverb | 5-10x | Room simulation |
| Doppler | 2-3x | Requires resampling |
| Air Absorption | 1-2x | Frequency-dependent filtering |

**Optimization**: Only enable effects you actually need. For example, if distance is constant, skip Doppler effect.

### 3. Source Management

**Best Practices**:

```rust
// ✅ Good: Reuse request IDs for continuous sources
let request = SpatialRequest {
    id: "player_footsteps".to_string(), // Consistent ID
    audio: footstep_audio,
    // ... other fields
};

// ❌ Avoid: Creating new IDs for each frame
let request = SpatialRequest {
    id: format!("footstep_{}", frame_number), // Creates new source each time
    // ...
};
```

**Optimization**: The processor maintains state per source ID. Reusing IDs enables optimizations like crossfading and caching.

### 4. Position Updates

**Smooth movement** with LERP:

```rust
// Interpolate position for smooth movement
let current_pos = last_pos.lerp(&target_pos, delta_time * speed);
```

**Optimization**: Smooth position changes prevent audio artifacts and reduce processing spikes.

### 5. Batch Processing

For multiple sources, process in batches:

```rust
// Process multiple sources efficiently
let mut results = Vec::new();
for source in sources {
    let result = processor.process_request(source).await?;
    results.push(result);
}
```

**Future Optimization**: Batch processing API coming in future releases for even better performance.

### 6. SIMD Operations

The crate automatically uses SIMD when available:

- **AVX2** on modern x86_64 CPUs
- **NEON** on ARM64 (Apple Silicon, mobile)
- **Automatic fallback** to scalar operations

**No action required** - SIMD is automatically detected and used via SciRS2-Core.

### 7. GPU Acceleration

Enable GPU processing for maximum performance:

```toml
[dependencies]
voirs-spatial = { version = "0.1.0-alpha.2", features = ["gpu"] }
```

**GPU Performance** (when available):
- HRTF convolution: 5-10x faster
- Batch processing: 10-20x faster for many sources
- Ambisonics encoding: 3-5x faster

**Note**: GPU features require CUDA (NVIDIA) or Metal (Apple) runtime.

## Memory Optimization

### Buffer Pooling

The crate includes built-in buffer pools:

```rust
use voirs_spatial::memory::{MemoryConfig, MemoryManager};

let memory_config = MemoryConfig {
    buffer_pool_size: 100,        // Number of pooled buffers
    array2d_pool_size: 50,         // Number of pooled 2D arrays
    cache_size_mb: 128,            // HRTF cache size
    enable_memory_tracking: true,  // Track allocations
};

let processor = SpatialProcessor::with_memory_config(
    spatial_config,
    memory_config
).await?;
```

**Memory Savings**: Buffer pooling can reduce allocation overhead by 50-90% in high-throughput scenarios.

### HRTF Cache Management

HRTF data is cached per angle:

```rust
let cache_policy = CachePolicy {
    max_entries: 1000,        // Maximum cached positions
    ttl_seconds: 300,         // Time to live for entries
    enable_lru: true,         // Least Recently Used eviction
};
```

**Trade-off**: More cache = better performance but higher memory usage. Default settings are optimized for typical use.

## Real-Time Performance Tips

### 1. Pre-allocate Buffers

```rust
// Pre-allocate audio buffers
let audio_buffer = vec![0.0f32; buffer_size];

// Reuse buffers in processing loop
loop {
    // Fill buffer with new audio data
    fill_audio_buffer(&mut audio_buffer);

    // Process without allocation
    let result = processor.process_request(request).await?;
}
```

### 2. Use Async Efficiently

```rust
// ✅ Good: Concurrent processing
tokio::join!(
    processor.process_request(request1),
    processor.process_request(request2),
    processor.process_request(request3),
);

// ❌ Avoid: Sequential when not needed
processor.process_request(request1).await?;
processor.process_request(request2).await?;
processor.process_request(request3).await?;
```

### 3. Monitor Performance

```rust
let start = std::time::Instant::now();
let result = processor.process_request(request).await?;
let duration = start.elapsed();

if duration.as_millis() > 10 {
    println!("⚠️ Processing took {}ms (target: <10ms)", duration.as_millis());
}
```

### 4. Platform-Specific Optimizations

#### macOS / iOS (Metal)
```rust
// Enable Metal acceleration
let config = SpatialConfig {
    use_gpu: true,
    // ...
};
```

#### Windows / Linux (CUDA)
```bash
# Ensure CUDA runtime is available
export CUDA_PATH=/usr/local/cuda
cargo build --features cuda
```

#### WebAssembly
```bash
# Build for WASM (CPU-only)
cargo build --target wasm32-unknown-unknown --no-default-features
```

## Performance Monitoring

### Integration Test Results

The crate includes comprehensive integration tests:

```
✅ 10/10 integration tests passing
✅ 340/340 unit tests passing
✅ Zero compilation warnings
```

**Test Coverage**:
- Basic spatial processing pipeline
- Multiple position handling
- All effects combination
- Distance attenuation validation
- Moving source tracking
- Stereo output generation
- Request validation
- Concurrent processing
- Performance benchmarks

### Continuous Monitoring

Add performance tests to your CI:

```bash
# Run benchmarks in CI
cargo bench --bench minimal --no-default-features

# Compare against baseline
cargo benchcmp baseline.txt current.txt
```

## Troubleshooting Performance Issues

### Symptom: High CPU Usage

**Possible Causes**:
1. Too many active sources
2. Very small buffer sizes
3. All effects enabled unnecessarily
4. No buffer pooling

**Solutions**:
```rust
// Limit concurrent sources
let config = SpatialConfigBuilder::new()
    .max_sources(16)  // Reduce from default 32
    .build()?;

// Increase buffer size
let config = SpatialConfigBuilder::new()
    .buffer_size(512)  // Up from 128
    .build()?;

// Disable unnecessary effects
let effects = vec![
    SpatialEffect::DistanceAttenuation,  // Keep essential only
];
```

### Symptom: High Latency

**Possible Causes**:
1. Large buffer sizes
2. Too many effects
3. Synchronous processing

**Solutions**:
```rust
// Reduce buffer size
.buffer_size(128)  // Lowest practical size

// Parallelize processing
use tokio::task;
let results: Vec<_> = sources
    .into_iter()
    .map(|s| task::spawn(async move { process(s).await }))
    .collect();
```

### Symptom: Memory Growth

**Possible Causes**:
1. HRTF cache not evicting
2. Buffer pool not reusing
3. Source accumulation

**Solutions**:
```rust
// Enable aggressive cache eviction
let cache_policy = CachePolicy {
    ttl_seconds: 60,  // Shorter TTL
    max_entries: 500, // Smaller cache
    enable_lru: true,
};

// Manually clear completed sources
processor.remove_inactive_sources()?;
```

## Future Optimizations

### Planned Improvements (v0.2.0+)

1. **Batch Processing API**: Process multiple sources in a single call
2. **Worker Thread Pool**: Dedicated threads for HRTF convolution
3. **Streaming HRTF**: Load HRTF data on-demand
4. **Adaptive Quality**: Automatically adjust quality based on CPU load
5. **Metal/Vulkan Support**: Additional GPU backends

### Research Areas

1. **Neural HRTF**: AI-based HRTF synthesis for lower latency
2. **Spatial Compression**: Compress spatial audio streams
3. **Predictive Positioning**: Predict source movement to hide latency

## References

- [VoiRS Spatial TODO.md]TODO.md - Feature roadmap
- [SciRS2 Performance]~/work/scirs/PERFORMANCE.md - Core optimization guide
- [Criterion.rs]https://github.com/bheisler/criterion.rs - Benchmarking framework
- [Real-Time Audio Programming]http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing - Best practices

## Conclusion

VoiRS Spatial Audio is optimized for real-time performance across desktop, mobile, and embedded platforms. By following this guide and leveraging built-in optimizations, you can achieve low-latency, high-quality spatial audio in your applications.

**Key Takeaways**:
- ✅ Choose appropriate buffer sizes for your latency requirements
- ✅ Only enable effects you actually need
- ✅ Reuse buffer allocations in hot loops
- ✅ Leverage SIMD and GPU acceleration when available
- ✅ Monitor performance with built-in benchmarks

For questions or performance reports, please file an issue at: https://github.com/cool-japan/voirs

---

*Last benchmark run: 2025-11-28*
*Platform: macOS Darwin 24.6.0*
*Crate version: 0.1.0-alpha.2*