# VoiRS Ecosystem Integration Guide
> **Guide for integrating voirs-spatial with other VoiRS crates and external systems**
## Table of Contents
- [VoiRS Ecosystem Overview](#voirs-ecosystem-overview)
- [Core Integration Points](#core-integration-points)
- [Integration Patterns](#integration-patterns)
- [External System Integration](#external-system-integration)
- [Advanced Integration](#advanced-integration)
- [Performance Considerations](#performance-considerations)
- [Summary](#summary)
---
## VoiRS Ecosystem Overview
The VoiRS ecosystem consists of modular crates that work together to provide comprehensive speech synthesis and audio processing capabilities:
```
┌────────────────────────────────────────────────────────────┐
│                      VoiRS Ecosystem                       │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │  voirs-g2p   │   │voirs-acoustic│   │ voirs-vocoder│    │
│  │  (Phonemes)  │──→│ (Mel Spectro)│──→│  (Waveform)  │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                             │                              │
│                             ↓                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           voirs-spatial (3D Spatial Audio)           │  │
│  │  • HRTF Processing       • Room Acoustics            │  │
│  │  • Binaural Rendering    • Multi-user Environments   │  │
│  └──────────────────────────────────────────────────────┘  │
│                             │                              │
│                             ↓                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │ voirs-emotion│   │voirs-cloning │   │ voirs-singing│    │
│  │ (Expressive) │   │(Voice Clone) │   │  (Singing)   │    │
│  └──────────────┘   └──────────────┘   └──────────────┘    │
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │               voirs-sdk (Unified API)                │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
---
## Core Integration Points
### 1. Integration with voirs-acoustic
**Use Case:** Spatialize synthesized speech from acoustic models
```rust
use std::sync::Arc;

use voirs_acoustic::{AcousticModel, VitsModel};
use voirs_spatial::{BinauralConfig, BinauralRenderer, HrtfDatabase, Position3D, SourceType};
// `HiFiGAN` must also be in scope; import it from the vocoder crate you use.

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Step 1: Generate mel spectrogram from acoustic model.
    // `phonemes` is the output of a G2P step (see Pattern 1 below).
    let acoustic_model = VitsModel::load("path/to/model").await?;
    let mel_spectrogram = acoustic_model.synthesize(phonemes).await?;

    // Step 2: Convert mel to waveform using vocoder
    let vocoder = HiFiGAN::load("path/to/vocoder").await?;
    let mono_audio = vocoder.generate(&mel_spectrogram)?;

    // Step 3: Spatialize the audio
    let hrtf_db = HrtfDatabase::load_default().await?;
    let mut spatial_renderer = BinauralRenderer::new(
        BinauralConfig::default(),
        Arc::new(hrtf_db),
    )?;

    // Add the synthesized speech as a spatial source
    let source_id = spatial_renderer.add_source(
        Position3D::new(1.0, 0.0, 2.0), // Position in 3D space
        SourceType::Static,
    )?;

    // Process the mono audio through spatial rendering
    spatial_renderer.set_source_audio(source_id, &mono_audio)?;
    let binaural_output = spatial_renderer.process_frame()?;

    println!("✓ Synthesized speech spatialized successfully");
    Ok(())
}
```
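HRTF-based renderers generally pick their filters from the direction of a source relative to the listener, not from raw Cartesian coordinates. The sketch below converts a position such as `(1.0, 0.0, 2.0)` into azimuth, elevation, and distance for a listener at the origin facing forward. The `Spherical` type and `to_spherical` helper are hypothetical and not part of voirs-spatial; the axis convention (x right, y up, z forward) is inferred from the position comments in this guide and should be checked against the crate.

```rust
/// Direction and distance of a source as seen from a listener at the origin.
/// Hypothetical helper type, not part of voirs-spatial.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Spherical {
    azimuth_deg: f32,   // 0 = straight ahead, positive = to the right
    elevation_deg: f32, // 0 = ear level, positive = above
    distance: f32,      // same unit as the input coordinates
}

/// Converts Cartesian coordinates (x right, y up, z forward; assumed convention)
/// into spherical coordinates.
fn to_spherical(x: f32, y: f32, z: f32) -> Spherical {
    let distance = (x * x + y * y + z * z).sqrt();
    if distance == 0.0 {
        // A source at the listener position has no meaningful direction.
        return Spherical { azimuth_deg: 0.0, elevation_deg: 0.0, distance: 0.0 };
    }
    let azimuth_deg = x.atan2(z).to_degrees();
    // Clamp guards against rounding pushing the ratio slightly outside [-1, 1].
    let elevation_deg = (y / distance).clamp(-1.0, 1.0).asin().to_degrees();
    Spherical { azimuth_deg, elevation_deg, distance }
}
```

A source at `(1.0, 0.0, 1.0)` ends up 45 degrees to the right at ear level, which is a quick way to sanity-check positions before handing them to the renderer.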
### 2. Integration with voirs-emotion
**Use Case:** Add emotional spatial characteristics to synthesized speech
```rust
use voirs_emotion::{EmotionController, EmotionType};
use voirs_spatial::room::{RoomSimulator, WallMaterial};
use voirs_spatial::Position3D;
// `RoomConfig` must also be in scope. `base_audio`, `source_position`, and
// `listener_position` are provided by the surrounding application.

async fn emotional_spatial_speech() -> Result<(), Box<dyn std::error::Error>> {
    // Step 1: Apply emotion to speech
    let mut emotion_ctrl = EmotionController::new()?;
    emotion_ctrl.set_emotion(EmotionType::Joy, 0.8)?;
    let emotional_audio = emotion_ctrl.process(&base_audio)?;

    // Step 2: Match room acoustics to emotion
    let room_config = match emotion_ctrl.current_emotion() {
        EmotionType::Joy => RoomConfig {
            dimensions: Position3D::new(15.0, 8.0, 12.0), // Large, open space
            wall_material: WallMaterial::Wood,            // Warm acoustics
            reverb_time: 1.5,
            ..Default::default()
        },
        EmotionType::Sadness => RoomConfig {
            dimensions: Position3D::new(6.0, 3.0, 5.0), // Small, intimate
            wall_material: WallMaterial::Carpet,        // Damped
            reverb_time: 0.5,
            ..Default::default()
        },
        _ => RoomConfig::default(),
    };

    // Step 3: Render the emotional speech inside the chosen room
    let mut room = RoomSimulator::new(room_config)?;
    let spatialized = room.process(
        &emotional_audio,
        &source_position,
        &listener_position,
    )?;

    Ok(())
}
```
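The `reverb_time` values above are hand-picked. To relate them to room size and wall damping, Sabine's formula gives a rough estimate: RT60 = 0.161 · V / (S · α), with volume V in cubic metres, surface area S in square metres, and an average absorption coefficient α. The sketch below is a stand-alone illustration with a hypothetical `sabine_rt60` helper and placeholder absorption values; it does not describe how `RoomSimulator` computes reverberation.

```rust
/// Estimates RT60 in seconds for a shoebox room using Sabine's formula.
/// Dimensions are in metres; `absorption` is the average absorption
/// coefficient of the surfaces (0 < absorption <= 1).
/// Hypothetical helper, not part of voirs-spatial.
fn sabine_rt60(width: f32, height: f32, depth: f32, absorption: f32) -> Option<f32> {
    if width <= 0.0 || height <= 0.0 || depth <= 0.0 || absorption <= 0.0 {
        return None; // the formula is undefined for degenerate rooms
    }
    let volume = width * height * depth;
    let surface = 2.0 * (width * height + width * depth + height * depth);
    Some(0.161 * volume / (surface * absorption))
}
```

With the same absorption, the 15 × 8 × 12 m room from the `Joy` branch yields a noticeably longer estimate than the 6 × 3 × 5 m room from the `Sadness` branch, which matches the intent of the configuration above.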
### 3. Integration with voirs-cloning
**Use Case:** Create spatial scenes with cloned voices
```rust
use voirs_cloning::{CloningConfig, VoiceCloner};
use voirs_spatial::multiuser::{MultiuserEnvironment, UserRole};
use voirs_spatial::Position3D;
// `MultiuserConfig` must also be in scope; `load_audio` is an application helper.

async fn cloned_voice_multiuser() -> Result<(), Box<dyn std::error::Error>> {
    // Step 1: Clone target voice
    let cloning_config = CloningConfig::default();
    let cloner = VoiceCloner::new(cloning_config)?;
    let reference_audio = load_audio("reference.wav")?;
    let voice_embedding = cloner.create_embedding(&reference_audio).await?;

    // Step 2: Create multi-user environment
    let mut multi_env = MultiuserEnvironment::new(MultiuserConfig::default())?;

    // Step 3: Add user with cloned voice
    let user_id = multi_env.add_user(
        "cloned_speaker".to_string(),
        UserRole::Speaker,
        Position3D::new(2.0, 0.0, 1.0),
    )?;

    // Step 4: Synthesize and spatialize speech
    let synthesized_speech = cloner.synthesize_with_embedding(
        &voice_embedding,
        "Hello from my cloned voice!",
    ).await?;
    multi_env.set_user_audio(user_id, &synthesized_speech)?;

    Ok(())
}
```
### 4. Integration with voirs-recognizer
**Use Case:** Spatial audio feedback for speech recognition
```rust
use voirs_recognizer::{RecognitionConfig, Recognizer};
use voirs_spatial::{Position3D, SourceType};
// `setup_spatial_renderer`, `capture_microphone`, and `generate_confirmation_tone`
// are application helpers that are not shown here.

async fn spatial_recognition_feedback() -> Result<(), Box<dyn std::error::Error>> {
    // Setup recognizer
    let recognizer = Recognizer::new(RecognitionConfig::default()).await?;

    // Setup spatial renderer
    let mut spatial = setup_spatial_renderer().await?;

    // Process audio input
    let input_audio = capture_microphone()?;

    // Recognize speech
    let recognition_result = recognizer.recognize(&input_audio).await?;

    // Provide spatial feedback based on recognition confidence
    let feedback_position = match recognition_result.confidence {
        c if c > 0.8 => Position3D::new(0.0, 1.0, 1.0), // Above and forward (success)
        c if c > 0.5 => Position3D::new(1.0, 0.0, 1.0), // To the right (uncertain)
        _ => Position3D::new(-1.0, -0.5, 1.0),          // Left and down (error)
    };

    // Play confirmation sound at appropriate position.
    // Use the id returned by `add_source` instead of a hard-coded index.
    let confirmation_sound = generate_confirmation_tone(recognition_result.confidence);
    let source_id = spatial.add_source(feedback_position, SourceType::OneShot)?;
    spatial.set_source_audio(source_id, &confirmation_sound)?;

    println!("Recognized: {}", recognition_result.text);
    println!("Confidence: {:.1}%", recognition_result.confidence * 100.0);
    Ok(())
}
```
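The example above relies on a `generate_confirmation_tone` helper that this guide does not define. One possible shape for it is sketched below, under assumptions: the function name `confirmation_tone`, the extra `sample_rate` parameter, the 150 ms duration, and the 440 to 880 Hz pitch range are all illustrative choices, not part of any VoiRS crate.

```rust
use std::f32::consts::TAU;

/// Hypothetical stand-in for `generate_confirmation_tone`: a short mono sine
/// beep whose pitch rises with recognition confidence.
fn confirmation_tone(confidence: f32, sample_rate: u32) -> Vec<f32> {
    let confidence = confidence.clamp(0.0, 1.0);
    let frequency = 440.0 + 440.0 * confidence; // 440 Hz (low) to 880 Hz (high)
    let len = sample_rate as usize * 150 / 1000; // 150 ms of audio
    let fade = (len / 10).max(1); // linear fade in and out to avoid clicks

    (0..len)
        .map(|i| {
            let t = i as f32 / sample_rate as f32;
            let ramp_in = (i as f32 / fade as f32).min(1.0);
            let ramp_out = ((len - i) as f32 / fade as f32).min(1.0);
            0.5 * ramp_in * ramp_out * (TAU * frequency * t).sin()
        })
        .collect()
}
```

The peak amplitude is capped at 0.5 so the beep leaves headroom when it is mixed with speech in the same scene.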
---
## Integration Patterns
### Pattern 1: TTS Pipeline with Spatial Output
```rust
use std::sync::Arc;

use voirs_acoustic::{AcousticModel, VitsModel};
use voirs_g2p::G2p;
use voirs_spatial::{BinauralConfig, BinauralRenderer, HrtfDatabase, Position3D, SourceType};
use voirs_vocoder::Vocoder;
// `HiFiGAN` must also be in scope; import it from the vocoder crate you use.

async fn complete_tts_spatial_pipeline(
    text: &str,
    voice_position: Position3D,
) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Step 1: Text → Phonemes
    let g2p = G2p::new()?;
    let phonemes = g2p.convert(text)?;

    // Step 2: Phonemes → Mel Spectrogram
    let acoustic = VitsModel::load_default().await?;
    let mel = acoustic.synthesize(&phonemes).await?;

    // Step 3: Mel → Waveform
    let vocoder = HiFiGAN::load_default().await?;
    let mono_audio = vocoder.generate(&mel)?;

    // Step 4: Mono → Binaural Spatial Audio
    let hrtf_db = HrtfDatabase::load_default().await?;
    let mut spatial = BinauralRenderer::new(
        BinauralConfig::default(),
        Arc::new(hrtf_db),
    )?;
    let source_id = spatial.add_source(voice_position, SourceType::Static)?;
    spatial.set_source_audio(source_id, &mono_audio)?;
    let binaural_output = spatial.process_frame()?;

    // Return interleaved stereo output
    Ok(binaural_output.interleaved())
}
```
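The pipeline returns `binaural_output.interleaved()`, a single buffer that carries both ears. As a reminder of what interleaving means, here is a minimal stand-alone sketch; the `interleave` function is a hypothetical helper written for illustration and makes no claim about how voirs-spatial implements `interleaved()`.

```rust
/// Interleaves separate left/right channels into L R L R ... order.
/// If the channels differ in length, the surplus samples are dropped.
/// Hypothetical helper, not part of voirs-spatial.
fn interleave(left: &[f32], right: &[f32]) -> Vec<f32> {
    left.iter()
        .zip(right.iter())
        .flat_map(|(&l, &r)| [l, r])
        .collect()
}
```

Most audio backends expect stereo data in this layout, so a buffer of N frames holds 2 · N samples.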
### Pattern 2: Dynamic Scene Management
```rust
use std::collections::HashMap;
use std::sync::Arc;

use voirs_spatial::{BinauralConfig, BinauralRenderer, HrtfDatabase, Position3D, SourceType};

struct SpatialScene {
    renderer: BinauralRenderer,
    sources: HashMap<String, (usize, Position3D)>,
}

impl SpatialScene {
    // `async` because loading the HRTF database is awaited
    pub async fn new() -> Result<Self, Box<dyn std::error::Error>> {
        let hrtf_db = HrtfDatabase::load_default().await?;
        let renderer = BinauralRenderer::new(
            BinauralConfig::default(),
            Arc::new(hrtf_db),
        )?;
        Ok(Self {
            renderer,
            sources: HashMap::new(),
        })
    }

    // `async` because speech generation is awaited
    pub async fn add_tts_source(
        &mut self,
        name: String,
        text: &str,
        position: Position3D,
    ) -> Result<(), Box<dyn std::error::Error>> {
        // Generate speech from text (`generate_tts_audio` is an application helper)
        let audio = generate_tts_audio(text).await?;

        // Add to spatial scene
        let source_id = self.renderer.add_source(position, SourceType::Static)?;
        self.renderer.set_source_audio(source_id, &audio)?;
        self.sources.insert(name, (source_id, position));
        Ok(())
    }

    pub fn move_source(
        &mut self,
        name: &str,
        new_position: Position3D,
    ) -> Result<(), Box<dyn std::error::Error>> {
        if let Some((source_id, position)) = self.sources.get_mut(name) {
            self.renderer.update_source_position(*source_id, new_position)?;
            // Keep the cached position in sync with the renderer
            *position = new_position;
            Ok(())
        } else {
            Err("Source not found".into())
        }
    }

    pub fn process(&mut self) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
        let output = self.renderer.process_frame()?;
        Ok(output.interleaved())
    }
}
```
### Pattern 3: Adaptive Quality Based on System Load
```rust
use voirs_acoustic::AcousticModel;
use voirs_spatial::{BinauralConfig, BinauralRenderer};

struct AdaptiveRenderer {
    spatial: BinauralRenderer,
    acoustic: AcousticModel,
    current_quality: f32,
}

impl AdaptiveRenderer {
    pub fn update_quality_based_on_load(&mut self) -> Result<(), Box<dyn std::error::Error>> {
        // Get system metrics (`get_cpu_usage` is an application helper returning percent)
        let cpu_usage = get_cpu_usage();
        let spatial_latency = self.spatial.get_performance_metrics().avg_processing_time_ms;

        // Adjust quality dynamically
        let target_quality = if cpu_usage > 80.0 || spatial_latency > 20.0 {
            0.6 // Reduce quality under high load
        } else if cpu_usage < 50.0 && spatial_latency < 10.0 {
            0.9 // Increase quality when resources are available
        } else {
            self.current_quality // Between the thresholds: keep the current level
        };

        if (target_quality - self.current_quality).abs() > 0.1 {
            self.current_quality = target_quality;

            // Update both acoustic and spatial quality
            self.acoustic.set_quality_level(target_quality)?;
            self.spatial.set_quality_level(target_quality)?;
            println!("Adapted quality to {:.1}%", target_quality * 100.0);
        }
        Ok(())
    }
}
```
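The threshold logic is easier to unit-test when it is separated from the renderer and the metrics source. The sketch below pulls it into a pure function with the same thresholds as the example above; `target_quality` is a hypothetical helper name, and the numbers are the illustrative ones from this guide, not tuned recommendations.

```rust
/// Picks a quality level from CPU usage (percent) and average spatial
/// processing latency (milliseconds). Between the two threshold pairs the
/// current level is kept, which avoids flapping near a single cut-off.
/// Hypothetical helper mirroring the thresholds used in this guide.
fn target_quality(cpu_usage: f32, latency_ms: f32, current: f32) -> f32 {
    if cpu_usage > 80.0 || latency_ms > 20.0 {
        0.6 // high load: reduce quality
    } else if cpu_usage < 50.0 && latency_ms < 10.0 {
        0.9 // plenty of headroom: raise quality
    } else {
        current // dead band: leave quality unchanged
    }
}
```

Keeping a dead band between the "reduce" and "raise" conditions means a system hovering around 60 % CPU does not switch quality on every update.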
---
## External System Integration
### Unity Game Engine
```csharp
// Unity C# script
using System;
using System.Runtime.InteropServices;
using UnityEngine;

public class VoirsSpatialAudio : MonoBehaviour
{
    // Import C API functions.
    // `SpatialConfig` must be declared as a struct whose layout matches the native library.
    [DllImport("voirs_spatial")]
    private static extern IntPtr voirs_spatial_create(SpatialConfig config);

    [DllImport("voirs_spatial")]
    private static extern int voirs_spatial_add_source(
        IntPtr renderer,
        Vector3 position,
        int sourceType
    );

    [DllImport("voirs_spatial")]
    private static extern void voirs_spatial_process(
        IntPtr renderer,
        float[] output,
        int outputLength
    );

    [DllImport("voirs_spatial")]
    private static extern void voirs_spatial_destroy(IntPtr renderer);

    private IntPtr spatialRenderer = IntPtr.Zero;

    void Start()
    {
        // Initialize VoiRS Spatial from Unity
        SpatialConfig config = new SpatialConfig {
            sampleRate = 48000,
            bufferSize = 512,
            maxSources = 16
        };
        spatialRenderer = voirs_spatial_create(config);
    }

    public int AddAudioSource(Vector3 position)
    {
        return voirs_spatial_add_source(
            spatialRenderer,
            position,
            0 // SourceType::Static
        );
    }

    void OnAudioFilterRead(float[] data, int channels)
    {
        // Runs on Unity's audio thread; skip until the renderer exists
        if (spatialRenderer == IntPtr.Zero) return;
        voirs_spatial_process(spatialRenderer, data, data.Length);
    }

    void OnDestroy()
    {
        // Cleanup: release the native renderer exactly once
        if (spatialRenderer != IntPtr.Zero)
        {
            voirs_spatial_destroy(spatialRenderer);
            spatialRenderer = IntPtr.Zero;
        }
    }
}
```
### Unreal Engine
```cpp
// Unreal Engine C++ integration
#include "VoirsSpatialAudio.h"

class FVoirsSpatialAudioModule : public IModuleInterface
{
public:
    virtual void StartupModule() override
    {
        // Initialize VoiRS Spatial
        SpatialConfig Config;
        Config.SampleRate = 48000;
        Config.BufferSize = 512;
        Config.MaxSources = 16;
        SpatialRenderer = voirs_spatial_create(Config);
    }

    virtual void ShutdownModule() override
    {
        // Release the native renderer exactly once
        if (SpatialRenderer != nullptr)
        {
            voirs_spatial_destroy(SpatialRenderer);
            SpatialRenderer = nullptr;
        }
    }

    int32 AddSpatialSource(FVector Position, uint8 SourceType)
    {
        return voirs_spatial_add_source(
            SpatialRenderer,
            Position,
            SourceType
        );
    }

    void ProcessAudio(TArray<float>& OutputBuffer)
    {
        voirs_spatial_process(
            SpatialRenderer,
            OutputBuffer.GetData(),
            OutputBuffer.Num()
        );
    }

private:
    void* SpatialRenderer = nullptr;
};
```
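Both engine examples call a C API (`voirs_spatial_create`, `voirs_spatial_add_source`, `voirs_spatial_destroy`) through an opaque handle. The sketch below shows the usual Rust pattern behind such a handle: `Box::into_raw` to hand ownership to the caller and `Box::from_raw` to take it back. Everything here is hypothetical, including the `demo_` prefixed names, the `DemoSpatialConfig` layout, and the `-1` error code; it illustrates the ownership pattern only and is not the actual voirs-spatial FFI layer.

```rust
/// C-compatible configuration. Field names follow the engine examples in
/// this guide; the exact layout of the real struct is an assumption.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DemoSpatialConfig {
    pub sample_rate: u32,
    pub buffer_size: u32,
    pub max_sources: u32,
}

/// Renderer state owned by Rust and seen by C callers as an opaque pointer.
pub struct DemoRenderer {
    config: DemoSpatialConfig,
    source_count: u32,
}

/// Allocates a renderer and transfers ownership to the caller.
/// A real export would additionally be marked `no_mangle`.
pub extern "C" fn demo_spatial_create(config: DemoSpatialConfig) -> *mut DemoRenderer {
    Box::into_raw(Box::new(DemoRenderer { config, source_count: 0 }))
}

/// Registers a source and returns its id, or -1 for a null handle or a full scene.
///
/// # Safety
/// `renderer` must be null or a live pointer returned by `demo_spatial_create`.
pub unsafe extern "C" fn demo_spatial_add_source(renderer: *mut DemoRenderer) -> i32 {
    let r = match unsafe { renderer.as_mut() } {
        Some(r) => r,
        None => return -1,
    };
    if r.source_count >= r.config.max_sources {
        return -1;
    }
    let id = r.source_count as i32;
    r.source_count += 1;
    id
}

/// Frees a renderer created by `demo_spatial_create`. Null is ignored.
///
/// # Safety
/// `renderer` must not be used again after this call.
pub unsafe extern "C" fn demo_spatial_destroy(renderer: *mut DemoRenderer) {
    if !renderer.is_null() {
        // Rebuild the Box so the allocation is dropped by Rust.
        drop(unsafe { Box::from_raw(renderer) });
    }
}
```

On the engine side this is why the handle must be destroyed exactly once and never used afterwards: the pointer is the only owner of the Rust allocation.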
### Web Integration (WebAssembly)
```javascript
// JavaScript/WebAssembly integration
import init, { BinauralRenderer } from './voirs_spatial.js';

async function setupSpatialAudio() {
    // Initialize WASM module
    await init();

    // Create spatial renderer
    const renderer = new BinauralRenderer({
        sampleRate: 48000,
        bufferSize: 512,
        maxSources: 8
    });

    // Add TTS source at specific position
    const sourceId = renderer.addSource(
        { x: 1.0, y: 0.0, z: 2.0 },
        'Static'
    );

    // Connect to Web Audio API.
    // Note: ScriptProcessorNode is deprecated in favor of AudioWorklet;
    // it is used here to keep the example short.
    const audioContext = new AudioContext();
    const scriptNode = audioContext.createScriptProcessor(512, 0, 2);
    scriptNode.onaudioprocess = (event) => {
        const output = renderer.processFrame();
        const left = event.outputBuffer.getChannelData(0);
        const right = event.outputBuffer.getChannelData(1);
        // Never write past the end of the Web Audio buffer
        const frames = Math.min(left.length, output.left.length);
        for (let i = 0; i < frames; i++) {
            left[i] = output.left[i];
            right[i] = output.right[i];
        }
    };
    scriptNode.connect(audioContext.destination);
}
```
---
## Advanced Integration
### Custom Acoustic Model Integration
```rust
use std::sync::Arc;

use voirs_spatial::{BinauralConfig, BinauralRenderer, HrtfDatabase, Position3D, SourceType};

// Custom acoustic model trait (`async fn` in traits requires Rust 1.75 or later)
trait CustomAcousticModel {
    async fn synthesize(&self, input: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>>;
}

// Integration function
async fn integrate_custom_model<M: CustomAcousticModel>(
    model: &M,
    text: &str,
    position: Position3D,
) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    // Generate audio from custom model
    let mono_audio = model.synthesize(text).await?;

    // Spatialize using VoiRS Spatial
    let hrtf_db = HrtfDatabase::load_default().await?;
    let mut renderer = BinauralRenderer::new(
        BinauralConfig::default(),
        Arc::new(hrtf_db),
    )?;
    let source_id = renderer.add_source(position, SourceType::Static)?;
    renderer.set_source_audio(source_id, &mono_audio)?;
    let output = renderer.process_frame()?;
    Ok(output.interleaved())
}
```
### Real-time Streaming Integration
```rust
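use tokio::sync::mpsc;
use voirs_acoustic::VitsModel;
use voirs_spatial::BinauralRenderer;
// `get_next_text_chunk`, `setup_spatial_renderer`, and `output_to_speakers`
// are application helpers that are not shown here.

async fn streaming_spatial_tts() -> Result<(), Box<dyn std::error::Error>> {
    let (tx, mut rx) = mpsc::channel::<Vec<f32>>(32);

    // Spawn TTS generation task
    tokio::spawn(async move {
        let mut tts_engine = match VitsModel::load_default().await {
            Ok(engine) => engine,
            Err(e) => {
                eprintln!("failed to load TTS model: {e}");
                return;
            }
        };
        while let Some(text) = get_next_text_chunk() {
            let audio = match tts_engine.synthesize_streaming(&text).await {
                Ok(audio) => audio,
                Err(e) => {
                    eprintln!("synthesis failed: {e}");
                    break;
                }
            };
            // Stop producing once the receiver has been dropped
            if tx.send(audio).await.is_err() {
                break;
            }
        }
    });

    // Spatial processing task.
    // Source 0 is assumed to have been registered by `setup_spatial_renderer`.
    let mut spatial = setup_spatial_renderer().await?;
    while let Some(audio_chunk) = rx.recv().await {
        spatial.add_streaming_audio(0, &audio_chunk)?;
        let output = spatial.process_frame()?;

        // Send to audio output
        output_to_speakers(&output)?;
    }
    Ok(())
}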
```
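Streaming TTS tends to deliver chunks of varying length, while frame-based renderers work on fixed-size buffers (the engine examples in this guide use 512 samples). A small accumulator between the two sides keeps frame sizes stable. The `FrameBuffer` type below is a hypothetical helper built only on the standard library; whether `add_streaming_audio` already buffers internally is not stated in this guide, so treat this as a sketch for cases where it does not.

```rust
use std::collections::VecDeque;

/// Accumulates variable-length mono chunks and yields fixed-size frames.
/// Hypothetical helper, not part of voirs-spatial.
struct FrameBuffer {
    frame_size: usize,
    pending: VecDeque<f32>,
}

impl FrameBuffer {
    fn new(frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame_size must be non-zero");
        Self { frame_size, pending: VecDeque::new() }
    }

    /// Appends a chunk of samples of any length.
    fn push(&mut self, chunk: &[f32]) {
        self.pending.extend(chunk.iter().copied());
    }

    /// Returns the next full frame, or `None` until enough samples have arrived.
    fn pop_frame(&mut self) -> Option<Vec<f32>> {
        if self.pending.len() < self.frame_size {
            return None;
        }
        Some(self.pending.drain(..self.frame_size).collect())
    }

    /// At end of stream: returns the remaining samples padded with silence.
    fn flush(&mut self) -> Option<Vec<f32>> {
        if self.pending.is_empty() {
            return None;
        }
        let take = self.pending.len().min(self.frame_size);
        let mut frame: Vec<f32> = self.pending.drain(..take).collect();
        frame.resize(self.frame_size, 0.0);
        Some(frame)
    }
}
```

In the receive loop, each incoming chunk would be pushed first, and the renderer would then be fed with `pop_frame()` results until it returns `None`.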
---
## Performance Considerations
### Memory Sharing
When integrating multiple VoiRS crates, share resources to reduce memory usage:
```rust
use std::sync::Arc;

use voirs_acoustic::AcousticModel;
use voirs_spatial::{BinauralConfig, BinauralRenderer, HrtfDatabase};

struct SharedResources {
    hrtf_db: Arc<HrtfDatabase>,
    acoustic_model: Arc<AcousticModel>,
}

impl SharedResources {
    async fn new() -> Result<Self, Box<dyn std::error::Error>> {
        Ok(Self {
            hrtf_db: Arc::new(HrtfDatabase::load_default().await?),
            acoustic_model: Arc::new(AcousticModel::load_default().await?),
        })
    }

    fn create_spatial_renderer(&self) -> Result<BinauralRenderer, Box<dyn std::error::Error>> {
        // `Arc::clone` copies the pointer, not the HRTF data.
        // `?` converts the renderer's error type into the boxed error.
        Ok(BinauralRenderer::new(
            BinauralConfig::default(),
            Arc::clone(&self.hrtf_db),
        )?)
    }
}
```
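To make the effect of `Arc` sharing concrete, the stand-alone sketch below builds several renderers on top of one table and relies on `Arc::strong_count` and `Arc::ptr_eq` to show that only the pointer is cloned. `HrtfTable`, `Renderer`, and `build_renderers` are hypothetical stand-ins so the example runs without any VoiRS crate.

```rust
use std::sync::Arc;

/// Stand-in for a large read-only resource such as an HRTF database.
struct HrtfTable {
    coefficients: Vec<f32>,
}

/// Stand-in for a renderer that keeps a shared handle to the table.
struct Renderer {
    hrtf: Arc<HrtfTable>,
}

/// Creates `count` renderers that all point at the same table.
fn build_renderers(table: &Arc<HrtfTable>, count: usize) -> Vec<Renderer> {
    (0..count)
        .map(|_| Renderer { hrtf: Arc::clone(table) })
        .collect()
}
```

After building three renderers from one `Arc<HrtfTable>`, the strong count is four (the original handle plus three clones) while the coefficient data exists in memory only once.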
### Thread Pool Management
```rust
use tokio::runtime::Runtime;

fn setup_optimized_runtime() -> Runtime {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(4) // Dedicated threads for audio
        .thread_name("voirs-audio")
        .enable_all()
        .build()
        .unwrap()
}
```
---
## Summary
This guide covered integrating voirs-spatial with the following crates and platforms:
- ✅ **voirs-acoustic**: Spatialize synthesized speech
- ✅ **voirs-emotion**: Emotional spatial characteristics
- ✅ **voirs-cloning**: Multi-user cloned voices
- ✅ **voirs-recognizer**: Spatial feedback systems
- ✅ **Unity/Unreal**: Game engine integration
- ✅ **WebAssembly**: Browser-based applications

**Key Integration Principles:**

1. Use `Arc<T>` for shared resources
2. Process pipeline: G2P → Acoustic → Vocoder → Spatial
3. Maintain consistent sample rates across crates
4. Monitor performance at integration boundaries
5. Leverage async/await for I/O operations
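
Principle 3 is easy to violate silently, for example when a vocoder produces 22.05 kHz audio and the spatial renderer is configured for 48 kHz. A cheap guard is to compare the rates once at pipeline setup. The `common_sample_rate` function below is a hypothetical helper; how each VoiRS crate exposes its sample rate is not covered in this guide, so the stage names and rates are passed in by the caller.

```rust
/// Returns the sample rate shared by all pipeline stages, or an error that
/// names the first stage disagreeing with the first entry.
/// Hypothetical helper, not part of any VoiRS crate.
fn common_sample_rate(stages: &[(&str, u32)]) -> Result<u32, String> {
    let (first_name, expected) = match stages.first() {
        Some(&(name, rate)) => (name, rate),
        None => return Err("no stages given".to_string()),
    };
    for &(name, rate) in &stages[1..] {
        if rate != expected {
            return Err(format!(
                "{name} runs at {rate} Hz but {first_name} runs at {expected} Hz"
            ));
        }
    }
    Ok(expected)
}
```

Calling it with `[("vocoder", 48_000), ("spatial", 48_000)]` returns `Ok(48000)`; any mismatch produces an error before audio is rendered at the wrong pitch or speed.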
---
**Version:** 0.1.0-alpha.2

**Last Updated:** 2025-12-09