voirs-conversion
Real-time Voice Conversion and Audio Transformation System
This crate provides real-time voice conversion capabilities including speaker conversion, age/gender transformation, voice morphing, and streaming voice conversion for live applications.
Features
Core Voice Conversion
- Real-time Conversion - Low-latency voice conversion for live applications (<50ms)
- Speaker-to-Speaker - Convert between different speaker identities
- Style Transfer - Transfer speaking style and characteristics
- Quality Preservation - Maintain audio quality during conversion
Transformation Types
- Age Transformation - Make voices sound younger or older
- Gender Conversion - Convert between male and female voices
- Pitch Modification - Precise pitch scaling and shifting
- Speed Adjustment - Modify speaking rate while preserving quality
- Voice Morphing - Blend characteristics from multiple sources
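As an illustration of what "pitch scaling and shifting" means numerically, the sketch below converts a shift in semitones to a frequency ratio using the standard equal-temperament relation (ratio = 2^(semitones/12)). The function names `semitones_to_ratio` and `shift_f0` are hypothetical helpers written for this example; they are not part of the voirs-conversion API.

```rust
/// Convert a pitch shift in semitones to a frequency scaling ratio.
/// In equal temperament each semitone multiplies frequency by 2^(1/12).
/// (Hypothetical helper, not a voirs-conversion function.)
fn semitones_to_ratio(semitones: f64) -> f64 {
    2f64.powf(semitones / 12.0)
}

/// Apply a pitch shift, given in semitones, to a fundamental frequency in Hz.
/// (Hypothetical helper, not a voirs-conversion function.)
fn shift_f0(f0_hz: f64, semitones: f64) -> f64 {
    f0_hz * semitones_to_ratio(semitones)
}

fn main() {
    // +12 semitones is one octave up, which doubles the frequency.
    let up = shift_f0(110.0, 12.0);
    // -12 semitones is one octave down, which halves the frequency.
    let down = shift_f0(220.0, -12.0);
    println!("110 Hz shifted by +12 semitones: {up:.1} Hz");
    println!("220 Hz shifted by -12 semitones: {down:.1} Hz");
}
```

Scaling by a ratio is what keeps the shift musically uniform: the same number of semitones corresponds to the same perceived interval regardless of the starting frequency.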
Streaming Support
- Real-time Processing - Process audio streams with minimal latency
- Chunk-based Processing - Efficient processing of audio chunks
- Buffer Management - Intelligent audio buffer handling
- Adaptive Quality - Adjust quality based on processing constraints
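To make the relationship between chunk size and latency concrete, here is a minimal sketch of chunk-based processing using only the standard library. It assumes mono `f32` samples and zero-pads the final chunk; `chunk_duration_ms` and `split_into_chunks` are hypothetical names for this example and do not describe how the crate buffers audio internally.

```rust
/// Duration of one chunk in milliseconds. This is the minimum buffering
/// delay a chunk-based pipeline adds before it can start processing.
/// (Hypothetical helper, not a voirs-conversion function.)
fn chunk_duration_ms(chunk_size: usize, sample_rate_hz: u32) -> f64 {
    chunk_size as f64 * 1000.0 / sample_rate_hz as f64
}

/// Split a sample buffer into fixed-size chunks, zero-padding the last
/// chunk so every chunk has the same length.
/// (Hypothetical helper, not a voirs-conversion function.)
fn split_into_chunks(samples: &[f32], chunk_size: usize) -> Vec<Vec<f32>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    samples
        .chunks(chunk_size)
        .map(|c| {
            let mut chunk = c.to_vec();
            chunk.resize(chunk_size, 0.0);
            chunk
        })
        .collect()
}

fn main() {
    let sample_rate = 16_000;
    let chunk_size = 512;
    let audio = vec![0.1_f32; 1_300];

    let chunks = split_into_chunks(&audio, chunk_size);
    println!(
        "{} chunks of {} samples, {:.1} ms each",
        chunks.len(),
        chunk_size,
        chunk_duration_ms(chunk_size, sample_rate)
    );
}
```

Smaller chunks lower the buffering delay but leave less time per chunk for processing, which is the trade-off an adaptive-quality scheme has to balance.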
Advanced Features
- Cross-domain Conversion - Convert between different audio domains
- Prosody Preservation - Maintain natural prosody patterns
- Emotional Consistency - Preserve emotional expression
- Multi-target Conversion - Convert to multiple targets simultaneously
Quick Start
Basic Voice Conversion
use *;
async
Real-time Streaming Conversion
use *;
use ;
async
Age and Gender Transformation
use *;
// Create age transformation
let age_transform = builder
.target_age // Target age
.current_age // Estimated current age
.naturalness // Preserve naturalness
.build?;
// Create gender transformation
let gender_transform = builder
.target_gender
.source_gender
.pitch_shift_method
.formant_adjustment
.build?;
// Apply transformations
let audio = load_audio.await?;
let aged_audio = age_transform.apply.await?;
let gender_converted = gender_transform.apply.await?;
Voice Morphing
use *;
// Create voice morpher
let morpher = new;
// Define morph targets with weights
let morph_targets = vec!;
// Morph voice characteristics
let target_characteristics = morpher
.morph_characteristics
.await?;
// Apply morphing to audio
let morphed_audio = morpher
.apply_morphing
.await?;
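Conceptually, morphing toward weighted targets can be pictured as a normalized weighted average of per-voice characteristic vectors. The sketch below shows only that arithmetic; the vector representation and the name `blend_characteristics` are assumptions made for illustration, and the crate's morpher may compute its blend quite differently.

```rust
/// Blend characteristic vectors by their weights, normalizing the weights
/// so they sum to 1. Returns None if there are no targets, the weights do
/// not sum to a positive value, or the vectors differ in length.
/// (Hypothetical helper, not a voirs-conversion function.)
fn blend_characteristics(targets: &[(Vec<f32>, f32)]) -> Option<Vec<f32>> {
    let dim = targets.first()?.0.len();
    let total: f32 = targets.iter().map(|(_, w)| *w).sum();
    if total <= 0.0 || targets.iter().any(|(v, _)| v.len() != dim) {
        return None;
    }

    let mut blended = vec![0.0_f32; dim];
    for (values, weight) in targets {
        let share = *weight / total;
        for (out, x) in blended.iter_mut().zip(values) {
            *out += *x * share;
        }
    }
    Some(blended)
}

fn main() {
    // Two made-up characteristic vectors, e.g. (pitch in Hz, brightness).
    let targets = vec![
        (vec![120.0, 0.2], 1.0), // weight 1
        (vec![220.0, 0.8], 3.0), // weight 3
    ];
    match blend_characteristics(&targets) {
        Some(blend) => println!("blended characteristics: {blend:?}"),
        None => println!("nothing to blend"),
    }
}
```

Normalizing inside the function means callers can pass weights on any scale, such as 1 and 3 or 0.25 and 0.75, and get the same blend.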
Configuration
Conversion Types
use *;
// Speaker-to-speaker conversion
let speaker_config = builder
.conversion_type
.preserve_prosody
.preserve_emotion
.quality_mode
.build?;
// Age transformation
let age_config = builder
.conversion_type
.build?;
// Real-time conversion
let realtime_config = builder
.conversion_type
.max_latency_ms
.chunk_size
.build?;
Audio Processing Pipeline
use *;
// Create processing pipeline
let pipeline = builder
.add_stage
.add_stage
.add_stage
.add_stage
.with_parallel_processing
.build?;
// Configure audio buffer
let buffer_config = AudioBufferConfig ;
Advanced Features
Cross-domain Conversion
use *;
// Convert between different audio domains
let converter = builder
.source_domain // 8kHz, compressed
.target_domain // 48kHz, high quality
.with_super_resolution
.with_noise_suppression
.build.await?;
let enhanced_audio = converter
.convert_domain
.await?;
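Going from an 8 kHz source domain to a 48 kHz target domain means producing six output samples for every input sample. The sketch below does this with naive linear interpolation purely to show the sample-count bookkeeping. It is not the super-resolution method referred to above, and `upsample_linear` is a hypothetical function rather than a crate API.

```rust
/// Upsample by an integer factor using linear interpolation between
/// neighbouring samples. The last input sample is held flat because there
/// is no following sample to interpolate toward.
/// (Hypothetical helper; deliberately naive, not a super-resolution model.)
fn upsample_linear(input: &[f32], factor: usize) -> Vec<f32> {
    if input.is_empty() || factor == 0 {
        return Vec::new();
    }
    let mut out = Vec::with_capacity(input.len() * factor);
    for i in 0..input.len() {
        let a = input[i];
        let b = *input.get(i + 1).unwrap_or(&a);
        for k in 0..factor {
            let t = k as f32 / factor as f32;
            out.push(a + (b - a) * t);
        }
    }
    out
}

fn main() {
    let source_rate = 8_000;
    let target_rate = 48_000;
    let factor = target_rate / source_rate; // 6

    let narrowband = vec![0.0_f32, 0.6, 0.0, -0.6];
    let wideband = upsample_linear(&narrowband, factor);
    println!(
        "{} samples at {} Hz -> {} samples at {} Hz",
        narrowband.len(),
        source_rate,
        wideband.len(),
        target_rate
    );
}
```

Interpolation alone adds no high-frequency content above the original 4 kHz band, which is why a learned super-resolution stage is attractive for this kind of conversion.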
Batch Conversion
use *;
// Process multiple files in batch
let batch_processor = builder
.with_parallel_jobs
.with_progress_reporting
.build?;
let batch_request = BatchConversionRequest ;
let results = batch_processor
.process_batch
.await?;
for result in results
Quality Assessment
use *;
// Assess conversion quality
let assessor = new.await?;
let quality_metrics = assessor.assess.await?;
println!;
println!;
println!;
println!;
Performance
Real-time Performance
| Configuration | Latency | RTF | CPU Usage | Memory |
|---|---|---|---|---|
| Low Quality | 25ms | 0.15× | 15% | 200MB |
| Balanced | 35ms | 0.25× | 25% | 400MB |
| High Quality | 50ms | 0.40× | 35% | 600MB |
| Ultra Quality | 100ms | 0.60× | 50% | 800MB |
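The RTF column can be read with a short calculation. The sketch assumes the common definition of real-time factor, processing time divided by audio duration, so values below 1.0 mean faster than real time. The README does not spell this definition out, and the function names are hypothetical.

```rust
/// Real-time factor: processing time divided by audio duration.
/// Values below 1.0 mean processing is faster than real time.
/// (Assumes the common definition of RTF; hypothetical helper.)
fn real_time_factor(processing_secs: f64, audio_secs: f64) -> f64 {
    processing_secs / audio_secs
}

/// Estimated processing time for a clip at a given RTF.
/// (Hypothetical helper.)
fn estimated_processing_secs(rtf: f64, audio_secs: f64) -> f64 {
    rtf * audio_secs
}

fn main() {
    // RTF values taken from the table above.
    let presets = [
        ("Low Quality", 0.15),
        ("Balanced", 0.25),
        ("High Quality", 0.40),
        ("Ultra Quality", 0.60),
    ];
    let clip_secs = 10.0;
    for (name, rtf) in presets {
        println!(
            "{name}: about {:.1} s to process a {clip_secs:.0} s clip",
            estimated_processing_secs(rtf, clip_secs)
        );
    }
    // Measuring RTF from a timed run: 2.5 s spent on 10 s of audio.
    println!("measured RTF: {:.2}", real_time_factor(2.5, clip_secs));
}
```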
Batch Processing Performance
use *;
// Performance monitoring
let monitor = new;
// Optimize for your use case
let config = builder
.optimization_target // or Quality, Throughput
.hardware_acceleration
.memory_limit // 2GB
.build?;
// Monitor performance
monitor.start_monitoring;
let result = converter.convert.await?;
let metrics = monitor.get_metrics;
println!;
println!;
println!;
Quality Control
Artifact Detection
use *;
// Detect conversion artifacts
let artifact_detector = new;
let artifacts = artifact_detector.detect.await?;
for artifact in artifacts
Automatic Quality Adjustment
use *;
// Adaptive quality based on input characteristics
let adaptive_converter = builder
.with_quality_threshold
.with_automatic_adjustment
.build.await?;
// Converter automatically adjusts parameters
let result = adaptive_converter.convert.await?;
println!;
println!;
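One simple way to picture automatic adjustment is to choose the highest quality preset whose latency fits a given budget. The sketch below does that with the latency figures from the Real-time Performance table. The `QualityPreset` enum and `best_preset_within` function are hypothetical and only illustrate the selection logic, not the crate's actual adaptation strategy.

```rust
/// Quality presets named after the rows of the Real-time Performance table.
/// (Hypothetical type, not a voirs-conversion enum.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum QualityPreset {
    Low,
    Balanced,
    High,
    Ultra,
}

/// Presets with their table latencies in milliseconds, ordered from
/// lowest to highest quality.
const PRESETS: [(QualityPreset, u32); 4] = [
    (QualityPreset::Low, 25),
    (QualityPreset::Balanced, 35),
    (QualityPreset::High, 50),
    (QualityPreset::Ultra, 100),
];

/// Pick the highest-quality preset whose latency fits the budget, or None
/// if even the lowest-quality preset is too slow.
fn best_preset_within(budget_ms: u32) -> Option<QualityPreset> {
    PRESETS
        .iter()
        .rev() // try the highest quality first
        .find(|(_, latency_ms)| *latency_ms <= budget_ms)
        .map(|(preset, _)| *preset)
}

fn main() {
    for budget in [20, 40, 60, 150] {
        match best_preset_within(budget) {
            Some(preset) => println!("{budget} ms budget -> {preset:?}"),
            None => println!("{budget} ms budget -> no preset fits"),
        }
    }
}
```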
Testing
# Run voice conversion tests
# Run real-time processing tests
# Run transformation tests
# Run quality assessment tests
# Run performance benchmarks
Integration
With Cloning Module
use *;
// Integration with voice cloning
let cloning_adapter = new;
let cloned_voice = cloning_adapter
.adapt_cloned_voice
.await?;
let converter = builder
.with_cloned_voice_target
.build.await?;
With Acoustic Models
use *;
// Direct acoustic model integration
let acoustic_converter = new;
let converted_features = acoustic_converter
.convert_acoustic_features
.await?;
With Other VoiRS Crates
- voirs-cloning - Voice cloning for conversion targets
- voirs-emotion - Emotion preservation during conversion
- voirs-acoustic - Direct acoustic feature conversion
- voirs-evaluation - Conversion quality metrics
- voirs-sdk - High-level conversion API
Examples
See the examples/ directory for comprehensive usage examples:
- voice_conversion_example.rs - Basic conversion
- realtime_conversion.rs - Streaming conversion
- batch_conversion.rs - Batch processing
- age_gender_transform.rs - Transformation effects
License
Licensed under the Apache License, Version 2.0.
Part of the VoiRS neural speech synthesis ecosystem.