§VoiRS Recognition
Voice recognition and analysis capabilities for the VoiRS ecosystem.
This crate provides automatic speech recognition (ASR), phoneme recognition,
and comprehensive audio analysis functionality.
§Features
- ASR Models: Whisper, DeepSpeech, Wav2Vec2 support
- Phoneme Recognition: Forced alignment and automatic recognition
- Audio Analysis: Quality metrics, prosody, speaker characteristics
- Streaming Support: Real-time processing capabilities
- Multi-language: Support for multiple languages and accents
§Quick Start
§Basic Audio Analysis
use voirs_recognizer::prelude::*;
use voirs_recognizer::RecognitionError;

#[tokio::main]
async fn main() -> Result<(), RecognitionError> {
    // Create audio buffer with some sample data
    let samples = vec![0.0f32; 16000]; // 1 second of silence at 16kHz
    let audio = AudioBuffer::mono(samples, 16000);

    // Create audio analyzer for comprehensive analysis
    let analyzer_config = AudioAnalysisConfig::default();
    let analyzer = AudioAnalyzerImpl::new(analyzer_config).await?;
    let analysis = analyzer.analyze(&audio, Some(&AudioAnalysisConfig::default())).await?;

    // Access quality metrics
    if let Some(snr) = analysis.quality_metrics.get("snr") {
        println!("Audio analysis complete: SNR = {:.2}", snr);
    }
    Ok(())
}
§ASR Accuracy Validation
use voirs_recognizer::prelude::*;
use voirs_recognizer::{RecognitionError, asr::BenchmarkingConfig};

#[tokio::main]
async fn main() -> Result<(), RecognitionError> {
    // Create a benchmarking suite with default configuration
    let benchmark_config = BenchmarkingConfig::default();
    let benchmark_suite = ASRBenchmarkingSuite::new(benchmark_config).await?;

    // Create an accuracy validator with standard requirements
    let accuracy_validator = AccuracyValidator::new_standard();

    // Validate accuracy against standard benchmarks
    let validation_report = accuracy_validator.validate_accuracy(&benchmark_suite).await?;

    // Generate and display validation report
    let summary = accuracy_validator.generate_summary_report(&validation_report);
    println!("Accuracy Validation Results:\n{}", summary);

    // Check if all requirements passed
    if validation_report.overall_passed {
        println!("✅ All accuracy requirements passed!");
    } else {
        println!("❌ Some accuracy requirements failed.");
        println!(
            "Passed: {}/{}",
            validation_report.passed_requirements, validation_report.total_requirements
        );
    }
    Ok(())
}
§Performance Tuning Guide
§Model Selection for Performance
Choose the appropriate model size based on your performance requirements:
use voirs_recognizer::prelude::*;
use voirs_recognizer::asr::whisper::WhisperConfig;

// For real-time applications with tight latency constraints
let fast_config = WhisperConfig {
    model_size: "tiny".to_string(),
    ..Default::default()
};

// For balanced performance and accuracy
let balanced_config = WhisperConfig {
    model_size: "base".to_string(),
    ..Default::default()
};

// For highest accuracy (higher latency)
let accurate_config = WhisperConfig {
    model_size: "small".to_string(),
    ..Default::default()
};
§Memory Optimization
- Model Quantization: Use INT8 or FP16 quantization to reduce memory usage
- Batch Processing: Process multiple audio files together for better throughput
- Memory Pools: Enable GPU memory pooling for efficient tensor reuse
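To make the quantization bullet concrete, the sketch below estimates weight storage at different precisions. It is plain arithmetic and does not call any VoiRS API. The `Precision` enum and `weight_bytes` helper are hypothetical names introduced for illustration, the parameter counts are the approximate published sizes of the Whisper checkpoints, and the estimate ignores activations, caches, and quantization scale factors.

```rust
/// Numeric precision used to store model weights (illustrative type, not a VoiRS API).
#[derive(Clone, Copy, Debug)]
enum Precision {
    Fp32,
    Fp16,
    Int8,
}

impl Precision {
    /// Bytes needed to store one weight at this precision.
    fn bytes_per_weight(self) -> u64 {
        match self {
            Precision::Fp32 => 4,
            Precision::Fp16 => 2,
            Precision::Int8 => 1,
        }
    }
}

/// Approximate weight storage in bytes.
/// Ignores activations, caches, and per-tensor quantization scales.
fn weight_bytes(param_count: u64, precision: Precision) -> u64 {
    param_count * precision.bytes_per_weight()
}

fn main() {
    // Approximate published parameter counts for Whisper checkpoints.
    let models = [
        ("tiny", 39_000_000u64),
        ("base", 74_000_000),
        ("small", 244_000_000),
    ];

    for (name, params) in models {
        for precision in [Precision::Fp32, Precision::Fp16, Precision::Int8] {
            let mib = weight_bytes(params, precision) as f64 / (1024.0 * 1024.0);
            println!("{name:>5} {precision:?}: {mib:.0} MiB");
        }
    }
}
```

Under these assumptions FP16 halves and INT8 quarters the FP32 weight footprint; real savings depend on which layers are actually quantized.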
§Real-time Processing Optimization
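Chunk and overlap sizes in the streaming configuration in this section are given in samples, so it helps to convert between samples and milliseconds when choosing them. The helpers below are a small standalone sketch; `samples_to_duration` and `ms_to_samples` are hypothetical names, not part of the crate, and the hop calculation assumes that overlap means consecutive chunks share that many samples.

```rust
use std::time::Duration;

/// Duration covered by `samples` audio samples at `sample_rate` Hz
/// (integer math, truncated to whole microseconds).
fn samples_to_duration(samples: u64, sample_rate: u64) -> Duration {
    Duration::from_micros(samples * 1_000_000 / sample_rate)
}

/// Number of samples that span `ms` milliseconds at `sample_rate` Hz.
fn ms_to_samples(ms: u64, sample_rate: u64) -> u64 {
    ms * sample_rate / 1000
}

fn main() {
    let sample_rate = 16_000;
    let chunk_size = 1600;
    let overlap = 400;

    println!("chunk:   {:?}", samples_to_duration(chunk_size, sample_rate));
    println!("overlap: {:?}", samples_to_duration(overlap, sample_rate));

    // Assumption: each new chunk advances by (chunk_size - overlap) samples.
    let hop = chunk_size - overlap;
    println!("hop:     {:?}", samples_to_duration(hop, sample_rate));

    // Going the other way: how many samples make up a 50 ms chunk?
    println!("50 ms = {} samples", ms_to_samples(50, sample_rate));
}
```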
use voirs_recognizer::prelude::*;
use voirs_recognizer::integration::config::{StreamingConfig, LatencyMode};

// Configure for ultra-low latency
let streaming_config = StreamingConfig {
    latency_mode: LatencyMode::UltraLow,
    chunk_size: 1600,     // Smaller chunks for lower latency (100ms at 16kHz)
    overlap: 400,         // Minimal overlap (25ms at 16kHz)
    buffer_duration: 3.0, // Limited buffer for speed
};
§Performance Monitoring
Monitor your application’s performance to ensure it meets requirements:
use voirs_recognizer::prelude::*;
use std::time::Duration;

let validator = PerformanceValidator::new()
    .with_verbose(true);

let requirements = PerformanceRequirements {
    max_rtf: 0.3,                    // Real-time factor < 0.3
    max_memory_usage: 2_000_000_000, // < 2GB
    max_startup_time_ms: 5000,       // < 5 seconds
    max_streaming_latency_ms: 200,   // < 200ms
};

// Validate streaming latency
let latency = Duration::from_millis(150);
let (latency_ms, passed) = validator.validate_streaming_latency(latency);
if !passed {
    println!(
        "Streaming latency {} ms exceeds requirement {} ms",
        latency_ms, requirements.max_streaming_latency_ms
    );
}
§Platform-Specific Optimizations
§GPU Acceleration
- Enable CUDA support for NVIDIA GPUs
- Use Metal acceleration on Apple Silicon
- Configure appropriate batch sizes for your GPU memory
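One rough way to pick a batch size for a given GPU is to divide the memory left after loading the model by the per-item working memory. The sketch below shows only that arithmetic. `max_batch_size` is a hypothetical helper, not a VoiRS function, and every number in `main` is a placeholder that should be replaced with values measured on your own hardware.

```rust
/// Largest batch that fits in GPU memory under a simple linear model:
/// usable memory = total minus a safety headroom, minus the model weights,
/// divided by the working memory needed per batch item.
fn max_batch_size(
    gpu_bytes: u64,
    model_bytes: u64,
    per_item_bytes: u64,
    headroom_percent: u64,
) -> u64 {
    if per_item_bytes == 0 {
        return 0;
    }
    // Keep a share of memory free for fragmentation and other allocations.
    let usable = gpu_bytes / 100 * (100 - headroom_percent.min(100));
    // If the model alone does not fit, no batch fits either.
    usable.saturating_sub(model_bytes) / per_item_bytes
}

fn main() {
    // Placeholder values for illustration only.
    let gpu_bytes = 8_000_000_000; // 8 GB card
    let model_bytes = 1_000_000_000; // weights resident on the GPU
    let per_item_bytes = 200_000_000; // measured working set per audio clip
    let headroom_percent = 10;

    let batch = max_batch_size(gpu_bytes, model_bytes, per_item_bytes, headroom_percent);
    println!("Suggested maximum batch size: {batch}");
}
```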
§SIMD Optimizations
- VoiRS automatically detects and uses SIMD instructions (AVX2, NEON)
- Ensure your CPU supports these instruction sets for optimal performance
- No manual configuration required - optimizations are applied automatically
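To check whether the current machine offers the instruction sets mentioned above, the Rust standard library's runtime feature detection macros can be used. This is a standalone sketch that relies only on `std`; it does not show how VoiRS performs its own detection, and `detected_simd_features` is a hypothetical helper name.

```rust
/// Names of the SIMD instruction sets (among AVX2 and NEON) detected at runtime.
fn detected_simd_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut features = Vec::new();

    // AVX2 only exists on x86 / x86_64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            features.push("avx2");
        }
    }

    // NEON detection is available on 64-bit ARM targets.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            features.push("neon");
        }
    }

    features
}

fn main() {
    let features = detected_simd_features();
    if features.is_empty() {
        println!("No AVX2 or NEON support detected");
    } else {
        println!("Detected SIMD support: {}", features.join(", "));
    }
}
```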
§Multi-threading
- Use num_cpus::get() to optimize thread pool sizes
- Enable parallel processing for batch operations
- Balance thread count with memory usage
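The multi-threading points above can be sketched with the standard library alone. This example uses `std::thread::available_parallelism()` as a dependency-free stand-in for `num_cpus::get()`, caps the worker count with a caller-supplied limit (for example one derived from available memory), and spreads a batch across scoped threads. `worker_count` and `process_batch` are hypothetical helpers, and the workload in `main` is a placeholder rather than real recognition.

```rust
use std::thread;

/// Worker count: logical CPUs, capped by a caller-supplied limit, never below 1.
fn worker_count(max_workers: usize) -> usize {
    let cpus = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    cpus.min(max_workers).max(1)
}

/// Apply `f` to every item using up to `workers` scoped threads.
/// Results are returned in the same order as the input.
fn process_batch<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every item lands in exactly one chunk.
    let chunk_len = (items.len() + workers - 1) / workers;
    let f = &f;

    thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();

        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Cap at 4 workers, e.g. because each worker holds its own buffers.
    let workers = worker_count(4);

    // Placeholder workload: convert clip lengths in samples to seconds.
    let clip_lengths = vec![16_000u32, 32_000, 8_000, 48_000];
    let seconds = process_batch(&clip_lengths, workers, |n| f64::from(*n) / 16_000.0);

    println!("{workers} workers -> {seconds:?}");
}
```

Capping the worker count is one simple way to trade throughput against memory, since each extra worker typically keeps its own buffers alive.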
Re-exports§
- pub use traits::ASRConfig;
- pub use traits::ASRFeature;
- pub use traits::ASRMetadata;
- pub use traits::ASRModel;
- pub use traits::AudioAnalysis;
- pub use traits::AudioAnalysisConfig;
- pub use traits::AudioAnalyzer;
- pub use traits::AudioAnalyzerMetadata;
- pub use traits::AudioStream;
- pub use traits::PhonemeAlignment;
- pub use traits::PhonemeRecognitionConfig;
- pub use traits::PhonemeRecognizer;
- pub use traits::PhonemeRecognizerMetadata;
- pub use traits::RecognitionResult;
- pub use traits::Transcript;
- pub use traits::TranscriptChunk;
- pub use traits::TranscriptStream;
- pub use analysis::AudioAnalyzerImpl;
- pub use asr::ASRBackend;
- pub use asr::ASRBenchmarkingSuite;
- pub use asr::AccuracyValidator;
- pub use asr::IntelligentASRFallback;
- pub use asr::advanced_optimization::AdvancedOptimizationConfig;
- pub use asr::advanced_optimization::KnowledgeDistillationOptimizer;
- pub use asr::advanced_optimization::MixedPrecisionOptimizer;
- pub use asr::advanced_optimization::OptimizationObjective;
- pub use asr::advanced_optimization::OptimizationPlatform;
- pub use asr::advanced_optimization::ProgressivePruningOptimizer;
- pub use asr::optimization_integration::ModelStats;
- pub use asr::optimization_integration::OptimizationPipeline;
- pub use asr::optimization_integration::OptimizationResults;
- pub use asr::optimization_integration::OptimizationSummary;
- pub use audio_formats::load_audio;
- pub use audio_formats::load_audio_with_sample_rate;
- pub use audio_formats::AudioFormat;
- pub use audio_formats::AudioLoadConfig;
- pub use audio_formats::UniversalAudioLoader;
- pub use audio_utilities::analyze_audio_quality;
- pub use audio_utilities::extract_speech_segments;
- pub use audio_utilities::load_and_preprocess;
- pub use audio_utilities::optimize_for_recognition;
- pub use audio_utilities::split_audio_smart;
- pub use audio_utilities::AudioQualityReport;
- pub use audio_utilities::AudioUtilities;
- pub use performance::PerformanceMetrics;
- pub use performance::PerformanceRequirements;
- pub use performance::PerformanceValidator;
- pub use performance::ValidationResult;
- pub use preprocessing::AudioPreprocessingConfig;
- pub use preprocessing::AudioPreprocessor;
- pub use wake_word::EnergyOptimizer;
- pub use wake_word::NeuralWakeWordModel;
- pub use wake_word::TemplateWakeWordModel;
- pub use wake_word::TrainingPhase;
- pub use wake_word::TrainingProgress;
- pub use wake_word::TrainingValidationReport;
- pub use wake_word::WakeWordConfig;
- pub use wake_word::WakeWordDetection;
- pub use wake_word::WakeWordDetector;
- pub use wake_word::WakeWordDetectorImpl;
- pub use wake_word::WakeWordModel;
- pub use wake_word::WakeWordStats;
- pub use wake_word::WakeWordTrainer;
- pub use wake_word::WakeWordTrainerImpl;
- pub use wake_word::WakeWordTrainingData;
Modules§
- analysis
- Audio analysis implementations
- asr
- Automatic Speech Recognition (ASR) implementations
- audio_formats
- Audio Format Support
- audio_utilities
- Audio utilities for common processing tasks
- caching
- Advanced caching strategies for model inference
- cloud_storage
- Cloud Storage Integration
- config
- Shared configuration management for VoiRS recognizer.
- disaster_recovery
- Disaster recovery and business continuity planning
- error_bridge
- Error Bridge
- error_enhancement
- Enhanced error messages and solutions for VoiRS Recognizer
- error_recovery
- Enhanced error recovery mechanisms for VoiRS Recognizer
- high_availability
- High availability architecture for production deployments
- integration
- VoiRS Ecosystem Integration
- logging
- Unified logging infrastructure for VoiRS recognizer.
- memory_optimization
- Memory optimization utilities for VoiRS Recognizer
- mobile
- Mobile platform optimizations for VoiRS recognizer
- monitoring
- Monitoring and observability infrastructure for VoiRS Recognition
- multimodal
- Multi-modal processing for VoiRS recognizer.
- performance
- Performance validation and monitoring utilities.
- phoneme
- Phoneme recognition and forced alignment implementations
- prelude
- Convenient prelude for common imports
- preprocessing
- Audio preprocessing and enhancement module
- privacy
- Privacy-preserving techniques for speech recognition.
- sdk_bridge
- VoiRS SDK Bridge
- security_audit
- Security audit and compliance framework
- sla_guarantees
- Performance SLA Guarantees
- training
- Comprehensive Model Training and Fine-tuning Framework
- traits
- Core traits for the VoiRS recognition system
- wake_word
- Wake word detection and keyword spotting functionality
Macros§
- log_info
- Convenience macro for logging with context
- profile_block
- Macro rules
- profile_function
- Profiling macros for easy instrumentation
Structs§
- AudioBuffer
- Audio buffer containing synthesized speech
- Phoneme
- Phoneme representation with IPA symbol and metadata
Enums§
- LanguageCode
- Language code identifier
- RecognitionError
- Recognition-specific error types
- VoirsError
- Main error type for VoiRS operations with enhanced categorization
Constants§
- VERSION
- Version information
Functions§
- confidence_to_label
- Convert confidence score to human-readable label
- default_analysis_config
- Create a default audio analysis configuration
- default_asr_config
- Create a default ASR configuration for a given language
- default_phoneme_config
- Create a default phoneme recognition configuration for a given language
- load_audio_simple
- Simple audio loading function for quick start examples
- merge_transcripts
- Utility function to merge transcripts
- validate_model_file
- Check if a model file exists and is valid