Crate voirs_recognizer

§VoiRS Recognition

Voice recognition and analysis capabilities for the VoiRS ecosystem. This crate provides automatic speech recognition (ASR), phoneme recognition, and comprehensive audio analysis functionality.

§Features

  • ASR Models: Whisper, DeepSpeech, Wav2Vec2 support
  • Phoneme Recognition: Forced alignment and automatic recognition
  • Audio Analysis: Quality metrics, prosody, speaker characteristics
  • Streaming Support: Real-time processing capabilities
  • Multi-language: Support for multiple languages and accents

§Quick Start

§Basic Audio Analysis

use voirs_recognizer::prelude::*;
use voirs_recognizer::RecognitionError;

#[tokio::main]
async fn main() -> Result<(), RecognitionError> {
    // Create audio buffer with some sample data
    let samples = vec![0.0f32; 16000]; // 1 second of silence at 16kHz
    let audio = AudioBuffer::mono(samples, 16000);
     
    // Create audio analyzer for comprehensive analysis
    let analyzer_config = AudioAnalysisConfig::default();
    let analyzer = AudioAnalyzerImpl::new(analyzer_config).await?;
    let analysis = analyzer.analyze(&audio, Some(&AudioAnalysisConfig::default())).await?;
     
    // Access quality metrics
    if let Some(snr) = analysis.quality_metrics.get("snr") {
        println!("Audio analysis complete: SNR = {:.2}", snr);
    }
     
    Ok(())
}

§ASR Accuracy Validation

use voirs_recognizer::prelude::*;
use voirs_recognizer::{RecognitionError, asr::BenchmarkingConfig};

#[tokio::main]
async fn main() -> Result<(), RecognitionError> {
    // Create a benchmarking suite with default configuration
    let benchmark_config = BenchmarkingConfig::default();
    let benchmark_suite = ASRBenchmarkingSuite::new(benchmark_config).await?;
     
    // Create an accuracy validator with standard requirements
    let accuracy_validator = AccuracyValidator::new_standard();
     
    // Validate accuracy against standard benchmarks
    let validation_report = accuracy_validator.validate_accuracy(&benchmark_suite).await?;
     
    // Generate and display validation report
    let summary = accuracy_validator.generate_summary_report(&validation_report);
    println!("Accuracy Validation Results:\n{}", summary);
     
    // Check if all requirements passed
    if validation_report.overall_passed {
        println!("✅ All accuracy requirements passed!");
    } else {
        println!("❌ Some accuracy requirements failed.");
        println!("Passed: {}/{}", validation_report.passed_requirements, validation_report.total_requirements);
    }
     
    Ok(())
}

§Performance Tuning Guide

§Model Selection for Performance

Choose the appropriate model size based on your performance requirements:

use voirs_recognizer::prelude::*;
use voirs_recognizer::asr::whisper::WhisperConfig;

// For real-time applications with tight latency constraints
let fast_config = WhisperConfig {
    model_size: "tiny".to_string(),
    ..Default::default()
};

// For balanced performance and accuracy  
let balanced_config = WhisperConfig {
    model_size: "base".to_string(),
    ..Default::default()
};

// For highest accuracy (higher latency)
let accurate_config = WhisperConfig {
    model_size: "small".to_string(),
    ..Default::default()
};

§Memory Optimization

  • Model Quantization: Use INT8 or FP16 quantization to reduce memory usage (the savings are sketched after this list)
  • Batch Processing: Process multiple audio files together for better throughput
  • Memory Pools: Enable GPU memory pooling for efficient tensor reuse
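
A rough back-of-the-envelope calculation shows why quantization pays off. The sketch below is plain arithmetic, not VoiRS API; the parameter count is an approximation for a "base"-sized Whisper model.

// Illustrative only: memory needed for model weights at different precisions.
fn weight_memory_mb(num_params: u64, bytes_per_param: u64) -> u64 {
    num_params * bytes_per_param / 1_000_000
}

fn main() {
    let params = 74_000_000; // ~74M parameters (Whisper "base", approximate)
    println!("FP32: ~{} MB", weight_memory_mb(params, 4)); // ~296 MB
    println!("FP16: ~{} MB", weight_memory_mb(params, 2)); // ~148 MB
    println!("INT8: ~{} MB", weight_memory_mb(params, 1)); // ~74 MB
}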

§Real-time Processing Optimization

use voirs_recognizer::prelude::*;
use voirs_recognizer::integration::config::{StreamingConfig, LatencyMode};

// Configure for ultra-low latency
let streaming_config = StreamingConfig {
    latency_mode: LatencyMode::UltraLow,
    chunk_size: 1600,          // Smaller chunks for lower latency (100ms at 16kHz)
    overlap: 400,              // Minimal overlap (25ms at 16kHz)  
    buffer_duration: 3.0,      // Limited buffer for speed
};

§Performance Monitoring

Monitor your application’s performance to ensure it meets requirements:

use voirs_recognizer::prelude::*;
use std::time::Duration;

let validator = PerformanceValidator::new()
    .with_verbose(true);

let requirements = PerformanceRequirements {
    max_rtf: 0.3,              // Real-time factor < 0.3
    max_memory_usage: 2_000_000_000, // < 2GB
    max_startup_time_ms: 5000, // < 5 seconds
    max_streaming_latency_ms: 200, // < 200ms
};

// Validate streaming latency
let latency = Duration::from_millis(150);
let (latency_ms, passed) = validator.validate_streaming_latency(latency);
if !passed {
    println!("Streaming latency {} ms exceeds requirement {} ms",
             latency_ms, requirements.max_streaming_latency_ms);
}

§Platform-Specific Optimizations

§GPU Acceleration
  • Enable CUDA support for NVIDIA GPUs (backend selection is sketched after this list)
  • Use Metal acceleration on Apple Silicon
  • Configure appropriate batch sizes for your GPU memory
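
The sketch below shows one way such a backend choice could be wired up at startup. The AccelBackend enum and the "cuda" feature flag are hypothetical, used only to illustrate the decision logic; they are not the actual voirs_recognizer configuration API.

// Hypothetical backend enum, for illustration only.
#[derive(Debug)]
enum AccelBackend {
    Cuda,
    Metal,
    Cpu,
}

fn pick_backend() -> AccelBackend {
    // Prefer CUDA when a (hypothetical) "cuda" feature is compiled in,
    // fall back to Metal on Apple Silicon, otherwise stay on the CPU.
    if cfg!(feature = "cuda") {
        AccelBackend::Cuda
    } else if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        AccelBackend::Metal
    } else {
        AccelBackend::Cpu
    }
}

fn main() {
    println!("Selected backend: {:?}", pick_backend());
}
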
§SIMD Optimizations
  • VoiRS automatically detects and uses SIMD instructions (AVX2, NEON)
  • Ensure your CPU supports these instruction sets for optimal performance (a quick detection check follows this list)
  • No manual configuration required - optimizations are applied automatically
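
If you want to confirm what the host CPU offers, the standard library's runtime feature-detection macro can be used directly. This check is purely diagnostic; VoiRS does not require it.

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        // Runtime detection of the instruction sets the recognizer can exploit.
        println!("AVX2: {}", is_x86_feature_detected!("avx2"));
        println!("FMA:  {}", is_x86_feature_detected!("fma"));
    }
    #[cfg(target_arch = "aarch64")]
    {
        // NEON is part of the AArch64 baseline, so it is always available here.
        println!("NEON: true");
    }
}
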
§Multi-threading
  • Use num_cpus::get() to optimize thread pool sizes (see the sketch after this list)
  • Enable parallel processing for batch operations
  • Balance thread count with memory usage
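
A minimal sketch of sizing a worker pool from the CPU count and fanning batch work out over plain threads. It assumes the num_cpus crate is a dependency; the recognition call itself is elided.

use std::thread;

fn main() {
    // Leave one core free for the async runtime and OS; never drop below one worker.
    let workers = num_cpus::get().saturating_sub(1).max(1);

    // Hypothetical batch of input files, split into one chunk per worker.
    let files: Vec<String> = (0..8).map(|i| format!("clip_{i}.wav")).collect();
    let chunk_size = (files.len() + workers - 1) / workers;

    let handles: Vec<_> = files
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            thread::spawn(move || {
                for file in chunk {
                    // Run recognition on `file` here.
                    println!("processing {file}");
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}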

Re-exports§

pub use traits::ASRConfig;
pub use traits::ASRFeature;
pub use traits::ASRMetadata;
pub use traits::ASRModel;
pub use traits::AudioAnalysis;
pub use traits::AudioAnalysisConfig;
pub use traits::AudioAnalyzer;
pub use traits::AudioAnalyzerMetadata;
pub use traits::AudioStream;
pub use traits::PhonemeAlignment;
pub use traits::PhonemeRecognitionConfig;
pub use traits::PhonemeRecognizer;
pub use traits::PhonemeRecognizerMetadata;
pub use traits::RecognitionResult;
pub use traits::Transcript;
pub use traits::TranscriptChunk;
pub use traits::TranscriptStream;
pub use analysis::AudioAnalyzerImpl;
pub use asr::ASRBackend;
pub use asr::ASRBenchmarkingSuite;
pub use asr::AccuracyValidator;
pub use asr::IntelligentASRFallback;
pub use asr::advanced_optimization::AdvancedOptimizationConfig;
pub use asr::advanced_optimization::KnowledgeDistillationOptimizer;
pub use asr::advanced_optimization::MixedPrecisionOptimizer;
pub use asr::advanced_optimization::OptimizationObjective;
pub use asr::advanced_optimization::OptimizationPlatform;
pub use asr::advanced_optimization::ProgressivePruningOptimizer;
pub use asr::optimization_integration::ModelStats;
pub use asr::optimization_integration::OptimizationPipeline;
pub use asr::optimization_integration::OptimizationResults;
pub use asr::optimization_integration::OptimizationSummary;
pub use audio_formats::load_audio;
pub use audio_formats::load_audio_with_sample_rate;
pub use audio_formats::AudioFormat;
pub use audio_formats::AudioLoadConfig;
pub use audio_formats::UniversalAudioLoader;
pub use audio_utilities::analyze_audio_quality;
pub use audio_utilities::extract_speech_segments;
pub use audio_utilities::load_and_preprocess;
pub use audio_utilities::optimize_for_recognition;
pub use audio_utilities::split_audio_smart;
pub use audio_utilities::AudioQualityReport;
pub use audio_utilities::AudioUtilities;
pub use performance::PerformanceMetrics;
pub use performance::PerformanceRequirements;
pub use performance::PerformanceValidator;
pub use performance::ValidationResult;
pub use preprocessing::AudioPreprocessingConfig;
pub use preprocessing::AudioPreprocessor;
pub use wake_word::EnergyOptimizer;
pub use wake_word::NeuralWakeWordModel;
pub use wake_word::TemplateWakeWordModel;
pub use wake_word::TrainingPhase;
pub use wake_word::TrainingProgress;
pub use wake_word::TrainingValidationReport;
pub use wake_word::WakeWordConfig;
pub use wake_word::WakeWordDetection;
pub use wake_word::WakeWordDetector;
pub use wake_word::WakeWordDetectorImpl;
pub use wake_word::WakeWordModel;
pub use wake_word::WakeWordStats;
pub use wake_word::WakeWordTrainer;
pub use wake_word::WakeWordTrainerImpl;
pub use wake_word::WakeWordTrainingData;

Modules§

analysis
Audio analysis implementations
asr
Automatic Speech Recognition (ASR) implementations
audio_formats
Audio Format Support
audio_utilities
Audio utilities for common processing tasks
caching
Advanced caching strategies for model inference
cloud_storage
Cloud Storage Integration
config
Shared configuration management for VoiRS recognizer.
disaster_recovery
Disaster recovery and business continuity planning
error_bridge
Error Bridge
error_enhancement
Enhanced error messages and solutions for VoiRS Recognizer
error_recovery
Enhanced error recovery mechanisms for VoiRS Recognizer
high_availability
High availability architecture for production deployments
integration
VoiRS Ecosystem Integration
logging
Unified logging infrastructure for VoiRS recognizer.
memory_optimization
Memory optimization utilities for VoiRS Recognizer
mobile
Mobile platform optimizations for VoiRS recognizer
monitoring
Monitoring and observability infrastructure for VoiRS Recognition
multimodal
Multi-modal processing for VoiRS recognizer.
performance
Performance validation and monitoring utilities.
phoneme
Phoneme recognition and forced alignment implementations
prelude
Convenient prelude for common imports
preprocessing
Audio preprocessing and enhancement module
privacy
Privacy-preserving techniques for speech recognition.
sdk_bridge
VoiRS SDK Bridge
security_audit
Security audit and compliance framework
sla_guarantees
Performance SLA Guarantees
training
Comprehensive Model Training and Fine-tuning Framework
traits
Core traits for the VoiRS recognition system
wake_word
Wake word detection and keyword spotting functionality

Macros§

log_info
Convenience macro for logging with context
profile_block
Profiling macro for instrumenting a code block
profile_function
Profiling macros for easy instrumentation

Structs§

AudioBuffer
Audio buffer containing synthesized speech
Phoneme
Phoneme representation with IPA symbol and metadata

Enums§

LanguageCode
Language code identifier
RecognitionError
Recognition-specific error types
VoirsError
Main error type for VoiRS operations with enhanced categorization

Constants§

VERSION
Version information

Functions§

confidence_to_label
Convert confidence score to human-readable label
default_analysis_config
Create a default audio analysis configuration
default_asr_config
Create a default ASR configuration for a given language
default_phoneme_config
Create a default phoneme recognition configuration for a given language
load_audio_simple
Simple audio loading function for quick start examples
merge_transcripts
Utility function to merge transcripts
validate_model_file
Check if a model file exists and is valid