Crate voxudio

§Voxudio

voxudio is a real-time audio processing library with ONNX Runtime support. It provides tools for audio device management, signal processing, and machine learning model integration in audio applications.

§Features

  • Audio device enumeration and management
  • Real-time audio processing capabilities
  • ONNX model integration for audio machine learning tasks
  • Opus audio codec support (encoding/decoding)
  • Online feature extraction (FBank, MFCC, Whisper FBank) based on kaldi-native-fbank
    • Builder pattern with with_* methods for flexible parameter configuration (e.g., number of mel bins, window type)
  • Automatic Speech Recognition (ASR) API
    • Provides AutomaticSpeechRecognizer for direct feature-to-text recognition
    • All public APIs are documented with usage examples
  • Cross-platform support

§Example

§Speaker embedding extraction example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize voice activity detector and speaker embedding extractor
    let mut vad = VoiceActivityDetector::new("checkpoint/voice_activity_detector.onnx")?;
    let mut see = SpeakerEmbeddingExtractor::new("checkpoint/speaker_embedding_extractor.onnx")?;

    // Load audio file
    let (audio, channels) = load_audio::<22050, f32, _>("../asset/test.wav", false).await?;

    // Detect speech segments
    let vad_audio = vad.retain_speech_only::<22050>(&audio, channels).await?;

    // Extract speaker embedding
    let embedding = see.extract(&vad_audio, channels).await?;
    println!("Extracted embedding: {:?}", embedding);

    Ok(())
}

§Online feature extraction example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Build an online FBank feature extractor (algorithm from kaldi-native-fbank)
    let extractor = OnlineFbankFeatureExtractor::fbank()?
        .with_energy_floor(1.0)
        .build()?;
    // Load audio file
    let (audio, channels) = load_audio::<16000, f32, _>("../asset/test.wav", true).await?;
    // Extract FBank features
    let features = extractor.extract::<16000>(&audio);
    println!("FBank features: {:?}", features);

    Ok(())
}
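
To make the "number of mel bins" parameter more concrete, here is a minimal standalone sketch of how filterbank center frequencies are commonly spaced on the mel scale. It uses the Kaldi-style conversion mel = 1127 ln(1 + f/700). The function names, the 20 Hz to 8000 Hz range, and the choice of 8 bins are illustrative assumptions; this is not voxudio's API or its internal implementation.

```rust
/// Convert a frequency in Hz to mels (Kaldi-style formula).
fn hz_to_mel(hz: f32) -> f32 {
    1127.0 * (1.0 + hz / 700.0).ln()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * ((mel / 1127.0).exp() - 1.0)
}

/// Center frequencies (in Hz) of `num_bins` filters spaced evenly on the
/// mel scale between `low_hz` and `high_hz`. Hypothetical helper, for
/// illustration only.
fn mel_bin_centers(num_bins: usize, low_hz: f32, high_hz: f32) -> Vec<f32> {
    let low = hz_to_mel(low_hz);
    let high = hz_to_mel(high_hz);
    // num_bins centers split the mel range into num_bins + 1 equal steps.
    let step = (high - low) / (num_bins as f32 + 1.0);
    (1..=num_bins)
        .map(|i| mel_to_hz(low + step * i as f32))
        .collect()
}

fn main() {
    // Assumed values: 8 bins between 20 Hz and 8000 Hz.
    for (i, hz) in mel_bin_centers(8, 20.0, 8000.0).iter().enumerate() {
        println!("bin {i}: center at {hz:.1} Hz");
    }
}
```

More bins give finer frequency resolution per frame, at the cost of a larger feature vector.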

§Automatic Speech Recognition example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    let mut asr = AutomaticSpeechRecognizer::new_legacy("checkpoint/automatic_speech_recognizer.onnx")?;
    // Placeholder input: NUM_BINS * 10 zeroed values standing in for already-extracted features
    let features = vec![0.0; AutomaticSpeechRecognizer::NUM_BINS as usize * 10];
    let text = asr.recognize(&features).await?;
    println!("{}", text);
    Ok(())
}
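
The example above passes a flat buffer of NUM_BINS * 10 values. The following standalone sketch shows one way such a buffer could be assembled from per-frame feature vectors. It assumes a frame-major layout (all bins of frame 0, then frame 1, and so on) and a hypothetical NUM_BINS of 80; neither is confirmed by the documentation above, so check the AutomaticSpeechRecognizer docs for the layout it actually expects.

```rust
/// Hypothetical bin count for illustration; the real value is
/// `AutomaticSpeechRecognizer::NUM_BINS` and may differ.
const NUM_BINS: usize = 80;

/// Flatten per-frame feature vectors into one frame-major buffer.
/// Panics if any frame does not hold exactly `NUM_BINS` values.
fn flatten_frames(frames: &[Vec<f32>]) -> Vec<f32> {
    let mut flat = Vec::with_capacity(frames.len() * NUM_BINS);
    for frame in frames {
        assert_eq!(frame.len(), NUM_BINS, "each frame must hold NUM_BINS values");
        flat.extend_from_slice(frame);
    }
    flat
}

fn main() {
    // Ten zeroed frames, matching the size of the placeholder input above.
    let frames = vec![vec![0.0_f32; NUM_BINS]; 10];
    let flat = flatten_frames(&frames);
    println!("{} frames -> {} values", frames.len(), flat.len());
}
```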

See the examples for more.

§License

This project is licensed under the Apache License, Version 2.0.

Enums§

OperationError
Error types that may occur during operations

Traits§

GenericSample
Trait for audio sample types

Functions§

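decode_audio
Decodes audio data
load_audio
Asynchronously loads an audio file
resample
Resamples audio data
resample_dynamic
Dynamic-sample-rate version of resampling
reverb
Adds a room reverb effect to audio data
spatial_audio
Spatializes audio data (3D audio effect)
speed
Changes the speed of audio data (the speed change also shifts pitch)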