Crate voxudio


§Voxudio

voxudio is a real-time audio processing library with ONNX runtime support. It provides a set of tools for audio device management, signal processing, and machine learning model integration for audio applications.

§Features

- Audio device enumeration and management
- Real-time audio processing capabilities
- ONNX model integration for audio machine learning tasks
- OPUS audio codec support (encoding/decoding)
- Online feature extraction (FBank, MFCC, Whisper FBank) based on kaldi-native-fbank
  - Builder pattern with with_* methods for flexible parameter configuration (e.g., number of mel bins, window type, etc.)
- Automatic Speech Recognition (ASR) API
  - Provides AutomaticSpeechRecognizer for direct feature-to-text recognition
  - All public APIs are documented with usage examples
- Cross-platform support
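
The builder pattern with with_* methods mentioned in the feature list can be sketched as follows. This is a minimal, self-contained illustration and not voxudio's actual API: `FbankConfig`, `FbankConfigBuilder`, `WindowType`, the default values, and the validation rules are all hypothetical names and choices made for this example.

```rust
/// Hypothetical window types; not taken from voxudio.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WindowType {
    Hanning,
    Hamming,
    Povey,
}

/// Hypothetical, validated configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
pub struct FbankConfig {
    pub num_mel_bins: usize,
    pub energy_floor: f32,
    pub window_type: WindowType,
}

/// Hypothetical builder: each with_* method consumes the builder,
/// overrides one parameter, and returns the builder for chaining.
pub struct FbankConfigBuilder {
    config: FbankConfig,
}

impl FbankConfigBuilder {
    /// Start from arbitrary defaults chosen for this sketch.
    pub fn new() -> Self {
        Self {
            config: FbankConfig {
                num_mel_bins: 80,
                energy_floor: 0.0,
                window_type: WindowType::Povey,
            },
        }
    }

    pub fn with_num_mel_bins(mut self, num_mel_bins: usize) -> Self {
        self.config.num_mel_bins = num_mel_bins;
        self
    }

    pub fn with_energy_floor(mut self, energy_floor: f32) -> Self {
        self.config.energy_floor = energy_floor;
        self
    }

    pub fn with_window_type(mut self, window_type: WindowType) -> Self {
        self.config.window_type = window_type;
        self
    }

    /// Validate once at the end, so invalid combinations are rejected
    /// before any processing starts.
    pub fn build(self) -> Result<FbankConfig, String> {
        if self.config.num_mel_bins == 0 {
            return Err("num_mel_bins must be greater than zero".to_string());
        }
        if self.config.energy_floor < 0.0 {
            return Err("energy_floor must not be negative".to_string());
        }
        Ok(self.config)
    }
}
```

With this shape, a call such as `FbankConfigBuilder::new().with_energy_floor(1.0).build()` overrides only the parameter of interest and leaves the remaining defaults untouched, which is the flexibility the with_* style is meant to provide.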

§Example

§Speaker embedding extraction example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize voice activity detector and speaker embedding extractor
    let mut vad = VoiceActivityDetector::new("checkpoint/voice_activity_detector.onnx")?;
    let mut see = SpeakerEmbeddingExtractor::new("checkpoint/speaker_embedding_extractor.onnx")?;

    // Load audio file
    let (audio, channels) = load_audio::<22050, f32, _>("../asset/test.wav", false).await?;

    // Keep only the speech segments
    let vad_audio = vad.retain_speech_only::<22050>(&audio, channels).await?;

    // Extract speaker embedding
    let embedding = see.extract(&vad_audio, channels).await?;
    println!("Extracted embedding: {:?}", embedding);

    Ok(())
}

§Online feature extraction example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Build an online FBank feature extractor (algorithm from kaldi-native-fbank)
    let extractor = OnlineFbankFeatureExtractor::fbank()?
        .with_energy_floor(1.0)
        .build()?;
    // Load audio file
    let (audio, channels) = load_audio::<16000, f32, _>("../asset/test.wav", true).await?;
    // Extract FBank features
    let features = extractor.extract::<16000>(&audio);
    println!("FBank features: {:?}", features);

    Ok(())
}

§Automatic Speech Recognition example

use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    let mut asr = AutomaticSpeechRecognizer::new_legacy("checkpoint/automatic_speech_recognizer.onnx")?;
    let features = vec![0.0; AutomaticSpeechRecognizer::NUM_BINS as usize * 10]; // Assume features are extracted
    let text = asr.recognize(&features).await?;
    println!("{}", text);
    Ok(())
}

See the examples for more usage.

§License

This project is licensed under the Apache License, Version 2.0.

Enums§

OperationError: Error types that can occur during operations

Traits§

GenericSample: Trait for audio sample types

Functions§

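decode_audio: Decodes audio data
load_audio: Asynchronously loads an audio file
resample: Resamples audio data
resample_dynamic: Dynamic sample-rate version of resample
reverb: Adds a room reverberation effect to audio data
spatial_audio: Spatializes audio data (3D sound effect)
speed: Changes the speed of audio data (speed and pitch change together)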
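
To illustrate what resampling involves, here is a minimal sketch using linear interpolation on mono samples. It is not how voxudio implements resample (the source does not describe the algorithm); the function name `resample_linear` and its signature are assumptions made for this example.

```rust
/// Hypothetical helper: resample mono audio from `from_rate` to `to_rate`
/// by linear interpolation between neighbouring input samples.
/// No anti-aliasing filter is applied, so this is only a sketch.
pub fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == 0 || to_rate == 0 {
        return Vec::new();
    }
    // The output covers the same duration at the new rate.
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    // Distance travelled in the input per output sample.
    let step = from_rate as f64 / to_rate as f64;
    let last = input.len() - 1;

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp at the end so the final samples repeat the last value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

Reading the input at a different step also shows why a plain speed change shifts pitch: playing the resampled buffer at the original rate compresses or stretches every waveform period by the same factor.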
