§Voxudio
voxudio is a real-time audio processing library with ONNX runtime support.
It provides a set of tools for audio device management, signal processing,
and machine learning model integration for audio applications.
§Features
- Audio device enumeration and management
- Real-time audio processing capabilities
- ONNX model integration for audio machine learning tasks
- OPUS audio codec support (encoding/decoding)
- Online feature extraction (FBank, MFCC, Whisper FBank) based on kaldi-native-fbank
  - Builder pattern with with_* methods for flexible parameter configuration (e.g., number of mel bins, window type, etc.)
- Automatic Speech Recognition (ASR) API
  - Provides AutomaticSpeechRecognizer for direct feature-to-text recognition
- All public APIs are documented with usage examples
- Cross-platform support
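The builder style mentioned in the feature list can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and is not voxudio's actual API: the names FbankConfig, FbankConfigBuilder, WindowType and the default values are assumptions chosen for illustration. It only shows the general shape of chaining with_* methods before a fallible build().

```rust
/// Hypothetical window types, for illustration only (not voxudio's types).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WindowType {
    Hann,
    Hamming,
    Povey,
}

/// Hypothetical configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
pub struct FbankConfig {
    pub num_mel_bins: usize,
    pub window_type: WindowType,
    pub energy_floor: f32,
}

/// Hypothetical builder: each `with_*` method takes `self` by value and
/// returns it, so calls can be chained.
#[derive(Debug, Clone)]
pub struct FbankConfigBuilder {
    num_mel_bins: usize,
    window_type: WindowType,
    energy_floor: f32,
}

impl Default for FbankConfigBuilder {
    fn default() -> Self {
        // Arbitrary defaults picked for this sketch.
        Self {
            num_mel_bins: 80,
            window_type: WindowType::Povey,
            energy_floor: 0.0,
        }
    }
}

impl FbankConfigBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_num_mel_bins(mut self, num_mel_bins: usize) -> Self {
        self.num_mel_bins = num_mel_bins;
        self
    }

    pub fn with_window_type(mut self, window_type: WindowType) -> Self {
        self.window_type = window_type;
        self
    }

    pub fn with_energy_floor(mut self, energy_floor: f32) -> Self {
        self.energy_floor = energy_floor;
        self
    }

    /// Validates the collected parameters and produces the final config.
    pub fn build(self) -> Result<FbankConfig, String> {
        if self.num_mel_bins == 0 {
            return Err("num_mel_bins must be greater than zero".to_string());
        }
        Ok(FbankConfig {
            num_mel_bins: self.num_mel_bins,
            window_type: self.window_type,
            energy_floor: self.energy_floor,
        })
    }
}
```

With this sketch, FbankConfigBuilder::new().with_num_mel_bins(40).build() yields a config whose unset fields keep their defaults, while validation is deferred to build().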
§Example
§Speaker embedding extraction example
use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Initialize voice activity detector and speaker embedding extractor
    let mut vad = VoiceActivityDetector::new("checkpoint/voice_activity_detector.onnx")?;
    let mut see = SpeakerEmbeddingExtractor::new("checkpoint/speaker_embedding_extractor.onnx")?;

    // Load audio file
    let (audio, channels) = load_audio::<22050, f32, _>("../asset/test.wav", false).await?;

    // Detect speech segments
    let vad_audio = vad.retain_speech_only::<22050>(&audio, channels).await?;

    // Extract speaker embedding
    let embedding = see.extract(&vad_audio, channels).await?;
    println!("Extracted embedding: {:?}", embedding);

    Ok(())
}
§Online feature extraction example
use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Build an online FBank feature extractor (algorithm from kaldi-native-fbank)
    let extractor = OnlineFbankFeatureExtractor::fbank()?
        .with_energy_floor(1.0)
        .build()?;

    // Load audio file
    let (audio, channels) = load_audio::<16000, f32, _>("../asset/test.wav", true).await?;

    // Extract FBank features
    let features = extractor.extract::<16000>(&audio);
    println!("FBank features: {:?}", features);

    Ok(())
}
§Automatic Speech Recognition example
use voxudio::*;
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    let mut asr = AutomaticSpeechRecognizer::new_legacy("checkpoint/automatic_speech_recognizer.onnx")?;
    // Placeholder input: assume the features have already been extracted
    let features = vec![0.0; AutomaticSpeechRecognizer::NUM_BINS as usize * 10];
    let text = asr.recognize(&features).await?;
    println!("{}", text);

    Ok(())
}
See the examples for more.
§License
This project is licensed under the Apache License, Version 2.0.
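Enums§
- OperationError: Error types that may occur during an operation
Traits§
- GenericSample: Trait for audio sample types
Functions§
- decode_audio: Decodes audio data
- load_audio: Asynchronously loads an audio file
- resample: Resamples audio data
- resample_dynamic: Variant of resample that takes the sample rates dynamically
- reverb: Adds a room reverb effect to audio data
- spatial_audio: Spatializes audio data (3D sound effect)
- speed: Changes the speed of audio data (changing the speed also changes the pitch)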
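To make the resampling and speed entries more concrete, here is a minimal sketch of how both can be expressed as reading the input at a fractional step with linear interpolation. It is not voxudio's implementation, and the function names resample_by_step, resample_linear and change_speed are hypothetical. It handles mono f32 samples only and applies no anti-aliasing filter, so a production resampler would sound noticeably better when downsampling.

```rust
/// Reads `input` at positions 0, step, 2*step, ... using linear interpolation.
/// A step below 1.0 stretches the signal; a step above 1.0 compresses it.
pub fn resample_by_step(input: &[f32], step: f64) -> Vec<f32> {
    if input.is_empty() || !step.is_finite() || step <= 0.0 {
        return Vec::new();
    }
    let out_len = (input.len() as f64 / step).floor() as usize;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            // Clamp indices so positions past the final sample repeat it.
            let idx = (pos.floor() as usize).min(last);
            let next = (idx + 1).min(last);
            let frac = (pos - idx as f64) as f32;
            input[idx] + (input[next] - input[idx]) * frac
        })
        .collect()
}

/// Converts mono samples from `from_rate` Hz to `to_rate` Hz.
pub fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if from_rate == 0 || to_rate == 0 {
        return Vec::new();
    }
    resample_by_step(input, from_rate as f64 / to_rate as f64)
}

/// Speeds playback up (`factor` > 1.0) or slows it down (`factor` < 1.0).
/// Because the samples are simply re-read at a different step, the pitch
/// shifts together with the speed.
pub fn change_speed(input: &[f32], factor: f64) -> Vec<f32> {
    resample_by_step(input, factor)
}
```

In this sketch, doubling the sample rate inserts one interpolated sample between each pair of neighbors, and change_speed with a factor of 2.0 keeps every second sample, which halves the duration and raises the pitch by an octave when played back at the original rate.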