§polyvoice
Speaker diarization library for Rust — online (streaming) and offline (file-based), ONNX-powered, and ecosystem-agnostic.
Designed to be embedded into any Rust application that needs to answer the question “who spoke when?”.
§Quick start
use polyvoice::{OfflineDiarizer, DiarizationConfig, DummyExtractor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = DiarizationConfig::default();
    let diarizer = OfflineDiarizer::new(config);
    let extractor = DummyExtractor::new(256);

    // Placeholder input: 10 s of silence as 16 kHz mono f32 samples.
    // In a real application, load the samples from an audio file decoder.
    let samples: Vec<f32> = vec![0.0; 16000 * 10];

    let result = diarizer.run(&samples, &extractor)?;
    for turn in &result.turns {
        println!("{}: {:.2}s - {:.2}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}

Re-exports§
pub use features::FbankConfig;
pub use features::FbankExtractor;
pub use cluster::SpeakerCluster;
pub use embedding::DummyExtractor;
pub use embedding::EmbeddingError;
pub use embedding::EmbeddingExtractor;
pub use offline::OfflineDiarizer;
pub use online::OnlineDiarizer;
pub use overlap::OverlapRegion;
pub use overlap::detect_overlaps;
pub use pipeline::Pipeline;
pub use pipeline::PipelineError;
pub use silero_vad::SileroVad;
pub use types::Confidence;
pub use types::DiarizationConfig;
pub use types::DiarizationResult;
pub use types::EmbeddingDim;
pub use types::SampleRate;
pub use types::Seconds;
pub use types::Segment;
pub use types::SpeakerId;
pub use types::SpeakerIdRemap;
pub use types::SpeakerTurn;
pub use types::TimeRange;
pub use types::WordAlignment;
pub use types::remap_segments;
pub use types::remap_turns;
pub use vad::EnergyVad;
pub use vad::VadConfig;
pub use vad::VadError;
pub use vad::VoiceActivityDetector;
pub use vad::segment_speech;
Modules§
- ahc
- Agglomerative Hierarchical Clustering (AHC) for speaker diarization.
- cluster
- Speaker clustering with online incremental centroid updates.
- der
- Diarization Error Rate (DER) computation.
- embedding
- Speaker embedding extraction trait.
- features
- Log-mel filterbank (fbank) feature extraction for speaker embeddings.
- offline
- Offline (file-based) speaker diarization.
- online
- Online (streaming) speaker diarization.
- overlap
- Overlap detection: identify frames where multiple speakers may be active.
- pipeline
- High-level diarization pipeline.
- rttm
- RTTM (Rich Transcription Time Marked) parser and writer.
- silero_vad
- Silero VAD v5 ONNX integration.
- types
- Core types for speaker diarization.
- utils
- Math utilities for diarization.
- vad
- Voice Activity Detection trait and utilities.
- wav
- WAV file I/O via the hound crate.
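
The cluster module is described above as using online incremental centroid updates. The following is a minimal standalone sketch of that general technique, not polyvoice's implementation: the names `OnlineClusterer`, `Centroid`, `assign`, and the threshold value are hypothetical and chosen only for illustration. Each new embedding is compared to the existing centroids by cosine similarity; it either updates the closest centroid with a running mean or starts a new speaker.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Running mean of all embeddings assigned to one speaker (hypothetical type).
struct Centroid {
    mean: Vec<f32>,
    count: usize,
}

/// Hypothetical online clusterer; not part of the polyvoice API.
struct OnlineClusterer {
    centroids: Vec<Centroid>,
    threshold: f32,
}

impl OnlineClusterer {
    fn new(threshold: f32) -> Self {
        Self { centroids: Vec::new(), threshold }
    }

    /// Assigns `embedding` to the most similar centroid, or opens a new one
    /// if no centroid reaches the threshold. Returns the speaker index.
    fn assign(&mut self, embedding: &[f32]) -> usize {
        let best = self
            .centroids
            .iter()
            .enumerate()
            .map(|(i, c)| (i, cosine_similarity(&c.mean, embedding)))
            .max_by(|a, b| a.1.total_cmp(&b.1));

        match best {
            Some((i, sim)) if sim >= self.threshold => {
                let c = &mut self.centroids[i];
                c.count += 1;
                let n = c.count as f32;
                // Incremental mean: m += (x - m) / n
                for (m, x) in c.mean.iter_mut().zip(embedding) {
                    *m += (x - *m) / n;
                }
                i
            }
            _ => {
                self.centroids.push(Centroid { mean: embedding.to_vec(), count: 1 });
                self.centroids.len() - 1
            }
        }
    }
}

fn main() {
    let mut clusterer = OnlineClusterer::new(0.8);
    for e in [[1.0_f32, 0.0], [0.0, 1.0], [0.9, 0.1]] {
        println!("speaker {}", clusterer.assign(&e));
    }
}
```

The incremental mean avoids storing past embeddings, which is what makes this style of clustering suitable for streaming input. The threshold trades off speaker merging against over-splitting and would need tuning for a real embedding model.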
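
The rttm module reads and writes RTTM files. As a rough illustration of the format itself, the sketch below formats one speaker turn as an RTTM `SPEAKER` record with its ten space-separated fields. The `Turn` struct and `rttm_line` function are hypothetical and are not the polyvoice API; the channel is assumed to be 1 and unused fields are written as `<NA>`.

```rust
/// Hypothetical speaker turn, for illustration only.
struct Turn {
    speaker: String,
    start: f64, // seconds
    end: f64,   // seconds
}

/// Formats one turn as an RTTM SPEAKER record:
/// type, file id, channel, onset, duration, then <NA> placeholders around the speaker name.
fn rttm_line(file_id: &str, turn: &Turn) -> String {
    format!(
        "SPEAKER {} 1 {:.3} {:.3} <NA> <NA> {} <NA> <NA>",
        file_id,
        turn.start,
        turn.end - turn.start, // RTTM stores duration, not end time
        turn.speaker
    )
}

fn main() {
    let turn = Turn { speaker: "spk0".to_string(), start: 0.5, end: 2.0 };
    println!("{}", rttm_line("meeting", &turn));
}
```

Note that RTTM records an onset and a duration rather than a start and end time, so a conversion is needed in both directions when parsing or writing.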
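
The der module computes the Diarization Error Rate, which is conventionally defined as missed speech plus false alarm plus speaker confusion, divided by the total reference speech time. The sketch below is a simplified frame-level version under stated assumptions: one label per frame, no overlapping speech, no forgiveness collar, and hypothesis speaker labels already mapped onto reference labels. The function name `frame_der` is hypothetical and does not reflect the module's actual API.

```rust
/// Frame-level DER sketch. `None` means non-speech; `Some(id)` is the active speaker.
/// Returns `None` when the reference contains no speech (DER is undefined).
fn frame_der(reference: &[Option<u32>], hypothesis: &[Option<u32>]) -> Option<f64> {
    let mut ref_speech = 0usize;
    let mut errors = 0usize;

    for (r, h) in reference.iter().zip(hypothesis) {
        if r.is_some() {
            ref_speech += 1;
        }
        match (r, h) {
            (Some(_), None) => errors += 1,              // missed speech
            (None, Some(_)) => errors += 1,              // false alarm
            (Some(a), Some(b)) if a != b => errors += 1, // speaker confusion
            _ => {}                                      // correct frame
        }
    }

    if ref_speech == 0 {
        None
    } else {
        Some(errors as f64 / ref_speech as f64)
    }
}

fn main() {
    let reference = [Some(0), Some(0), Some(1), Some(1), None];
    let hypothesis = [Some(0), Some(0), Some(0), None, None];
    // One confusion and one miss over four reference speech frames.
    println!("{:?}", frame_der(&reference, &hypothesis));
}
```

A full scorer would additionally find the optimal mapping between hypothesis and reference speakers and work on time intervals rather than fixed frames.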