Crate polyvoice


polyvoice

Speaker diarization library for Rust, supporting both online (streaming) and offline (file-based) diarization. It is ONNX-powered and ecosystem-agnostic.

Designed to be embedded into any Rust application that needs to answer the question “who spoke when?”.

Quick start

```rust
use polyvoice::{OfflineDiarizer, DiarizationConfig, DummyExtractor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = DiarizationConfig::default();
    let diarizer = OfflineDiarizer::new(config);
    let extractor = DummyExtractor::new(256);

    // Load 16 kHz mono f32 samples (e.g. from an audio file decoder).
    let samples: Vec<f32> = vec![0.0; 16000 * 10];
    let result = diarizer.run(&samples, &extractor)?;

    for turn in &result.turns {
        println!("{}: {:.2}s - {:.2}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}
```
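The quick start expects 16 kHz mono `f32` samples, but many decoders produce interleaved 16-bit PCM. The following is a minimal sketch of that conversion, assuming the input is already sampled at 16 kHz (resampling is not shown). `interleaved_i16_to_mono_f32` is a hypothetical helper, not part of polyvoice.

```rust
/// Hypothetical helper (not part of polyvoice): downmix interleaved 16-bit PCM
/// to mono and scale each sample to f32 in [-1.0, 1.0).
/// Assumes the input is already at 16 kHz; a trailing partial frame is dropped.
fn interleaved_i16_to_mono_f32(pcm: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be positive");
    pcm.chunks_exact(channels)
        .map(|frame| {
            // Scale each channel sample by 1/32768, then average across channels.
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

The resulting `Vec<f32>` has the shape the quick start's `samples` variable expects.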

Re-exports

pub use features::FbankConfig;
pub use features::FbankExtractor;
pub use cluster::SpeakerCluster;
pub use embedding::DummyExtractor;
pub use embedding::EmbeddingError;
pub use embedding::EmbeddingExtractor;
pub use offline::OfflineDiarizer;
pub use online::OnlineDiarizer;
pub use overlap::OverlapRegion;
pub use overlap::detect_overlaps;
pub use pipeline::Pipeline;
pub use pipeline::PipelineError;
pub use silero_vad::SileroVad;
pub use types::Confidence;
pub use types::DiarizationConfig;
pub use types::DiarizationResult;
pub use types::EmbeddingDim;
pub use types::SampleRate;
pub use types::Seconds;
pub use types::Segment;
pub use types::SpeakerId;
pub use types::SpeakerIdRemap;
pub use types::SpeakerTurn;
pub use types::TimeRange;
pub use types::WordAlignment;
pub use types::remap_segments;
pub use types::remap_turns;
pub use vad::EnergyVad;
pub use vad::VadConfig;
pub use vad::VadError;
pub use vad::VoiceActivityDetector;
pub use vad::segment_speech;
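`detect_overlaps` and `OverlapRegion` are re-exported above, but this page does not document how they work; the library may operate on frames or embeddings. To illustrate the underlying idea on time-stamped turns only, here is a hypothetical sweep-line sketch that reports the time ranges where at least two turns are active at once. `overlap_regions` is an illustrative name, not a polyvoice function.

```rust
/// Hypothetical sketch (not polyvoice's `detect_overlaps`): return the time
/// ranges where two or more turns are active. Each turn is (start_s, end_s).
fn overlap_regions(turns: &[(f64, f64)]) -> Vec<(f64, f64)> {
    // Boundary events: +1 at each turn start, -1 at each turn end.
    let mut events: Vec<(f64, i32)> = Vec::with_capacity(turns.len() * 2);
    for &(start, end) in turns {
        if end > start {
            events.push((start, 1));
            events.push((end, -1));
        }
    }
    // Sort by time; at equal times, ends (-1) sort before starts (+1) so that
    // back-to-back turns are not reported as overlapping.
    events.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    let mut active = 0;
    let mut region_start = 0.0;
    let mut regions = Vec::new();
    for (t, delta) in events {
        let before = active;
        active += delta;
        if before < 2 && active >= 2 {
            // A second speaker became active: an overlap region opens.
            region_start = t;
        } else if before >= 2 && active < 2 && t > region_start {
            // Back to at most one active speaker: close the region.
            regions.push((region_start, t));
        }
    }
    regions
}
```

For turns `(0.0, 5.0)`, `(3.0, 8.0)` and `(7.0, 10.0)`, this sketch yields the overlaps `(3.0, 5.0)` and `(7.0, 8.0)`.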
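`EnergyVad` and `segment_speech` are also re-exported, but their exact behavior and parameters are not documented on this page. A common energy-based design splits audio into fixed-length frames, compares each frame's RMS level in dBFS against a threshold, and merges consecutive speech frames into segments. The sketch below follows that design; the function name, frame length and threshold are assumptions for illustration, not polyvoice's API.

```rust
/// Hypothetical energy-based VAD sketch (not polyvoice's `EnergyVad`).
/// Splits `samples` into frames of `frame_len`, marks a frame as speech when
/// its RMS level in dBFS exceeds `threshold_db`, and merges consecutive speech
/// frames into (start_seconds, end_seconds) segments.
fn energy_vad_segments(
    samples: &[f32],
    sample_rate: u32,
    frame_len: usize,
    threshold_db: f32,
) -> Vec<(f64, f64)> {
    assert!(sample_rate > 0 && frame_len > 0, "invalid VAD parameters");
    let sr = sample_rate as f64;
    // Convert a half-open frame range [start, end) to seconds, clamped to the signal length.
    let to_secs = |start: usize, end: usize| {
        let s = (start * frame_len) as f64 / sr;
        let e = (end * frame_len).min(samples.len()) as f64 / sr;
        (s, e)
    };
    let mut segments = Vec::new();
    let mut run_start: Option<usize> = None; // first frame of the current speech run
    let mut n_frames = 0;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        n_frames = i + 1;
        // Mean-square energy in dBFS; the epsilon avoids log10(0) on digital silence.
        let mean_sq = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        let is_speech = 10.0 * (mean_sq + 1e-10).log10() > threshold_db;
        match (is_speech, run_start) {
            (true, None) => run_start = Some(i),
            (false, Some(start)) => {
                segments.push(to_secs(start, i));
                run_start = None;
            }
            _ => {}
        }
    }
    // Close a speech run that lasts until the end of the signal.
    if let Some(start) = run_start {
        segments.push(to_secs(start, n_frames));
    }
    segments
}
```

With 16 kHz audio, `frame_len = 160` gives 10 ms frames. Real detectors usually add hangover or minimum-duration smoothing, which this sketch omits.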

Modules

ahc
Agglomerative Hierarchical Clustering (AHC) for speaker diarization.
cluster
Speaker clustering with online incremental centroid updates.
der
Diarization Error Rate (DER) computation.
embedding
Speaker embedding extraction trait.
features
Log-mel filterbank (fbank) feature extraction for speaker embeddings.
offline
Offline (file-based) speaker diarization.
online
Online (streaming) speaker diarization.
overlap
Overlap detection: identify frames where multiple speakers may be active.
pipeline
High-level diarization pipeline.
rttm
RTTM (Rich Transcription Time Marked) parser and writer.
silero_vad
Silero VAD v5 ONNX integration.
types
Core types for speaker diarization.
utils
Math utilities for diarization.
vad
Voice Activity Detection trait and utilities.
wav
WAV file I/O via the hound crate.
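The `ahc` module performs agglomerative hierarchical clustering, but its linkage criterion and stopping rule are not stated here. As an illustration of the technique, the following hypothetical sketch uses average linkage over cosine distances: every embedding starts as its own cluster, and the closest pair is merged repeatedly until the smallest average distance exceeds `stop_distance`. The names and the threshold are assumptions, not polyvoice's API.

```rust
/// Cosine distance (1 - cosine similarity); a zero-norm vector gets distance 1.0.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norms = a.iter().map(|x| x * x).sum::<f32>().sqrt()
        * b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / norms.max(f32::EPSILON)
}

/// Hypothetical average-linkage AHC sketch (not polyvoice's `ahc` API).
/// Returns one cluster label per input embedding.
fn ahc_average_linkage(embs: &[Vec<f32>], stop_distance: f32) -> Vec<usize> {
    let n = embs.len();
    // Precompute the full pairwise distance matrix.
    let d: Vec<Vec<f32>> = (0..n)
        .map(|i| (0..n).map(|j| cosine_distance(&embs[i], &embs[j])).collect())
        .collect();
    // Start with every embedding in its own cluster.
    let mut clusters: Vec<Vec<usize>> = (0..n).map(|i| vec![i]).collect();
    loop {
        // Find the pair of clusters with the smallest average pairwise distance.
        let mut best: Option<(usize, usize, f32)> = None;
        for a in 0..clusters.len() {
            for b in (a + 1)..clusters.len() {
                let mut sum = 0.0f32;
                for &i in &clusters[a] {
                    for &j in &clusters[b] {
                        sum += d[i][j];
                    }
                }
                let avg = sum / (clusters[a].len() * clusters[b].len()) as f32;
                if best.map_or(true, |(_, _, bd)| avg < bd) {
                    best = Some((a, b, avg));
                }
            }
        }
        match best {
            // Merge only while the closest pair is within the stopping threshold.
            Some((a, b, avg)) if avg <= stop_distance => {
                let merged = clusters.remove(b); // b > a, so index a stays valid
                clusters[a].extend(merged);
            }
            _ => break,
        }
    }
    // Translate cluster membership into per-embedding labels.
    let mut labels = vec![0; n];
    for (label, members) in clusters.iter().enumerate() {
        for &i in members {
            labels[i] = label;
        }
    }
    labels
}
```

This naive version is cubic in the number of embeddings; it is meant to show the merge rule, not to be efficient.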
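The `cluster` module describes online clustering with incremental centroid updates. A minimal sketch of that idea, assuming cosine similarity and a fixed joining threshold (both assumptions; `OnlineClusters` is a hypothetical type, not polyvoice's `SpeakerCluster`): each incoming embedding joins its most similar centroid if the similarity reaches the threshold, and that centroid is updated as a running mean; otherwise a new speaker is created.

```rust
/// Hypothetical online clustering sketch (not polyvoice's `SpeakerCluster`).
struct OnlineClusters {
    centroids: Vec<Vec<f32>>, // running mean embedding per speaker
    counts: Vec<usize>,       // number of embeddings merged into each centroid
    threshold: f32,           // minimum cosine similarity to join an existing speaker
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl OnlineClusters {
    fn new(threshold: f32) -> Self {
        Self { centroids: Vec::new(), counts: Vec::new(), threshold }
    }

    /// Assign `emb` to a speaker index, updating an existing centroid or creating one.
    fn assign(&mut self, emb: &[f32]) -> usize {
        let best = self
            .centroids
            .iter()
            .enumerate()
            .map(|(i, c)| (i, cosine(c, emb)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim >= self.threshold => {
                // Incremental mean update: c <- c + (x - c) / n
                self.counts[i] += 1;
                let n = self.counts[i] as f32;
                for (c, x) in self.centroids[i].iter_mut().zip(emb) {
                    *c += (x - *c) / n;
                }
                i
            }
            _ => {
                self.centroids.push(emb.to_vec());
                self.counts.push(1);
                self.centroids.len() - 1
            }
        }
    }
}
```

The running-mean update keeps memory constant per speaker, which is what makes this style of clustering suitable for streaming input.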
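The `der` module computes the Diarization Error Rate, which is conventionally defined as (missed speech + false alarm + speaker confusion) divided by the total reference speech time. The hypothetical frame-level sketch below assumes one speaker label per frame and hypothesis labels that are already mapped onto reference labels. Standard scoring also finds the optimal speaker mapping and handles overlapped speech and forgiveness collars, all omitted here. `frame_der` is not polyvoice's API.

```rust
/// Hypothetical frame-level DER sketch (not polyvoice's `der` API).
/// `None` marks non-speech. Returns `None` when the reference has no speech,
/// since DER is undefined in that case.
fn frame_der(reference: &[Option<usize>], hypothesis: &[Option<usize>]) -> Option<f64> {
    assert_eq!(reference.len(), hypothesis.len(), "frame counts must match");
    let (mut missed, mut false_alarm, mut confusion, mut total) = (0usize, 0usize, 0usize, 0usize);
    for (r, h) in reference.iter().zip(hypothesis) {
        match (r, h) {
            // Reference speech the hypothesis labeled as non-speech.
            (Some(_), None) => {
                missed += 1;
                total += 1;
            }
            // Hypothesis speech where the reference has none.
            (None, Some(_)) => false_alarm += 1,
            // Both speech: an error only if the speaker labels differ.
            (Some(a), Some(b)) => {
                total += 1;
                if a != b {
                    confusion += 1;
                }
            }
            (None, None) => {}
        }
    }
    if total == 0 {
        return None;
    }
    Some((missed + false_alarm + confusion) as f64 / total as f64)
}
```

Because false alarms are counted in the numerator but not the denominator, DER can exceed 1.0.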
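The `features` module extracts log-mel filterbank features; its window, FFT size and filter count are not specified here. The hypothetical sketch below shows only the filterbank step: given the power spectrum of one frame (bins `0..=n_fft/2`), it applies triangular filters spaced evenly on the mel scale and takes the log of each filter's energy. Framing, windowing and the FFT are omitted, and all names and parameters are assumptions, not polyvoice's `FbankExtractor`.

```rust
/// Mel scale in natural-log form, 1127 * ln(1 + f/700); nearly identical to
/// the HTK form 2595 * log10(1 + f/700).
fn hz_to_mel(hz: f64) -> f64 {
    1127.0 * (1.0 + hz / 700.0).ln()
}

fn mel_to_hz(mel: f64) -> f64 {
    700.0 * ((mel / 1127.0).exp() - 1.0)
}

/// Hypothetical log-mel filterbank sketch for one frame's power spectrum.
fn log_mel_energies(power: &[f64], sample_rate: f64, n_mels: usize, low_hz: f64, high_hz: f64) -> Vec<f64> {
    assert!(power.len() >= 2, "need at least two spectrum bins");
    let n_fft = (power.len() - 1) * 2; // power holds n_fft/2 + 1 bins
    let bin_hz = sample_rate / n_fft as f64; // frequency spacing of FFT bins
    let (lo, hi) = (hz_to_mel(low_hz), hz_to_mel(high_hz));
    // n_mels + 2 edge points evenly spaced in mel, converted back to Hz.
    let edges: Vec<f64> = (0..n_mels + 2)
        .map(|i| mel_to_hz(lo + (hi - lo) * i as f64 / (n_mels + 1) as f64))
        .collect();
    (0..n_mels)
        .map(|m| {
            let (left, center, right) = (edges[m], edges[m + 1], edges[m + 2]);
            let energy: f64 = power
                .iter()
                .enumerate()
                .map(|(k, &p)| {
                    let f = k as f64 * bin_hz;
                    // Triangular weight: rises from `left` to `center`, falls to `right`.
                    let w = if f > left && f <= center {
                        (f - left) / (center - left)
                    } else if f > center && f < right {
                        (right - f) / (right - center)
                    } else {
                        0.0
                    };
                    w * p
                })
                .sum();
            // Floor before the log so empty filters do not produce -inf.
            energy.max(1e-10).ln()
        })
        .collect()
}
```

Because adjacent triangles overlap, a single spectral peak contributes to at most two neighboring mel filters.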
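The `rttm` module reads and writes RTTM files. In the standard format, each speaker turn is a whitespace-separated `SPEAKER` line with ten fields: type, file ID, channel, onset, duration, orthography, speaker type, speaker name, confidence and signal lookahead time, with unused fields written as `<NA>`. The following is a minimal sketch of formatting and parsing such a line; the function names and the three-decimal precision are assumptions, not polyvoice's `rttm` API.

```rust
/// Hypothetical sketch (not polyvoice's `rttm` API): format one RTTM SPEAKER
/// line for a turn from `start` to `end` seconds on channel 1.
fn rttm_line(file_id: &str, start: f64, end: f64, speaker: &str) -> String {
    format!(
        "SPEAKER {} 1 {:.3} {:.3} <NA> <NA> {} <NA> <NA>",
        file_id,
        start,
        end - start, // RTTM stores duration, not end time
        speaker
    )
}

/// Parse (onset, duration, speaker name) from a SPEAKER line; `None` if malformed.
fn parse_rttm_line(line: &str) -> Option<(f64, f64, String)> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if fields.len() < 8 || fields[0] != "SPEAKER" {
        return None;
    }
    let onset: f64 = fields[3].parse().ok()?;
    let duration: f64 = fields[4].parse().ok()?;
    Some((onset, duration, fields[7].to_string()))
}
```

Note that RTTM records a turn's duration rather than its end time, so a writer must subtract and a reader must add when converting from and to start/end ranges.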