polyvoice


Speaker diarization for Rust — who spoke when, without Python. Silero VAD + WeSpeaker embeddings + AHC clustering in a single call.

Quick Start

Add to Cargo.toml:

[dependencies]
polyvoice = { version = "0.6", features = ["onnx"] }

Or from the command line:

cargo add polyvoice --features onnx

Features

  • One-call pipeline — Pipeline::run() wires VAD → embeddings → AHC clustering.
  • Online & offline — OnlineDiarizer for streaming, OfflineDiarizer for batch.
  • CPU-only, ~30 MB — ONNX Runtime, no GPU or Python runtime required.
  • Multi-language — Rust library, Python bindings (pip install polyvoice), C FFI, CLI.
  • Lock-free concurrency — crossbeam-queue session pool for parallel inference.
  • Hardened — Miri (memory), Loom (concurrency), cargo-fuzz (4 targets), model signing (Minisign).
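The final clustering stage groups speech-segment embeddings by speaker. As a self-contained illustration of how agglomerative hierarchical clustering (AHC) over cosine similarities works — this is a sketch, not polyvoice's internal implementation, and the `cosine`/`ahc` helpers and the 0.9 threshold are invented for the example:

```rust
// Illustrative average-linkage AHC over speaker embeddings.
// Not polyvoice's internal code; helper names and threshold are examples.

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Repeatedly merge the two most similar clusters until no pair's
/// average-linkage similarity exceeds `threshold`.
/// Returns one cluster label per input embedding.
fn ahc(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    // Start with one singleton cluster per embedding (indices into `embeddings`).
    let mut clusters: Vec<Vec<usize>> = (0..embeddings.len()).map(|i| vec![i]).collect();
    loop {
        // Find the most similar pair of clusters (average pairwise cosine).
        let mut best = (0usize, 0usize, f32::MIN);
        for i in 0..clusters.len() {
            for j in (i + 1)..clusters.len() {
                let mut s = 0.0f32;
                let mut n = 0;
                for &a in &clusters[i] {
                    for &b in &clusters[j] {
                        s += cosine(&embeddings[a], &embeddings[b]);
                        n += 1;
                    }
                }
                let sim = s / n as f32;
                if sim > best.2 {
                    best = (i, j, sim);
                }
            }
        }
        if clusters.len() < 2 || best.2 < threshold {
            break;
        }
        // Merge cluster j into cluster i (j > i, so remove j first).
        let merged = clusters.remove(best.1);
        clusters[best.0].extend(merged);
    }
    // Assign each embedding the index of its final cluster.
    let mut labels = vec![0; embeddings.len()];
    for (c, members) in clusters.iter().enumerate() {
        for &m in members {
            labels[m] = c;
        }
    }
    labels
}
```

Two near-identical embeddings end up with the same label, while a dissimilar one stays in its own cluster; the threshold trades off splitting one speaker in two versus merging two speakers into one.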

Minimal Example

use polyvoice::{Pipeline, DiarizationConfig, VadConfig, FbankOnnxExtractor, SileroVad};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Speaker-embedding extractor and voice activity detector.
    let ext = FbankOnnxExtractor::new(Path::new("models/wespeaker_resnet34.onnx"), 256, 4)?;
    let mut vad = SileroVad::new(Path::new("models/silero_vad.onnx"), 512)?;
    let (samples, _sr) = polyvoice::wav::read_wav(Path::new("meeting.wav"))?;
    let result = Pipeline::new(DiarizationConfig::default(), VadConfig::default())
        .run(&samples, &ext, &mut vad)?;
    for turn in &result.turns {
        println!("{}: {:.2}s - {:.2}s", turn.speaker, turn.time.start, turn.time.end);
    }
    Ok(())
}
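Turns like the ones printed above map directly onto RTTM, the format consumed by standard DER scorers (dscore, pyannote.metrics). A minimal export sketch — the `to_rttm` helper and its tuple input are illustrative, not part of polyvoice's API:

```rust
// Sketch: serialize (speaker, start, end) turns to RTTM SPEAKER records.
// Field order: type, file id, channel, onset, duration, then speaker name
// in the 8th column; unused fields are "<NA>".
fn to_rttm(file_id: &str, turns: &[(&str, f64, f64)]) -> String {
    turns
        .iter()
        .map(|(spk, start, end)| {
            format!(
                "SPEAKER {} 1 {:.3} {:.3} <NA> <NA> {} <NA> <NA>",
                file_id,
                start,
                end - start, // RTTM stores duration, not end time
                spk
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```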

Python / C FFI

Python:

import polyvoice

pipeline = polyvoice.Pipeline.balanced("models/")
result = pipeline.run(samples, sample_rate=16000)
for turn in result["turns"]:
    print(f"{turn['speaker']}: {turn['start']:.1f}s - {turn['end']:.1f}s")

C FFI (build with cargo build --features ffi; see include/polyvoice.h and examples/ffi_usage.c):

polyvoice_pipeline_create(BALANCED, "models/", &handle);
polyvoice_pipeline_run(handle, samples, n, 16000, &json, &len);

Benchmarks

Dataset                   DER    Speed
VoxConverse (232 files)   ~14%   10x RT (CPU)
AMI (16 meetings)         ~23%   7x RT (CPU)
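DER in the table is the standard diarization error rate, which sums three error types over total reference speech time:

DER = (false alarm + missed speech + speaker confusion) / total speech time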

~80% of pyannote's accuracy at 10x the speed on CPU — no GPU, no Python.

License

MIT