native-pyannote-rs 0.1.0

Speaker diarization using pyannote in Rust


native-pyannote-rs

Pyannote audio diarization in Rust.

This is a fork of https://github.com/thewh1teagle/pyannote-rs that uses a Rust-native crate, kaldi-native-fbank, for audio feature extraction instead of bindings to the C++ variant (knf-rs).

Features

Process 1 hour of audio in less than a minute on CPU.
Faster performance with DirectML on Windows and CoreML on macOS.
Accurate timestamps with Pyannote segmentation.
Identify speakers with wespeaker embeddings.

Examples

pyannote-rs uses two models for speaker diarization:

Segmentation: segmentation-3.0 identifies when speech occurs.
Speaker Identification: wespeaker-voxceleb-resnet34-LM identifies who is speaking.

Inference is powered by onnxruntime.

The segmentation model processes up to 10 seconds of audio at a time, so longer recordings are handled with a sliding window (iterating over the audio in chunks).
The embedding model processes filter banks (audio features) extracted with kaldi-native-fbank.
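
The chunked iteration described above can be illustrated with a minimal sketch. This is not the crate's API: the function name `fixed_windows`, the zero-padding of the last chunk, and the non-overlapping stride are assumptions made for illustration only.

```rust
/// Split `samples` into consecutive windows of `window_len` samples.
/// The final window is zero-padded so every window has the same length,
/// which is what a fixed-input model would need.
/// (Hypothetical helper; not part of native-pyannote-rs.)
fn fixed_windows(samples: &[f32], window_len: usize) -> Vec<Vec<f32>> {
    assert!(window_len > 0, "window_len must be positive");
    samples
        .chunks(window_len)
        .map(|chunk| {
            let mut window = chunk.to_vec();
            // Pad a short trailing chunk with silence.
            window.resize(window_len, 0.0);
            window
        })
        .collect()
}
```

Assuming 16 kHz mono audio, a 10-second window is `10 * 16_000` samples, so a 25-second recording yields three windows, the last of which is half padding.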

Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
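
As a rough sketch of how such a comparison can work, the code below computes cosine similarity between two embedding vectors and picks the closest known speaker. The function names, the `Option` return types, and the threshold parameter are hypothetical and chosen for illustration; they are not taken from the crate.

```rust
/// Cosine similarity between two embeddings: dot(a, b) / (|a| * |b|).
/// Returns None for mismatched lengths, empty input, or zero-norm vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Index of the most similar known speaker whose similarity is at least
/// `threshold`, or None if no known speaker is close enough.
/// (Hypothetical helper; the threshold value is application-specific.)
fn match_speaker(embedding: &[f32], known: &[Vec<f32>], threshold: f32) -> Option<usize> {
    known
        .iter()
        .enumerate()
        .filter_map(|(i, k)| cosine_similarity(embedding, k).map(|s| (i, s)))
        .filter(|&(_, s)| s >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

When `match_speaker` returns `None`, a caller would typically treat the segment as a new speaker and add its embedding to the known set.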

Credits

Big thanks to pyannote-onnx and kaldi-native-fbank.