pyannote-rs

Install

cargo add pyannote-rs

pyannote-rs uses two models to perform speaker diarization. The first, segmentation-3.0, handles segmentation: detecting when speech occurs.
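To illustrate what "detecting when speech occurs" produces, here is a minimal sketch that turns per-frame speech probabilities into start/end times. It assumes the model output has already been reduced to one speech probability per frame. The frame duration and threshold are hypothetical parameters, and `frames_to_segments` and `SpeechSegment` are illustrative names, not part of the pyannote-rs API.

```rust
/// A span of detected speech, in seconds (illustrative type, not the crate's API).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SpeechSegment {
    start: f64,
    end: f64,
}

/// Turns per-frame speech probabilities into contiguous speech segments.
/// `frame_seconds` and `threshold` are hypothetical parameters, not values
/// taken from segmentation-3.0 or pyannote-rs.
fn frames_to_segments(probs: &[f32], frame_seconds: f64, threshold: f32) -> Vec<SpeechSegment> {
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &p) in probs.iter().enumerate() {
        match (p >= threshold, start) {
            // Speech begins at this frame.
            (true, None) => start = Some(i),
            // Speech ended on the previous frame: close the segment.
            (false, Some(s)) => {
                segments.push(SpeechSegment {
                    start: s as f64 * frame_seconds,
                    end: i as f64 * frame_seconds,
                });
                start = None;
            }
            // Still inside speech, or still inside silence.
            _ => {}
        }
    }
    // Close a segment that runs until the last frame.
    if let Some(s) = start {
        segments.push(SpeechSegment {
            start: s as f64 * frame_seconds,
            end: probs.len() as f64 * frame_seconds,
        });
    }
    segments
}

fn main() {
    let probs = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6];
    for seg in frames_to_segments(&probs, 0.5, 0.5) {
        println!("speech from {:.1}s to {:.1}s", seg.start, seg.end);
    }
}
```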

The second, wespeaker-voxceleb-resnet34-LM, computes speaker embeddings that are used to identify who is speaking.

All inference runs in onnxruntime.

The segmentation model accepts at most 10 s of audio per input, so we slide a 10 s window over the audio and feed each window to the model in turn.
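A minimal sketch of the windowing step, assuming 16 kHz mono `f32` samples (check the sample rate the model actually expects). `windows` is a hypothetical helper, not the crate's API. It steps through the audio in non-overlapping 10 s chunks and records each chunk's start offset so model output can be mapped back to absolute time.

```rust
/// Assumed sample rate in Hz; verify against the model you load.
const SAMPLE_RATE: usize = 16_000;
/// Maximum input length accepted by the segmentation model, in seconds.
const WINDOW_SECONDS: usize = 10;

/// Splits `samples` into consecutive windows of at most 10 s each and returns
/// (start offset in seconds, window) pairs. The final window may be shorter.
fn windows(samples: &[f32]) -> Vec<(f64, &[f32])> {
    let window_len = SAMPLE_RATE * WINDOW_SECONDS;
    samples
        .chunks(window_len)
        .enumerate()
        .map(|(i, chunk)| ((i * WINDOW_SECONDS) as f64, chunk))
        .collect()
}

fn main() {
    // 25 s of silence yields windows starting at 0 s, 10 s and 20 s.
    let audio = vec![0.0f32; SAMPLE_RATE * 25];
    for (offset, window) in windows(&audio) {
        // Each `window` would be passed to the segmentation model here.
        println!("window at {offset}s: {} samples", window.len());
    }
}
```

In this sketch the last window is left shorter than 10 s; if the model needs a fixed input length, padding it with zeros may be necessary.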

The embedding model expects filter-bank features (extracted from the audio) as input, so we use knf-rs to compute them.

For speaker comparison (e.g. is Alice speaking again?), we use cosine similarity.
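A minimal sketch of cosine similarity between two embeddings, plus one possible way to use it for speaker assignment: match a new embedding to the most similar known speaker, or register a new speaker when no similarity reaches a threshold. The greedy assignment strategy, the threshold value and the names `cosine_similarity` and `assign_speaker` are assumptions for illustration, not necessarily how pyannote-rs does it.

```rust
/// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // undefined for zero vectors; treat as "not similar"
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical helper: returns the index of the most similar known speaker,
/// or stores `embedding` as a new speaker if no similarity reaches `threshold`.
fn assign_speaker(known: &mut Vec<Vec<f32>>, embedding: &[f32], threshold: f32) -> usize {
    let best = known
        .iter()
        .enumerate()
        .map(|(i, k)| (i, cosine_similarity(k, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((i, score)) if score >= threshold => i,
        _ => {
            known.push(embedding.to_vec());
            known.len() - 1
        }
    }
}

fn main() {
    let mut speakers: Vec<Vec<f32>> = Vec::new();
    let a = assign_speaker(&mut speakers, &[1.0, 0.0], 0.8); // first speaker
    let b = assign_speaker(&mut speakers, &[0.0, 1.0], 0.8); // orthogonal: new speaker
    let c = assign_speaker(&mut speakers, &[0.9, 0.1], 0.8); // close to the first
    println!("{a} {b} {c}"); // prints "0 1 0"
}
```

A higher threshold splits voices into more speakers; a lower one merges them more aggressively, so the value would need tuning on real embeddings.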