v1.0 Embedder trait + concrete extractors (CAM++, ResNet34) + pool +
overlap-mask helper.
Added in v0.6 (M2).
Structs§
- `CamPlusPlusExtractor`: CAM++ embedder (Channel-Attentive Multi-scale Pooling). Dim is supplied explicitly because WeSpeaker ships several CAM++ variants: `voxceleb_CAM++.onnx` is 512-d; smaller variants exist at 192-d. Targets the Mobile profile of v1.0; M5 may swap to INT8 + smaller dim. Uses the same 80-bin log-mel fbank pipeline as ResNet34.
- `EmbedderPool`: Lock-free pool of `Embedder` instances for concurrent extraction.
- `ResNet34Adapter`: New-trait adapter for the existing `FbankOnnxExtractor` (WeSpeaker ResNet34, 256-d).
Enums§
- `EmbedderError`: Errors from `Embedder` implementations.
Traits§
- `Embedder`: Speaker embedding extractor that turns a slice of 16 kHz mono audio into a fixed-dimension embedding vector. Implementations are expected to L2-normalize their output so cosine similarity is a meaningful metric downstream.
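The L2-normalization expectation can be illustrated with a minimal sketch. The helper names `l2_normalize` and `dot` are hypothetical and are not part of this module's API. The sketch only shows why unit-norm embeddings let downstream code treat a plain dot product as cosine similarity.

```rust
/// Hypothetical helper (not part of this crate): scale `v` to unit L2 norm
/// in place. An all-zero vector is left untouched to avoid dividing by zero.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Hypothetical helper: dot product of two equal-length embeddings.
/// When both inputs have unit L2 norm, this equals their cosine similarity.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

With both embeddings normalized once at extraction time, comparing two speakers costs a single pass over the vectors and no further square roots.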
Functions§
- `apply_overlap_mask`: `{ true }` `pub fn apply_overlap_mask(audio: &[f32], overlap_regions: &[(f32, f32)], sample_rate: u32) -> Vec<f32>` `{ ret.len() == audio.len() }` Zero-fill audio samples in regions where the segmenter flagged a 2-speaker overlap. The returned `Vec<f32>` is a copy of `audio` with zeros in the `(start_secs, end_secs)` ranges listed in `overlap_regions`.
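The documented behavior can be sketched as a stand-alone function. This is an illustrative re-implementation under a hypothetical name, `zero_fill_overlaps`, and not the crate's actual code. The rounding of region boundaries (start rounded down, end rounded up) and the clamping of out-of-range regions are assumptions, since the documentation above does not specify them.

```rust
/// Hypothetical sketch of the documented behavior: return a copy of `audio`
/// with samples zeroed inside each `(start_secs, end_secs)` region.
pub fn zero_fill_overlaps(
    audio: &[f32],
    overlap_regions: &[(f32, f32)],
    sample_rate: u32,
) -> Vec<f32> {
    let mut out = audio.to_vec();
    let sr = sample_rate as f32;
    for &(start_secs, end_secs) in overlap_regions {
        // Assumption: start rounds down and end rounds up, so a partially
        // covered sample is masked. Both indices are clamped to the buffer.
        let start = ((start_secs.max(0.0) * sr).floor() as usize).min(out.len());
        let end = ((end_secs.max(0.0) * sr).ceil() as usize).min(out.len());
        if start < end {
            out[start..end].fill(0.0);
        }
    }
    // The output always has the same length as the input, matching the
    // `ret.len() == audio.len()` condition stated in the documentation.
    out
}
```

Returning a copy rather than mutating in place keeps the original audio available to callers that also need the unmasked signal.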