Module segmentation

Speaker segmentation: a powerset classifier combined with a sliding-window aggregator.

Added in v0.6 (M1).

Structs

AggregationConfig: Configuration for aggregation.
Aggregator: Aggregator over sliding-window powerset outputs.
FrameLabel: Decoded label for a single audio frame.
PowersetConfig: Tunable parameters for PowersetSegmenter.
PowersetDecoder: Stateless decoder; methods are associated functions because no per-instance configuration is needed.
PowersetSegmenter: ONNX-backed powerset speaker segmenter.
RawSegment: One contiguous segment attributed to a single local speaker index.
WindowOutput: One window’s segmentation output.
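
The sliding-window aggregation that Aggregator is described as performing can be pictured as averaging overlapping per-window scores frame by frame. The following is only a minimal sketch under stated assumptions: `SketchWindow` and `aggregate` are hypothetical names, not the crate's `WindowOutput` or `Aggregator` API, and the sketch handles a single speaker track without addressing how speaker indices are matched across windows.

```rust
/// Hypothetical stand-in for one window's output (not the crate's `WindowOutput`):
/// the frame at which the window starts, plus one activity score per frame.
pub struct SketchWindow {
    pub start_frame: usize,
    pub scores: Vec<f32>,
}

/// Average the scores of all windows covering each frame.
/// Frames that no window covers are reported as 0.0 (an assumption of this sketch).
pub fn aggregate(windows: &[SketchWindow], total_frames: usize) -> Vec<f32> {
    let mut sum = vec![0.0f32; total_frames];
    let mut count = vec![0u32; total_frames];

    for w in windows {
        for (offset, &score) in w.scores.iter().enumerate() {
            let frame = w.start_frame + offset;
            if frame >= total_frames {
                break; // ignore the part of a window that runs past the end
            }
            sum[frame] += score;
            count[frame] += 1;
        }
    }

    sum.iter()
        .zip(&count)
        .map(|(&s, &c)| if c == 0 { 0.0 } else { s / c as f32 })
        .collect()
}
```

With two windows that overlap on a single frame, that frame receives the mean of both scores, while frames covered once keep their original score.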

Enums

PowersetClass: One of the seven powerset classes, identifying which speakers are active.
SegmentationError: Errors from Segmenter implementations.
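
To illustrate how seven powerset classes can identify the active speakers, here is a hedged sketch. It assumes three local speakers with at most two active at once (silence, three single-speaker classes, three two-speaker classes). The enum `SketchClass`, its variant names, and the class ordering are all hypothetical; the crate's actual `PowersetClass` and `PowersetDecoder` may differ.

```rust
/// Hypothetical stand-in for `PowersetClass`; variant names and order are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchClass {
    Silence,
    Only0,
    Only1,
    Only2,
    Both01,
    Both02,
    Both12,
}

impl SketchClass {
    /// Map a class index (0..7) to a class; returns `None` when out of range.
    pub fn from_index(i: usize) -> Option<Self> {
        use SketchClass::*;
        [Silence, Only0, Only1, Only2, Both01, Both02, Both12]
            .get(i)
            .copied()
    }

    /// Activity flags for local speakers 0, 1 and 2.
    pub fn active(self) -> [bool; 3] {
        match self {
            SketchClass::Silence => [false, false, false],
            SketchClass::Only0 => [true, false, false],
            SketchClass::Only1 => [false, true, false],
            SketchClass::Only2 => [false, false, true],
            SketchClass::Both01 => [true, true, false],
            SketchClass::Both02 => [true, false, true],
            SketchClass::Both12 => [false, true, true],
        }
    }

    /// True when more than one speaker is active in the frame.
    pub fn is_overlap(self) -> bool {
        self.active().iter().filter(|&&a| a).count() > 1
    }
}

/// Pick the highest-scoring class for one frame (plain argmax, first index wins ties).
pub fn decode_frame(scores: &[f32; 7]) -> SketchClass {
    let mut best = 0;
    for i in 1..scores.len() {
        if scores[i] > scores[best] {
            best = i;
        }
    }
    SketchClass::from_index(best).expect("index is always below 7")
}
```

Because the decoding depends only on the scores passed in, it needs no per-instance state, which is consistent with the description of PowersetDecoder as stateless.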

Constants

MIN_AUDIO_SAMPLES: Minimum audio length (16 kHz samples) accepted by Segmenter::segment.

Traits

Segmenter: A speaker segmentation engine — turns raw audio into spans of speech attributed to local speaker indices, with overlap detection.
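
As a rough illustration of turning per-frame activity into spans attributed to local speaker indices, the sketch below collapses runs of active frames into segments, one speaker at a time. `SketchSegment` and `to_segments` are hypothetical names, frame indices stand in for timestamps, and three local speakers are assumed; none of this is taken from the crate's `RawSegment` or `Segmenter` definitions.

```rust
/// Hypothetical stand-in for `RawSegment`: a half-open frame range
/// `[start_frame, end_frame)` attributed to one local speaker index.
#[derive(Debug, PartialEq, Eq)]
pub struct SketchSegment {
    pub speaker: usize,
    pub start_frame: usize,
    pub end_frame: usize,
}

/// Collapse per-frame activity flags (three local speakers assumed)
/// into contiguous segments, listed speaker by speaker.
pub fn to_segments(frames: &[[bool; 3]]) -> Vec<SketchSegment> {
    let mut out = Vec::new();

    for speaker in 0..3 {
        let mut start: Option<usize> = None;

        for (i, frame) in frames.iter().enumerate() {
            match (frame[speaker], start) {
                // Speaker becomes active: open a new segment.
                (true, None) => start = Some(i),
                // Speaker becomes inactive: close the open segment.
                (false, Some(s)) => {
                    out.push(SketchSegment { speaker, start_frame: s, end_frame: i });
                    start = None;
                }
                _ => {}
            }
        }

        // Close a segment that is still open at the end of the audio.
        if let Some(s) = start {
            out.push(SketchSegment { speaker, start_frame: s, end_frame: frames.len() });
        }
    }

    out
}
```

Overlapping speech shows up as segments from different speakers whose frame ranges intersect, since each speaker is processed independently.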