Module timing

Phoneme timing extraction from ONNX model duration output.

VITS models can optionally output a durations tensor of shape [1, phoneme_length] giving the number of frames each phoneme occupies, where one frame spans hop_length audio samples. This module converts those frame counts to millisecond timestamps.
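The conversion described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the names `PhonemeSpan` and `frames_to_spans` are hypothetical, and the sketch assumes that durations arrive as a flat slice of `f32` frame counts and that the caller supplies the sample rate. One frame lasts `hop_length / sample_rate` seconds, so each phoneme's start time is the running total of the preceding frame counts multiplied by that frame length.

```rust
/// Hypothetical struct for illustration only; it is NOT the crate's
/// `PhonemeTimingInfo`, whose fields are not shown in this page.
#[derive(Debug, Clone, PartialEq)]
struct PhonemeSpan {
    start_ms: f64,
    end_ms: f64,
}

/// Convert per-phoneme frame counts into cumulative start/end times in
/// milliseconds. Assumes `frames` is the flattened durations tensor
/// (one entry per phoneme) and `sample_rate` is the audio rate in Hz.
fn frames_to_spans(frames: &[f32], hop_length: u32, sample_rate: u32) -> Vec<PhonemeSpan> {
    // Duration of one frame in milliseconds.
    let ms_per_frame = f64::from(hop_length) * 1000.0 / f64::from(sample_rate);

    // Running total of frames consumed by the phonemes seen so far.
    let mut cursor_frames = 0.0_f64;

    frames
        .iter()
        .map(|&count| {
            let start_ms = cursor_frames * ms_per_frame;
            // Treat negative (or NaN) counts as zero so times never go backwards.
            cursor_frames += f64::from(count.max(0.0));
            PhonemeSpan {
                start_ms,
                end_ms: cursor_frames * ms_per_frame,
            }
        })
        .collect()
}
```

For example, with an assumed hop length of 256 samples at 16 000 Hz, each frame lasts 16 ms, so frame counts of 2 and 3 map to the spans 0 to 32 ms and 32 to 80 ms.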

Structs

PhonemeTimingInfo
Timing information for a single phoneme
TimingResult
Complete timing result for a synthesized utterance

Constants

DEFAULT_HOP_LENGTH
Default hop length for VITS models

Functions

durations_to_timing
Convert duration tensor output to timing information.