pub fn find_transients(
params: IterativeTransientDetectionParams,
data: &mut AudioBuffer<f32>,
) -> Vec<f32>
Implements iterative STFT transient detection for polyphonic signals. Output is a monophonic audio track of the transient signal. Not real-time safe.
Reference:
Inputs to the algorithm:
- fft_size - Size of the FFT window
- fft_overlap - Amount of overlap between windows
- v - power_of_change_spectral_spread (equation 5)
- β - threshold_time_spread_factor (equation 7)
  - Internal multiplier of dynamic thresholds
  - This number sets the factor by which a frequency bin’s rate of change must exceed that of its time-domain neighbours
  - Higher values decrease sensitivity
- λThr - frequency_bin_change_threshold (equation 10)
  - If more than this many frequency bins have changed, the frame is considered a transient
- δ - iteration_magnitude_factor (equation 10)
  - The fraction of the magnitude that is collected onto the output per iteration
- N - iteration_count (algorithm 1)
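To make the inputs concrete, here is a minimal sketch of a parameter set that mirrors the names listed above. `SketchParams`, its field types, and the checks in `validate` are hypothetical and exist only for illustration. In particular, treating `fft_overlap` as a ratio in `[0, 1)` and δ as a fraction in `(0, 1]` are assumptions, not documented constraints of `IterativeTransientDetectionParams`.

```rust
/// Hypothetical mirror of the inputs listed above. The field types are
/// assumptions of this sketch, not the crate's actual definition.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct SketchParams {
    fft_size: usize,
    fft_overlap: f32,                       // assumed to be a ratio in [0, 1)
    power_of_change_spectral_spread: usize, // v
    threshold_time_spread_factor: f32,      // β
    frequency_bin_change_threshold: usize,  // λThr
    iteration_magnitude_factor: f32,        // δ
    iteration_count: usize,                 // N
}

impl SketchParams {
    /// Sanity checks assumed by this sketch only.
    fn validate(&self) -> Result<(), String> {
        if self.fft_size == 0 {
            return Err("fft_size must be non-zero".to_string());
        }
        if !(0.0..1.0).contains(&self.fft_overlap) {
            return Err("fft_overlap is assumed to be a ratio in [0, 1)".to_string());
        }
        let delta = self.iteration_magnitude_factor;
        if !(delta > 0.0 && delta <= 1.0) {
            return Err("iteration_magnitude_factor is assumed to be in (0, 1]".to_string());
        }
        if self.iteration_count == 0 {
            return Err("iteration_count must be at least 1".to_string());
        }
        Ok(())
    }
}
```

Checking δ up front is useful under these assumptions because each iteration moves a δ-sized share of the magnitude onto the output, so a value outside `(0, 1]` would either collect nothing or more than is present.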
The algorithm is as follows:
- Perform FFT with overlapping windows at a 3/4 ratio (e.g. one 40 ms window every 30 ms)
- Calculate M(frame, bin) magnitudes for each frame/bin
- Let P(frame, bin) be the output transient_magnitude_frames
  - These are the transient magnitude frames
  - e.g. magnitudes of the transients, per frequency bin, over time
- For iteration in 0..N
  - For each frame/bin, calculate the power of change value F(frame, bin)
    - First we calculate T-(frame, bin) and T+(frame, bin), the deltas in magnitude with the previous and next frames respectively
    - F(frame, bin) (power of change) represents how much a bin’s magnitude exceeds that of its previous and next frames (0.0 if it’s not higher than its neighbouring time-domain frames)
    - For each bin, this power_of_change is summed with that of its v (power_of_change_spectral_spread) neighbour frequency bins
    - This is a form of ‘peak detection’: it finds frames higher than their time-domain peers and quantifies how much larger they are
  - Calculate dynamic_thresholds λ(frame, bin)
    - For this, on every frequency bin, the threshold is the average power of change of the l (threshold_time_spread) neighbouring time-domain frames, multiplied by a magic constant β (threshold_time_spread_factor)
    - A frequency bin’s threshold is defined by how much its neighbour frequency bins have changed in this frame (change being quantified by F)
  - Calculate Γ(frame, bin) (have_bins_changed), by setting a flag to 1 or 0 depending on whether the power of change F(frame, bin) is higher than its dynamic threshold λ(frame, bin)
  - Calculate ΣΓ (num_changed_bins), by counting the number of frequency bins that have changed in this frame
    - Simply sum the above for each frame
  - If ΣΓ(frame) (num_changed_bins) is higher than λThr (frequency_bin_change_threshold)
    - Update P(frame, bin) (transient_magnitude_frames), adding X(frame, bin) times δ (iteration_magnitude_factor) onto it
    - Subtract δ * X(frame, bin) from X(frame, bin), leaving (1 - δ) * X(frame, bin)
- At the end of N iterations, perform the inverse Fourier transform over each polar complex number frame, using magnitudes from transient_magnitude_frames and phase from the input FFT result
- There may now be extra filtering / smoothing steps to extract data or audio, but the output should be the transient signal
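One pass of the iteration described above can be sketched on plain magnitude frames as follows. This is a minimal sketch, not the crate's implementation. The `Frames` alias and the function names are hypothetical, and several details are assumptions made for illustration: F is taken as the sum of the positive parts of T- and T+, windows are clamped at the edges, the first and last frames get F = 0, and X is treated as a working copy of the magnitudes M.

```rust
/// Magnitudes indexed as `frames[frame][bin]`; all frames are assumed to
/// have the same number of bins. (Hypothetical alias for this sketch.)
type Frames = Vec<Vec<f32>>;

/// F(frame, bin): rise over the previous and next frames, summed with the
/// `spectral_spread` (v) neighbouring bins on each side.
fn power_of_change(mags: &Frames, spectral_spread: usize) -> Frames {
    let num_frames = mags.len();
    let num_bins = mags.first().map_or(0, |frame| frame.len());
    let mut rise = vec![vec![0.0_f32; num_bins]; num_frames];
    // Assumption: edge frames have only one time neighbour and are left at 0.0.
    for n in 1..num_frames.saturating_sub(1) {
        for k in 0..num_bins {
            let t_minus = mags[n][k] - mags[n - 1][k];
            let t_plus = mags[n][k] - mags[n + 1][k];
            // Assumption: only positive deltas contribute.
            rise[n][k] = t_minus.max(0.0) + t_plus.max(0.0);
        }
    }
    let mut f = vec![vec![0.0_f32; num_bins]; num_frames];
    for n in 0..num_frames {
        for k in 0..num_bins {
            let lo = k.saturating_sub(spectral_spread);
            let hi = (k + spectral_spread).min(num_bins - 1);
            f[n][k] = rise[n][lo..=hi].iter().sum();
        }
    }
    f
}

/// λ(frame, bin): β times the mean of F over `time_spread` (l) frames on
/// each side, with the window clamped at the first and last frame.
fn dynamic_thresholds(f: &Frames, time_spread: usize, beta: f32) -> Frames {
    let num_frames = f.len();
    (0..num_frames)
        .map(|n| {
            let lo = n.saturating_sub(time_spread);
            let hi = (n + time_spread).min(num_frames - 1);
            let count = (hi - lo + 1) as f32;
            (0..f[n].len())
                .map(|k| beta * (lo..=hi).map(|m| f[m][k]).sum::<f32>() / count)
                .collect()
        })
        .collect()
}

/// Runs one iteration: frames with more than `bin_threshold` (λThr) changed
/// bins move a `delta` (δ) share of `x` onto `p`. Returns how many frames
/// were treated as transients in this pass.
fn iterate_once(
    x: &mut Frames,
    p: &mut Frames,
    spectral_spread: usize,
    time_spread: usize,
    beta: f32,
    bin_threshold: usize,
    delta: f32,
) -> usize {
    let f = power_of_change(x, spectral_spread);
    let thresholds = dynamic_thresholds(&f, time_spread, beta);
    let mut transient_frames = 0;
    for n in 0..x.len() {
        // ΣΓ(frame): number of bins whose F exceeds its dynamic threshold.
        let changed = f[n]
            .iter()
            .zip(&thresholds[n])
            .filter(|(fv, tv)| **fv > **tv)
            .count();
        if changed > bin_threshold {
            for k in 0..x[n].len() {
                p[n][k] += delta * x[n][k];
                x[n][k] *= 1.0 - delta;
            }
            transient_frames += 1;
        }
    }
    transient_frames
}
```

Under these assumptions, the full procedure would call `iterate_once` N times with `p` starting at zero, then combine the magnitudes in `p` with the phases of the input FFT frames before the inverse transform.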