Function find_transients

pub fn find_transients(
    params: IterativeTransientDetectionParams,
    data: &mut AudioBuffer<f32>,
) -> Vec<f32>

Implements iterative STFT transient detection for polyphonic signals. Output is a monophonic audio track of the transient signal. Not real-time safe.

Reference:

Inputs to the algorithm:

  • fft_size - Size of the FFT window
  • fft_overlap - Amount of overlap between windows
  • v - power_of_change_spectral_spread (equation 5)
  • β - threshold_time_spread_factor (equation 7)
    • Internal multiplier of dynamic thresholds
    • This number sets the factor by which a frequency bin’s rate of change must exceed that of its time-domain neighbours
    • Higher values decrease sensitivity
  • λThr - frequency_bin_change_threshold (equation 10)
    • If more than this many frequency bins have changed, the frame is considered a transient
  • δ - iteration_magnitude_factor (equation 10)
    • Fraction of the magnitude that is added to the output on each iteration
  • N - iteration_count (algorithm 1)
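
As a rough illustration of how fft_size and fft_overlap interact, the sketch below computes the hop between consecutive windows and the resulting number of frames. It assumes fft_overlap is expressed as a ratio in [0, 1); the helper names hop_size and frame_count are hypothetical and not part of this crate.

```rust
/// Hop between the starts of consecutive windows, in samples.
/// Assumes `overlap_ratio` is a fraction in [0, 1) (an assumption of this sketch).
fn hop_size(fft_size: usize, overlap_ratio: f32) -> usize {
    let hop = (fft_size as f32 * (1.0 - overlap_ratio)).round() as usize;
    // Never return 0, otherwise the window would not advance.
    hop.max(1)
}

/// Number of full windows that fit in `num_samples` (no padding at the end).
fn frame_count(num_samples: usize, fft_size: usize, hop: usize) -> usize {
    if num_samples < fft_size {
        0
    } else {
        (num_samples - fft_size) / hop + 1
    }
}
```

With a 2048-sample window and a 0.75 overlap ratio, each window starts 512 samples after the previous one.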

The algorithm is as follows:

  • Perform FFT with overlapping windows at a 3/4 overlap ratio (e.g. 40ms windows that overlap by 30ms, so one window starts every 10ms)
  • Calculate M(frame, bin) magnitudes for each frame/bin
  • Let P(frame, bin) be the output transient_magnitude_frames
    • These are the magnitudes of the transients, per frequency bin, over time
  • for iteration in 0..N
    • For each frame/bin, calculate power of change value F(frame, bin)
      • First we calculate T-(frame, bin) and T+(frame, bin), the deltas in magnitude with the previous and next frames respectively
      • F(frame, bin) (power of change) represents how much higher the bin’s magnitude is than in the next and previous frames; it is 0.0 if the magnitude is not higher than in the neighbouring time-domain frames
      • For each bin, this power_of_change is summed with that of its v (power_of_change_spectral_spread) neighbour frequency bins
      • This is a form of ‘peak detection’: it finds frames higher than their time-domain peers and quantifies how much larger they are
    • Calculate dynamic_thresholds λ(frame, bin)
      • For this, on every frequency bin, the threshold is the average power of change of the l (threshold_time_spread) neighbouring time-domain frames, multiplied by a magic constant β (threshold_time_spread_factor)
      • Because F already sums over neighbouring frequency bins, a frequency bin’s threshold also reflects how much its neighbour frequency bins have changed around this frame (change being quantified by F)
    • Calculate Γ(frame, bin) (have_bins_changed) by setting a flag to 1 if power of change F(frame, bin) is higher than its dynamic threshold λ(frame, bin), and to 0 otherwise
    • Calculate ΣΓ (num_changed_bins) by counting the number of frequency bins that have changed in each frame
      • Simply sum Γ(frame, bin) over all bins of the frame
    • If ΣΓ(frame) (num_changed_bins) is higher than λThr (frequency_bin_change_threshold)
      • Update P(frame, bin) (transient_magnitude_frames) by adding δ (iteration_magnitude_factor) times X(frame, bin) to it
      • Subtract (1 - δ) * X(frame, bin) from X(frame, bin)
  • At the end of N iterations, perform the inverse Fourier transform over each frame, building each complex number in polar form from the magnitudes in transient_magnitude_frames and the phases of the input FFT result
  • There may now be extra filtering / smoothing steps to extract data or audio, but the output should be the transient signal
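
The per-iteration steps above can be sketched on plain magnitude frames as follows. This is a minimal illustration, not this crate's implementation: SketchParams, power_of_change, dynamic_thresholds and run_iteration are hypothetical names, X(frame, bin) is assumed to be the working copy of the M(frame, bin) magnitudes, frames and bins outside the signal are assumed to contribute nothing, and thresholds at the edges are averaged over the frames that exist. The residual update follows the wording of the list above literally.

```rust
/// Hypothetical parameter bundle for this sketch (not the crate's real type).
struct SketchParams {
    spectral_spread: usize,      // v
    time_spread: usize,          // l
    time_spread_factor: f32,     // β
    bin_change_threshold: usize, // λThr
    magnitude_factor: f32,       // δ
}

/// F(frame, bin): positive rise over previous/next frames, summed over neighbour bins.
fn power_of_change(m: &[Vec<f32>], spread: usize) -> Vec<Vec<f32>> {
    let frames = m.len();
    let bins = m.first().map_or(0, |row| row.len());
    let mut rise = vec![vec![0.0f32; bins]; frames];
    for n in 0..frames {
        for k in 0..bins {
            // T- and T+; missing neighbours at the edges count as no change (assumption).
            let t_minus = if n > 0 { m[n][k] - m[n - 1][k] } else { 0.0 };
            let t_plus = if n + 1 < frames { m[n][k] - m[n + 1][k] } else { 0.0 };
            // Only magnitudes higher than their neighbours contribute.
            rise[n][k] = t_minus.max(0.0) + t_plus.max(0.0);
        }
    }
    let mut f = vec![vec![0.0f32; bins]; frames];
    for n in 0..frames {
        for k in 0..bins {
            let lo = k.saturating_sub(spread);
            let hi = (k + spread).min(bins - 1);
            f[n][k] = rise[n][lo..=hi].iter().sum();
        }
    }
    f
}

/// λ(frame, bin): β times the average F over the neighbouring time-domain frames.
fn dynamic_thresholds(f: &[Vec<f32>], time_spread: usize, beta: f32) -> Vec<Vec<f32>> {
    let frames = f.len();
    let bins = f.first().map_or(0, |row| row.len());
    let mut thresholds = vec![vec![0.0f32; bins]; frames];
    for n in 0..frames {
        let lo = n.saturating_sub(time_spread);
        let hi = (n + time_spread).min(frames - 1);
        let count = (hi - lo + 1) as f32;
        for k in 0..bins {
            let sum: f32 = (lo..=hi).map(|l| f[l][k]).sum();
            thresholds[n][k] = beta * sum / count;
        }
    }
    thresholds
}

/// One iteration: moves part of the magnitudes `x` of transient frames into `p`.
fn run_iteration(x: &mut [Vec<f32>], p: &mut [Vec<f32>], params: &SketchParams) {
    let f = power_of_change(x, params.spectral_spread);
    let thresholds = dynamic_thresholds(&f, params.time_spread, params.time_spread_factor);
    for n in 0..x.len() {
        // ΣΓ(frame): number of bins whose power of change exceeds their threshold.
        let changed = f[n]
            .iter()
            .zip(&thresholds[n])
            .filter(|(fv, tv)| **fv > **tv)
            .count();
        if changed > params.bin_change_threshold {
            for k in 0..x[n].len() {
                let current = x[n][k];
                p[n][k] += params.magnitude_factor * current;
                // Residual update exactly as worded in the list above.
                x[n][k] -= (1.0 - params.magnitude_factor) * current;
            }
        }
    }
}
```

Calling run_iteration N times on the same x and p accumulates the transient magnitudes in p, which would then feed the inverse transform described in the last steps.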