audio_processor_analysis::transient_detection::stft

Function find_transients

pub fn find_transients(
    params: IterativeTransientDetectionParams,
    data: &mut AudioBuffer<f32>,
) -> Vec<f32>

Implements iterative STFT transient detection for polyphonic signals. Output is a monophonic audio track of the transient signal. Not real-time safe.

Reference:

https://www.researchgate.net/profile/Balaji-Thoshkahna/publication/220723752_A_Transient_Detection_Algorithm_for_Audio_Using_Iterative_Analysis_of_STFT/links/0deec52e6331412aed000000/A-Transient-Detection-Algorithm-for-Audio-Using-Iterative-Analysis-of-STFT.pdf

Inputs to the algorithm:

fft_size - Size of the FFT window
fft_overlap - Amount of overlap between windows
v - power_of_change_spectral_spread (equation 5)
β - threshold_time_spread_factor (equation 7)
- Internal multiplier of dynamic thresholds
- This number sets the factor by which a frequency bin’s rate of change must exceed that of its time-domain neighbours
- Higher values decrease sensitivity
λThr - frequency_bin_change_threshold (equation 10)
- If this amount of frequency bins have changed, this frame will be considered a transient
δ - iteration_magnitude_factor (equation 10)
- What factor of the magnitude is collected onto the output per iteration
N - iteration_count (algorithm 1)
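
As a rough illustration of how fft_size and fft_overlap interact, the sketch below derives a hop size and a frame count from them. It assumes the overlap is expressed as a ratio in [0, 1), which may not match how this crate stores it; `hop_size` and `frame_count` are hypothetical helpers, not part of the crate.

```rust
/// Hop size in samples for a given FFT size and overlap ratio.
/// Assumption: `overlap_ratio` is a fraction in [0, 1), e.g. 0.75 for 3/4 overlap.
pub fn hop_size(fft_size: usize, overlap_ratio: f32) -> usize {
    let hop = (fft_size as f32 * (1.0 - overlap_ratio)).round() as usize;
    // Never return a zero hop, which would stall the analysis loop.
    hop.max(1)
}

/// Number of full windows that fit into `num_samples` without padding.
pub fn frame_count(num_samples: usize, fft_size: usize, hop: usize) -> usize {
    if num_samples < fft_size || hop == 0 {
        0
    } else {
        (num_samples - fft_size) / hop + 1
    }
}
```

With a 2048-sample window and a 0.75 overlap ratio, each new window starts 512 samples after the previous one.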

The algorithm is as follows:

Perform FFT with overlapping windows at a 3/4 overlap ratio (e.g. 40ms windows that overlap by 30ms, so a new window starts every 10ms)
Calculate M(frame, bin) magnitudes for each frame/bin
Let P(frame, bin) be the output transient_magnitude_frames
- These are the transient magnitude frames
- e.g. Magnitudes of the transients, per frequency bin, over time
for iteration in 0..N
- For each frame/bin, calculate power of change value F(frame, bin)
  - First we calculate T-(frame, bin) and T+(frame, bin), the deltas in magnitude with the previous and next frames respectively
  - F(frame, bin), the power of change, represents how much the bin’s magnitude exceeds that of its previous and next frames; it is 0.0 if the bin is not higher than its neighbouring time-domain frames
  - For each bin this power_of_change is summed with that of its v (power_of_change_spectral_spread) neighbouring frequency bins
  - This is a form of ‘peak detection’: it finds frames higher than their time-domain peers and quantifies how much larger they are
- Calculate dynamic_thresholds λ(frame, bin)
  - For this, on every frequency bin, the threshold is the average power of change of the l (threshold_time_spread) neighbouring time-domain frames, multiplied by a constant β (threshold_time_spread_factor)
  - A frequency bin’s threshold is defined by how much its neighbour frequency bins have changed in this frame (change being quantified by F)
- Calculate Γ(frame, bin) (have_bins_changed), by flipping a flag to 1 or 0 depending on whether power of change F(frame, bin) is higher than its dynamic threshold λ(frame, bin)
- Calculate ΣΓ num_changed_bins, by counting the number of frequency bins that have changed in this frame
  - Simply sum the above for each frame
- If ΣΓ(frame) num_changed_bins is higher than λThr - frequency_bin_change_threshold
  - Update P(frame, bin) - transient_magnitude_frames by adding δ (iteration_magnitude_factor) times X(frame, bin), the current magnitude (M above), onto it
  - Subtract δ * X(frame, bin) from X(frame, bin), leaving (1 - δ) * X(frame, bin) for the next iteration
At the end of N iterations, perform the inverse Fourier transform over each frame of polar complex numbers, using the magnitudes in transient_magnitude_frames and the phase from the input FFT result
There may now be extra filtering / smoothing steps to extract data or audio, but the output should be the transient signal
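
The iteration loop above can be sketched on a plain magnitude matrix, leaving out the FFT and inverse FFT. This is a minimal illustration under stated assumptions, not the crate's implementation: `SketchParams` and `transient_magnitudes` are hypothetical names, F is taken as the sum of the positive parts of T- and T+, windows are clamped at the edges, only bins flagged by Γ are moved onto the output, and the remainder (1 - δ) · X is kept for the next iteration.

```rust
/// Hypothetical parameter bundle for this sketch (not the crate's struct).
pub struct SketchParams {
    pub spectral_spread: usize,      // v
    pub time_spread: usize,          // l
    pub time_spread_factor: f32,     // β
    pub bin_change_threshold: usize, // λThr
    pub magnitude_factor: f32,       // δ
    pub iterations: usize,           // N
}

/// Takes magnitudes X indexed as [frame][bin] and returns P, the transient magnitudes.
pub fn transient_magnitudes(mut x: Vec<Vec<f32>>, p: &SketchParams) -> Vec<Vec<f32>> {
    let frames = x.len();
    let bins = x.first().map_or(0, |frame| frame.len());
    let mut out = vec![vec![0.0_f32; bins]; frames];

    for _ in 0..p.iterations {
        // Raw power of change: positive parts of T- and T+ (assumption).
        // The first and last frames have only one neighbour and stay at 0.0.
        let mut raw = vec![vec![0.0_f32; bins]; frames];
        for n in 1..frames.saturating_sub(1) {
            for k in 0..bins {
                let t_minus = x[n][k] - x[n - 1][k];
                let t_plus = x[n][k] - x[n + 1][k];
                raw[n][k] = t_minus.max(0.0) + t_plus.max(0.0);
            }
        }

        // F: sum the raw value over v neighbouring bins on each side.
        let mut f = vec![vec![0.0_f32; bins]; frames];
        for n in 0..frames {
            for k in 0..bins {
                let lo = k.saturating_sub(p.spectral_spread);
                let hi = (k + p.spectral_spread).min(bins - 1);
                f[n][k] = raw[n][lo..=hi].iter().sum::<f32>();
            }
        }

        for n in 0..frames {
            let lo = n.saturating_sub(p.time_spread);
            let hi = (n + p.time_spread).min(frames - 1);
            let window = (hi - lo + 1) as f32;

            // Γ: is F above β times the mean F of the neighbouring frames?
            let changed: Vec<bool> = (0..bins)
                .map(|k| {
                    let mean = (lo..=hi).map(|i| f[i][k]).sum::<f32>() / window;
                    f[n][k] > p.time_spread_factor * mean
                })
                .collect();

            // ΣΓ: number of changed bins in this frame.
            let num_changed = changed.iter().filter(|&&c| c).count();
            if num_changed > p.bin_change_threshold {
                for k in 0..bins {
                    if changed[k] {
                        out[n][k] += p.magnitude_factor * x[n][k];
                        x[n][k] *= 1.0 - p.magnitude_factor;
                    }
                }
            }
        }
    }
    out
}
```

Because each iteration moves only a δ fraction of a flagged magnitude, a strong onset is collected gradually: with δ = 0.5 a flagged bin of magnitude 1.0 contributes 0.5 in the first iteration and 0.25 in the second.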
