stft-rs
A high-quality, streaming-friendly STFT/iSTFT implementation in Rust that works directly on raw slices (&[f32]).
Features
- Batch Processing: Process entire audio buffers at once
- Streaming Support: Incremental processing for real-time applications
- High Quality: >138 dB SNR reconstruction
- Dual Reconstruction Modes:
  - OLA (Overlap-Add): Optimal for spectral processing
  - WOLA (Weighted Overlap-Add): Standard implementation
- Multiple Window Functions: Hann, Hamming, Blackman
- NOLA/COLA Validation: Ensures reconstruction quality
- No External Tensor Libraries: Works directly with slices
Quick Start
use *;
let config = default_4096;
let stft = new;
let istft = new;
let signal: = vec!;
let spectrum = stft.process;
// Manipulate spectrum here...
let reconstructed = istft.process;
Prelude
For convenience, import commonly used types with:
use *;
This exports:
- BatchStft, BatchIstft
- StreamingStft, StreamingIstft
- StftConfig
- Spectrum, SpectrumFrame
- ReconstructionMode, WindowType, PadMode
- apply_padding
Batch vs Streaming
Batch API (Stateless)
Best for: Processing entire files, offline processing, ML training
use *;
let config = default_4096;
let stft = new;
let istft = new;
let spectrum = stft.process;
let reconstructed = istft.process;
Streaming API (Stateful)
Best for: Real-time audio, low-latency processing, incremental processing
use *;
let config = default_4096;
let mut stft = new;
let mut istft = new;
let pad_amount = config.fft_size / 2;
let padded = apply_padding;
let mut output = Vecnew;
for chunk in padded.chunks
for frame in stft.flush
output.extend;
// Remove padding: output[pad_amount..pad_amount + signal.len()]
Note on Padding in Streaming Mode:
- Batch mode automatically applies reflection padding internally for optimal quality
- Streaming mode requires manual padding for best results (>130 dB SNR)
- Without padding, edge effects reduce quality to ~40-60 dB SNR
- Use the apply_padding() helper function or implement custom padding
- For truly real-time applications without pre-roll, accept the edge artifacts or use fade-in/fade-out
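The manual padding step can be illustrated without the crate. The sketch below is a minimal, standalone reflection pad over a slice. reflect_pad is a hypothetical helper written for this illustration, not the library's apply_padding, whose signature and edge convention are not shown in this README. It assumes the pad length is smaller than the signal length and mirrors around the edge sample without repeating it.

```rust
/// Hypothetical helper (not the crate's `apply_padding`): mirror `pad` samples
/// around each end of `signal`, excluding the edge sample itself.
/// Assumes `pad < signal.len()`.
fn reflect_pad(signal: &[f32], pad: usize) -> Vec<f32> {
    assert!(pad < signal.len(), "pad must be smaller than the signal length");
    let n = signal.len();
    let mut out = Vec::with_capacity(n + 2 * pad);
    // Left side: signal[pad], signal[pad - 1], ..., signal[1]
    out.extend((1..=pad).rev().map(|i| signal[i]));
    out.extend_from_slice(signal);
    // Right side: signal[n - 2], signal[n - 3], ..., signal[n - 1 - pad]
    out.extend((1..=pad).map(|i| signal[n - 1 - i]));
    out
}

fn main() {
    let signal = [1.0_f32, 2.0, 3.0, 4.0, 5.0];
    let pad = 2;
    let padded = reflect_pad(&signal, pad);
    println!("{:?}", padded);
    // Undo the padding after processing, as in the streaming example above.
    let trimmed = &padded[pad..pad + signal.len()];
    assert_eq!(trimmed, &signal[..]);
}
```

In the streaming case the pad length would be fft_size / 2, and the same slice range removes the padding from the reconstructed output.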
Configuration
Creating Custom Configurations
use *;
// OLA mode
let config = new.expect;
// WOLA mode
let config = new.expect;
Window Functions
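- Hann: Smooth frequency response, good general-purpose choice
- Hamming: Slightly narrower main lobe than Hann, so slightly better frequency resolution, but side lobes decay more slowly
- Blackman: Lower side lobes at the cost of a wider main lobe, better for spectral analysis where leakage matters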
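For reference, the three windows can be written as one generalized cosine sum. The sketch below uses the standard textbook coefficients and the periodic (DFT-even) form. Both the function names and the choice of periodic form are assumptions made for illustration; the crate's WindowType may be implemented differently.

```rust
use std::f32::consts::PI;

/// Generalized cosine window in periodic (DFT-even) form.
/// The periodic form is an assumption of this sketch.
fn cosine_window(size: usize, a0: f32, a1: f32, a2: f32) -> Vec<f32> {
    (0..size)
        .map(|i| {
            let x = 2.0 * PI * i as f32 / size as f32;
            a0 - a1 * x.cos() + a2 * (2.0 * x).cos()
        })
        .collect()
}

// Standard textbook coefficients for each window.
fn hann(size: usize) -> Vec<f32> {
    cosine_window(size, 0.5, 0.5, 0.0)
}

fn hamming(size: usize) -> Vec<f32> {
    cosine_window(size, 0.54, 0.46, 0.0)
}

fn blackman(size: usize) -> Vec<f32> {
    cosine_window(size, 0.42, 0.5, 0.08)
}

fn main() {
    let n = 8;
    let windows = [("hann", hann(n)), ("hamming", hamming(n)), ("blackman", blackman(n))];
    for (name, w) in windows.iter() {
        println!("{}: first = {:.3}, centre = {:.3}", name, w[0], w[n / 2]);
    }
}
```

Hann and Blackman start at zero, while Hamming starts at 0.08, which is why Hamming keeps a small discontinuity at the frame edges.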
Reconstruction Modes
OLA (Overlap-Add)
- Window applied on forward transform only
- No window on inverse transform
- Normalizes by the accumulated window values: sum(w)
- Use for: Spectral processing, modification, filtering
- Requires: COLA (Constant Overlap-Add) condition
WOLA (Weighted Overlap-Add)
- Window applied on both forward and inverse transforms
- Normalizes by accumulated window squared: sum(w²)
- Use for: Standard analysis/resynthesis
- Requires: NOLA (Nonzero Overlap-Add) condition
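Both conditions can be checked numerically by overlap-adding the window with itself at the hop size. The sketch below is a standalone illustration with hypothetical function names (overlap_sum, is_cola, is_nola); it is not the crate's validation code, and the periodic Hann window and the tolerances are assumptions.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed form for this sketch).
fn hann(size: usize) -> Vec<f64> {
    (0..size)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / size as f64).cos())
        .collect()
}

/// Sum of hop-shifted copies of `window` raised to `power`, evaluated at the
/// `hop` distinct positions within one hop, in steady state.
fn overlap_sum(window: &[f64], hop: usize, power: i32) -> Vec<f64> {
    let mut acc = vec![0.0; hop];
    for (i, &w) in window.iter().enumerate() {
        acc[i % hop] += w.powi(power);
    }
    acc
}

/// COLA: the overlapped window sum is numerically constant.
fn is_cola(window: &[f64], hop: usize, tol: f64) -> bool {
    let s = overlap_sum(window, hop, 1);
    let (min, max) = s
        .iter()
        .fold((f64::MAX, f64::MIN), |(lo, hi), &v| (lo.min(v), hi.max(v)));
    max - min < tol
}

/// NOLA: the overlapped squared-window sum never reaches zero.
fn is_nola(window: &[f64], hop: usize, eps: f64) -> bool {
    overlap_sum(window, hop, 2).iter().all(|&v| v > eps)
}

fn main() {
    let w = hann(4096);
    for &hop in &[1024usize, 4096] {
        println!(
            "hop {}: COLA = {}, NOLA = {}",
            hop,
            is_cola(&w, hop, 1e-9),
            is_nola(&w, hop, 1e-9)
        );
    }
}
```

With a 4096-sample Hann window, a hop of 1024 satisfies both conditions (the window sum is a constant 2.0), while a hop of 4096 satisfies neither because the window reaches zero at the frame boundary.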
Spectral Processing
The library provides easy access to manipulate spectrum data:
let mut spectrum = stft.process;
// Access individual frames and bins
for frame in 0..spectrum.num_frames
// Or iterate over frames
for frame in spectrum.frames
Examples
High-pass Filter
let mut spectrum = stft.process;
let cutoff_bin = 100;
for frame in 0..spectrum.num_frames
let filtered = istft.process;
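A cutoff given as a bin index only has meaning relative to the FFT size and sample rate. The sketch below shows the conversion and the zeroing step on plain slices. The function names are hypothetical, the 44.1 kHz sample rate is an assumption, and the separate re/im slices stand in for one frame rather than reproducing the crate's Spectrum type.

```rust
/// Centre frequency in Hz of FFT bin `bin`.
fn bin_to_hz(bin: usize, fft_size: usize, sample_rate: f32) -> f32 {
    bin as f32 * sample_rate / fft_size as f32
}

/// Bin whose centre frequency is closest to `hz`.
fn hz_to_bin(hz: f32, fft_size: usize, sample_rate: f32) -> usize {
    (hz * fft_size as f32 / sample_rate).round() as usize
}

/// Zero every bin below `cutoff_bin` in one frame. `re` and `im` are plain
/// slices standing in for a frame's real and imaginary parts (hypothetical
/// layout, not the crate's `Spectrum`).
fn high_pass_frame(re: &mut [f32], im: &mut [f32], cutoff_bin: usize) {
    let end = cutoff_bin.min(re.len()).min(im.len());
    re[..end].fill(0.0);
    im[..end].fill(0.0);
}

fn main() {
    let (fft_size, sample_rate) = (4096, 44_100.0);
    println!("bin 100 is about {:.1} Hz", bin_to_hz(100, fft_size, sample_rate));
    println!("1 kHz is closest to bin {}", hz_to_bin(1000.0, fft_size, sample_rate));

    // One frame with fft_size / 2 + 1 bins, all set to 1.0 for demonstration.
    let mut re = vec![1.0_f32; fft_size / 2 + 1];
    let mut im = vec![1.0_f32; fft_size / 2 + 1];
    high_pass_frame(&mut re, &mut im, 100);
    println!("re[99] = {}, re[100] = {}", re[99], re[100]);
}
```

At a 4096-point FFT and 44.1 kHz, bin 100 sits near 1077 Hz. Zeroing bins outright gives a brick-wall response; a gradual gain ramp across a few bins is a common way to reduce ringing.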
Performance Characteristics
- Batch Mode: Optimized for throughput, minimal allocations
- Streaming Mode: Optimized for latency, incremental output
- Memory: Batch allocates once, streaming uses growing buffers
- Latency: Streaming introduces fft_size - hop_size samples of latency
Typical Performance (4096 FFT, 1024 hop)
- Reconstruction Quality: >138 dB SNR
- Algorithmic Latency: 3072 samples (69.7 ms @ 44.1kHz)
- Throughput: Depends on FFT implementation (rustfft)
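The latency figure follows directly from the fft_size - hop_size rule. The small sketch below reproduces the arithmetic; the function names are hypothetical and exist only for this illustration.

```rust
/// Algorithmic latency in samples: fft_size - hop_size.
fn latency_samples(fft_size: usize, hop_size: usize) -> usize {
    assert!(hop_size <= fft_size, "hop size must not exceed FFT size");
    fft_size - hop_size
}

/// The same latency expressed in milliseconds at `sample_rate` Hz.
fn latency_ms(fft_size: usize, hop_size: usize, sample_rate: f64) -> f64 {
    latency_samples(fft_size, hop_size) as f64 * 1000.0 / sample_rate
}

fn main() {
    let (fft_size, hop_size) = (4096, 1024);
    println!("{} samples", latency_samples(fft_size, hop_size));
    println!("{:.1} ms @ 44.1 kHz", latency_ms(fft_size, hop_size, 44_100.0));
    println!("{:.1} ms @ 48 kHz", latency_ms(fft_size, hop_size, 48_000.0));
}
```

Halving the FFT size to 2048 with a hop of 512 would halve the latency to 1536 samples, at the cost of coarser frequency resolution.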
Examples
Run the included examples:
# Basic batch processing
# Streaming processing with chunks
# Spectral manipulation (filtering, time-varying processing)
Implementation Details
Critical Design Decisions
- Flat Data Layout: Spectrum stores data as [real_all, imag_all] for cache efficiency
- Padding: Batch mode uses reflection padding (fft_size/2 on each side)
- Normalization: Per-sample normalization by accumulated window energy
- Conjugate Symmetry: Automatically handled in iSTFT for real signals
- Streaming Latency: Samples released only when fully reconstructed (all overlaps complete)
STFT Formula
X[k,n] = Σ x[n + m] * w[m] * e^(-j2πkm/N)
Where:
- x[n]: Input signal
- w[m]: Window function
- N: FFT size
- k: Frequency bin
- n: Frame index (hop positions)
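The formula can be evaluated directly for a single frame. The sketch below is a naive O(N²) illustration with hypothetical names, not the crate's implementation, which relies on rustfft. Returning only bins 0 through N/2 is an assumption that follows from the conjugate symmetry of real input.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed form for this sketch).
fn hann(size: usize) -> Vec<f64> {
    (0..size)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / size as f64).cos())
        .collect()
}

/// One STFT frame starting at sample `start`, by direct evaluation of
/// X[k] = sum over m of x[start + m] * w[m] * e^(-j 2 pi k m / N).
/// Returns (re, im) pairs for bins 0..=N/2.
fn stft_frame(x: &[f64], start: usize, window: &[f64]) -> Vec<(f64, f64)> {
    let n = window.len();
    (0..=n / 2)
        .map(|k| {
            let mut re = 0.0;
            let mut im = 0.0;
            for m in 0..n {
                let sample = x[start + m] * window[m];
                let phase = -2.0 * PI * (k * m) as f64 / n as f64;
                re += sample * phase.cos();
                im += sample * phase.sin();
            }
            (re, im)
        })
        .collect()
}

/// Index of the bin with the largest magnitude.
fn peak_bin(frame: &[(f64, f64)]) -> usize {
    let mut best = 0;
    let mut best_mag = -1.0;
    for (k, &(re, im)) in frame.iter().enumerate() {
        let mag = (re * re + im * im).sqrt();
        if mag > best_mag {
            best_mag = mag;
            best = k;
        }
    }
    best
}

fn main() {
    let n: usize = 32;
    // A cosine with exactly 4 cycles per frame should peak in bin 4.
    let x: Vec<f64> = (0..64)
        .map(|i| (2.0 * PI * 4.0 * i as f64 / n as f64).cos())
        .collect();
    let frame = stft_frame(&x, 0, &hann(n));
    println!("bins: {}, peak bin: {}", frame.len(), peak_bin(&frame));
}
```

The Hann window spreads the tone into the two neighbouring bins at half the peak magnitude, which is the leakage behaviour the window choice controls.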
iSTFT Reconstruction
OLA Mode:
x[n] = Σ IFFT(X[k,m]) / Σ w[n - m*hop]
WOLA Mode:
x[n] = Σ IFFT(X[k,m]) * w[n - m*hop] / Σ w²[n - m*hop]
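The two normalizations can be compared in a few lines. The sketch below assumes an unmodified spectrum, so IFFT(FFT(frame)) equals the windowed frame and the FFT pair is skipped entirely; only windowing, overlap-add, and per-sample normalization remain. The function names and the zero-denominator guard are assumptions of this illustration, not the crate's code.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed form for this sketch).
fn hann(size: usize) -> Vec<f64> {
    (0..size)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / size as f64).cos())
        .collect()
}

/// Analysis followed by synthesis with no spectral modification.
/// `wola = false` normalizes by sum(w), `wola = true` by sum(w^2).
fn roundtrip(x: &[f64], window: &[f64], hop: usize, wola: bool) -> Vec<f64> {
    let n = window.len();
    let mut num = vec![0.0; x.len()];
    let mut den = vec![0.0; x.len()];
    let mut start = 0;
    while start + n <= x.len() {
        for m in 0..n {
            // Forward window; stands in for IFFT(FFT(x * w)).
            let analysed = x[start + m] * window[m];
            if wola {
                num[start + m] += analysed * window[m]; // synthesis window
                den[start + m] += window[m] * window[m]; // sum(w^2)
            } else {
                num[start + m] += analysed; // no synthesis window
                den[start + m] += window[m]; // sum(w)
            }
        }
        start += hop;
    }
    // Samples with no window coverage cannot be recovered; emit 0.0 there.
    num.iter()
        .zip(&den)
        .map(|(&a, &d)| if d > 1e-12 { a / d } else { 0.0 })
        .collect()
}

fn main() {
    let w = hann(16);
    let x: Vec<f64> = (0..64).map(|i| (0.3 * i as f64).cos()).collect();
    for &wola in &[false, true] {
        let y = roundtrip(&x, &w, 4, wola);
        let max_err = (1..x.len())
            .map(|i| (y[i] - x[i]).abs())
            .fold(0.0, f64::max);
        println!("wola = {}: y[0] = {}, max interior error = {:e}", wola, y[0], max_err);
    }
}
```

Without padding, the very first sample is lost in both modes because the only frame covering it has zero window weight there. This is the kind of edge effect that reflection padding avoids.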
Testing
Run the comprehensive test suite:
cargo test
# With output
cargo test -- --nocapture
Tests verify:
- NOLA/COLA condition validation
- Batch OLA roundtrip (>138 dB SNR)
- Batch WOLA roundtrip (>138 dB SNR)
- Streaming OLA roundtrip (>138 dB SNR)
- Streaming WOLA roundtrip (>138 dB SNR)
- Batch vs streaming consistency
- All window functions (Hann, Hamming, Blackman)
- Constant signal reconstruction
- Padding modes (reflect, zero, edge)
Dependencies
- rustfft: High-performance FFT implementation
- ndarray: Only for internal padding operations (minimal usage)
License
MIT
Contributing
Contributions welcome! Areas for improvement:
- Additional window functions (Kaiser, Gaussian)
- SIMD optimizations
- GPU acceleration support
- Multi-channel support
- More padding modes
- Overlap-save mode