stft-rs
High-quality, streaming-friendly STFT/iSTFT implementation in Rust working with raw slices (&[f32]).
Caution: This crate is a work in progress (WIP). Expect API changes and breakage until the first stable version.
Features
- Batch Processing: Process entire audio buffers at once
- Streaming Support: Incremental processing for real-time applications
- High Quality: >138 dB SNR reconstruction
- no_std Support: Run on embedded systems without the standard library! 🚀
- Dual FFT Backends: Choose the right backend for your environment
- rustfft (default): Full-featured for std environments, supports f32/f64 and any FFT size
- microfft: Lightweight for no_std/embedded, f32 only, power-of-2 sizes up to 4096
- Dual Reconstruction Modes:
- OLA (Overlap-Add): Optimal for spectral processing
- WOLA (Weighted Overlap-Add): Standard implementation
- Multiple Window Functions: Hann, Hamming, Blackman
- NOLA/COLA Validation: Checks that the window and hop size satisfy the overlap-add condition required for reconstruction
- Flexible Buffer Management: Three allocation strategies from simple to zero-allocation
- Multi-Channel Audio: Process stereo, 5.1, 7.1+ with planar or interleaved formats (parallelized with rayon)
- Generic Float Support: Works with f32, f64, and other float types
- Type Aliases: Convenient aliases like StftConfigF32, BatchStftF32 for cleaner code
- Spectral Operations: Built-in helpers for magnitude/phase manipulation, filtering, and custom processing
- Mel Spectrograms: Perceptual frequency analysis with HTK/Slaney scales, log-mel, and delta features → MEL.md
- No External Tensor Libraries: Works directly with slices
Quick Start
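use stft_rs::prelude::*;
// Quick start with defaults
let config = StftConfigF32::default_4096();
// Or use the builder for custom configuration
let config = StftConfigF32::builder()
    .fft_size(4096)
    .hop_size(1024)
    .build()
    .expect("valid STFT configuration");
let stft = BatchStftF32::new(config.clone());
let istft = BatchIstftF32::new(config.clone());
let signal: Vec<f32> = vec![0.0; 44100];
let spectrum = stft.process(&signal);
// Manipulate spectrum here...
let reconstructed = istft.process(&spectrum);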
Mel Spectrogram Quick Start
use stft_rs::prelude::*;
// Create STFT and mel processors
let stft_config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(stft_config.clone());
let mel_config = MelConfigF32::default(); // 80 mel bins, Slaney scale
let mel_proc = new;
// Process audio to mel spectrogram
let signal: Vec<f32> = vec![0.0; 44100];
let spectrum = stft.process(&signal);
let mel_spec = mel_proc.process(&spectrum);
// Convert to log-mel (dB scale) and add delta features
let log_mel = mel_spec.to_db;
let with_deltas = log_mel.with_deltas; // 240 features (80*3)
// See MEL.md for complete documentation
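The mel mapping behind these features is easy to sketch. The default Slaney scale is linear below 1 kHz and logarithmic above it. The HTK scale, also supported, uses a single closed form, shown below as a std-only sketch. The helpers hz_to_mel_htk, mel_to_hz_htk and mel_centers_hz are hypothetical and are not part of the stft-rs API. How the crate places its filter edges is also an assumption here.

```rust
/// HTK mel scale: mel = 2595 * log10(1 + f / 700).
/// Hypothetical helper for illustration; not part of stft-rs.
fn hz_to_mel_htk(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel_htk`.
fn mel_to_hz_htk(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Centre frequencies (Hz) of `n_mels` triangular filters spaced uniformly
/// on the HTK mel scale between `f_min` and `f_max` (assumed layout).
fn mel_centers_hz(n_mels: usize, f_min: f64, f_max: f64) -> Vec<f64> {
    let (m_lo, m_hi) = (hz_to_mel_htk(f_min), hz_to_mel_htk(f_max));
    // n_mels filters need n_mels + 2 edge points; the centres are the inner ones.
    let step = (m_hi - m_lo) / (n_mels + 1) as f64;
    (1..=n_mels)
        .map(|i| mel_to_hz_htk(m_lo + step * i as f64))
        .collect()
}
```

On this scale 1000 Hz maps to roughly 1000 mel. The spacing widens in Hz as frequency rises, which is what gives mel features their perceptual weighting.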
Type Aliases for Convenience
For cleaner code, use type aliases instead of specifying generic types:
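use stft_rs::prelude::*;
// Instead of StftConfig::<f32>, use:
let config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(config.clone());
let istft = BatchIstftF32::new(config.clone());
// With builder:
let config = StftConfigBuilderF32::new()
    .fft_size(4096)
    .hop_size(1024)
    .build()
    .expect("valid STFT configuration");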
// Available aliases:
// - StftConfigF32, StftConfigF64, StftConfigBuilderF32, StftConfigBuilderF64
// - BatchStftF32, BatchIstftF32, BatchStftF64, BatchIstftF64
// - StreamingStftF32, StreamingIstftF32, StreamingStftF64, StreamingIstftF64
// - SpectrumF32, SpectrumF64
// - SpectrumFrameF32, SpectrumFrameF64
Prelude
For convenience, import commonly used types with:
use stft_rs::prelude::*;
This exports:
- Core types: BatchStft, BatchIstft, StreamingStft, StreamingIstft, StftConfig, StftConfigBuilder, Spectrum, SpectrumFrame, Complex
- Type aliases: StftConfigF32/F64, StftConfigBuilderF32/F64, BatchStftF32/F64, BatchIstftF32/F64, StreamingStftF32/F64, StreamingIstftF32/F64, SpectrumF32/F64, SpectrumFrameF32/F64
- Mel types: MelConfig, MelSpectrum, BatchMelSpectrogram, StreamingMelSpectrogram, MelScale, MelNorm (+ F32/F64 aliases)
- Enums: ReconstructionMode, WindowType, PadMode
- Utilities: apply_padding, interleave, deinterleave, interleave_into, deinterleave_into
Batch vs Streaming
Batch API (Stateless)
Best for: Processing entire files, offline processing, ML training
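use stft_rs::prelude::*;
let config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(config.clone());
let istft = BatchIstftF32::new(config.clone());
let spectrum = stft.process(&signal);
let reconstructed = istft.process(&spectrum);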
Streaming API (Stateful)
Best for: Real-time audio, low-latency processing, incremental processing
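use stft_rs::prelude::*;
let config = StftConfigF32::default_4096();
let mut stft = StreamingStftF32::new(config.clone());
let mut istft = StreamingIstftF32::new(config.clone());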
let pad_amount = config.fft_size / 2;
let padded = apply_padding;
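let mut output = Vec::new();
for chunk in padded.chunks(128) {
    for frame in stft.push_samples(chunk) {
        output.extend(istft.push_frame(&frame));
    }
}
for frame in stft.flush() {
    output.extend(istft.push_frame(&frame));
}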
// Remove padding: output[pad_amount..pad_amount + signal.len()]
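Conceptually, the streaming analyzer buffers incoming samples and emits one frame per hop once enough samples have arrived. The std-only sketch below illustrates that buffering. Framer and its push_samples method are hypothetical stand-ins, not the StreamingStft internals.

```rust
/// Minimal streaming framer: buffers incoming samples and emits one
/// fft_size-long frame every hop_size samples. Hypothetical type that
/// illustrates the buffering idea, not stft-rs internals.
struct Framer {
    fft_size: usize,
    hop_size: usize,
    buffer: Vec<f32>,
}

impl Framer {
    fn new(fft_size: usize, hop_size: usize) -> Self {
        assert!(hop_size > 0 && hop_size <= fft_size);
        Self { fft_size, hop_size, buffer: Vec::with_capacity(2 * fft_size) }
    }

    /// Append a chunk of any length and return every frame it completes.
    fn push_samples(&mut self, chunk: &[f32]) -> Vec<Vec<f32>> {
        self.buffer.extend_from_slice(chunk);
        let mut frames = Vec::new();
        while self.buffer.len() >= self.fft_size {
            frames.push(self.buffer[..self.fft_size].to_vec());
            // Slide the analysis window forward by one hop.
            self.buffer.drain(..self.hop_size);
        }
        frames
    }
}
```

Chunk boundaries do not affect the output. With fft_size 8 and hop_size 4, pushing 20 samples in chunks of 3 yields the same four frames (starting at samples 0, 4, 8 and 12) as a single push would.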
Note on Padding in Streaming Mode:
- Batch mode automatically applies reflection padding internally for optimal quality
- Streaming mode requires manual padding for best results (>130 dB SNR)
- Without padding, edge effects reduce quality to ~40-60 dB SNR
- Use the apply_padding() helper function or implement custom padding
- For truly real-time applications without pre-roll, accept the edge artifacts or use fade-in/fade-out
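To make the padding step concrete, here is a std-only sketch of reflection padding and the matching trim. It assumes the NumPy-style "reflect" convention, in which the edge sample is not repeated. The actual behavior of apply_padding and PadMode in stft-rs may differ, and reflect_pad and trim_padding are hypothetical names.

```rust
/// Reflection padding that mirrors the signal about its first and last
/// samples without repeating them (NumPy "reflect" convention, assumed).
/// Hypothetical helper; not the stft-rs apply_padding function.
fn reflect_pad(signal: &[f32], pad: usize) -> Vec<f32> {
    assert!(pad < signal.len(), "reflection needs pad < signal length");
    let n = signal.len();
    let mut out = Vec::with_capacity(n + 2 * pad);
    // Left edge: x[pad], x[pad - 1], ..., x[1]
    out.extend((1..=pad).rev().map(|i| signal[i]));
    out.extend_from_slice(signal);
    // Right edge: x[n - 2], x[n - 3], ..., x[n - 1 - pad]
    out.extend((1..=pad).map(|i| signal[n - 1 - i]));
    out
}

/// Undo the padding after reconstruction: keep output[pad..pad + original_len].
fn trim_padding(output: &[f32], pad: usize, original_len: usize) -> &[f32] {
    &output[pad..pad + original_len]
}
```

For example, padding [1, 2, 3, 4] by 2 gives [3, 2, 1, 2, 3, 4, 3, 2]. With pad set to fft_size / 2, the first and last real samples sit in the middle of a full frame rather than at its tapered edge.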
Buffer Management
The library provides three allocation strategies for different performance requirements:
Level 1: Simple API (Allocates on each call)
Best for: Prototyping, one-off processing, simplicity
// Each call allocates a new Vec for frames/samples
let frames = stft.push_samples;
let samples = istft.push_frame;
Level 2: Reusable Containers (_into methods)
Best for: Repeated processing, reduced allocator pressure
// Reuses the outer Vec, but still allocates frame data
let mut frames = Vec::new();
let mut output = Vec::new();
loop
Batch mode:
let mut spectrum = new;
let mut output = Vec::new();
stft.process_into;
istft.process_into;
Level 3: Zero-Allocation Frame Pool (_write methods)
Best for: Real-time audio, hard real-time constraints, minimum latency variance
// Pre-allocate frame pool once
let max_frames = / config.hop_size + 1;
let mut frame_pool = vec!;
let mut output = Vec::new();
loop
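The zero-allocation idea behind the _write methods can be sketched as a pool of frame buffers that is allocated once and then handed out repeatedly. FramePool, acquire, reset and used are hypothetical names for this std-only illustration. The stft-rs _write methods have their own signatures, which are not shown here.

```rust
/// Fixed-size pool of frame buffers, allocated once up front.
/// Hypothetical type illustrating the zero-allocation pattern.
struct FramePool {
    frames: Vec<Vec<f32>>,
    in_use: usize,
}

impl FramePool {
    fn new(max_frames: usize, frame_len: usize) -> Self {
        Self { frames: vec![vec![0.0; frame_len]; max_frames], in_use: 0 }
    }

    /// Hand out the next free buffer, or None if the pool is exhausted.
    /// No allocation happens here.
    fn acquire(&mut self) -> Option<&mut [f32]> {
        let frame = self.frames.get_mut(self.in_use)?;
        self.in_use += 1;
        Some(frame.as_mut_slice())
    }

    /// Mark every buffer free again (call once per processing block).
    fn reset(&mut self) {
        self.in_use = 0;
    }

    /// Read-only view of the buffers handed out since the last reset.
    fn used(&self) -> &[Vec<f32>] {
        &self.frames[..self.in_use]
    }
}
```

Returning None rather than growing the pool is a deliberate choice. In a hard real-time callback, running out of frames should surface as a sizing bug, not as a hidden allocation.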
Configuration
Creating Custom Configurations
Use the builder pattern for a flexible, ergonomic API:
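use stft_rs::prelude::*;
// OLA mode with builder
let config = StftConfigF32::builder()
    .fft_size(4096)
    .hop_size(1024)
    .window(WindowType::Hann)
    .reconstruction_mode(ReconstructionMode::Ola)
    .build()
    .expect("valid STFT configuration");
// WOLA mode; unset options keep their defaults (the default window is Hann)
let config = StftConfigF32::builder()
    .fft_size(4096)
    .hop_size(1024)
    .reconstruction_mode(ReconstructionMode::Wola)
    .build()
    .expect("valid STFT configuration");
// Type aliases for cleaner code
let config = StftConfigBuilderF32::new()
    .fft_size(2048)
    .hop_size(512)
    .window(WindowType::Blackman)
    .build()
    .expect("valid STFT configuration");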
Legacy API (deprecated):
// Old constructor still works but is deprecated
let config = new.expect;
Window Functions
- Hann: Smooth frequency response, good general purpose
- Hamming: Slightly better frequency resolution
- Blackman: Lower side lobes, better for spectral analysis
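The sketch below writes the three windows using the periodic (DFT-even) definitions, which are the forms that sum to a constant at the usual hop sizes. Whether stft-rs uses exactly these coefficients is an assumption, and the function names are hypothetical.

```rust
use std::f64::consts::PI;

/// Periodic Hann window of length n: 0.5 - 0.5 cos(2 pi i / n).
fn hann(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// Periodic Hamming window: 0.54 - 0.46 cos(2 pi i / n).
fn hamming(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.54 - 0.46 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// Periodic Blackman window: 0.42 - 0.5 cos(t) + 0.08 cos(2t), t = 2 pi i / n.
fn blackman(n: usize) -> Vec<f64> {
    (0..n)
        .map(|i| {
            let t = 2.0 * PI * i as f64 / n as f64;
            0.42 - 0.5 * t.cos() + 0.08 * (2.0 * t).cos()
        })
        .collect()
}
```

Hann and Blackman both start at zero, while Hamming starts at 0.08. That nonzero pedestal is what lowers Hamming's first side lobe relative to Hann.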
Reconstruction Modes
OLA (Overlap-Add)
- Window applied on forward transform only
- No window on inverse transform
- Normalizes by the accumulated window sum: sum(w)
- Use for: Spectral processing, modification, filtering
- Requires: COLA (Constant Overlap-Add) condition
WOLA (Weighted Overlap-Add)
- Window applied on both forward and inverse transforms
- Normalizes by the accumulated squared window: sum(w²)
- Use for: Standard analysis/resynthesis
- Requires: NOLA (Nonzero Overlap-Add) condition
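Both conditions can be checked numerically by summing window copies shifted by the hop. COLA asks that the sum of w be constant. NOLA asks that the sum of w² never reach zero. The std-only sketch below does this for the steady-state region. The helper names are hypothetical and do not describe how stft-rs performs its built-in validation.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed definition, used only for the checks).
fn hann(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// Steady-state overlap-add envelope at each offset within one hop:
/// sum of w[offset + k * hop] over k (or of w^2 when `squared` is true).
fn overlap_envelope(window: &[f64], hop: usize, squared: bool) -> Vec<f64> {
    (0..hop)
        .map(|offset| {
            window
                .iter()
                .skip(offset)
                .step_by(hop)
                .map(|&w| if squared { w * w } else { w })
                .sum::<f64>()
        })
        .collect()
}

/// COLA: the sum(w) envelope is constant within `tol` (what OLA relies on).
fn is_cola(window: &[f64], hop: usize, tol: f64) -> bool {
    let env = overlap_envelope(window, hop, false);
    env.iter().all(|&v| (v - env[0]).abs() <= tol)
}

/// NOLA: the sum(w^2) envelope never drops to (near) zero (what WOLA divides by).
fn is_nola(window: &[f64], hop: usize, tol: f64) -> bool {
    overlap_envelope(window, hop, true).iter().all(|&v| v > tol)
}
```

For a periodic Hann window of length 8, hop 4 gives a constant sum of 1.0. Hop 2 gives a sum of 2.0 and a squared sum of 1.5. Hop 3 is not COLA.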
Spectral Processing
The library provides powerful helpers for frequency domain manipulation:
let mut spectrum = stft.process(&signal);
// Get magnitude and phase
let mag = spectrum.magnitude;
let phase = spectrum.phase;
// Set from magnitude and phase
spectrum.set_magnitude_phase;
// Get all magnitudes/phases for a frame
let magnitudes = spectrum.frame_magnitudes;
let phases = spectrum.frame_phases;
// Apply gain to frequency range
spectrum.apply_gain; // Attenuate bins 100-200
// Zero out frequency range
spectrum.zero_bins; // Remove DC and low frequencies
// Custom processing with closure
spectrum.apply;
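Under the hood, these helpers reduce to polar conversions of each complex bin. The std-only sketch below uses a hypothetical Bin stand-in (stft-rs re-exports its own Complex type) to show how to read magnitude and phase, rebuild a bin from them, and apply a gain to a bin range while keeping phase. apply_gain here is a hypothetical function whose signature is not taken from the crate.

```rust
/// Minimal complex bin used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bin {
    re: f64,
    im: f64,
}

impl Bin {
    fn magnitude(self) -> f64 {
        self.re.hypot(self.im)
    }
    fn phase(self) -> f64 {
        self.im.atan2(self.re)
    }
    fn from_polar(mag: f64, phase: f64) -> Self {
        Bin { re: mag * phase.cos(), im: mag * phase.sin() }
    }
}

/// Scale the magnitude of the bins in `range` by `gain`, keeping their phase.
fn apply_gain(bins: &mut [Bin], range: std::ops::Range<usize>, gain: f64) {
    for b in &mut bins[range] {
        *b = Bin::from_polar(b.magnitude() * gain, b.phase());
    }
}
```

Scaling the magnitude while preserving phase is equivalent to multiplying both re and im by the gain. The polar form becomes necessary when magnitude and phase are modified independently.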
Examples
High-Pass Filter
let mut spectrum = stft.process(&signal);
// Zero out low frequencies (simple and clean!)
spectrum.zero_bins;
let filtered = istft.process(&spectrum);
Volume Control (Spectral Domain)
let mut spectrum = stft.process(&signal);
// Apply gain in magnitude/phase domain
for frame in 0..spectrum.num_frames
let quieter = istft.process(&spectrum);
Band-Pass Filter
let mut spectrum = stft.process(&signal);
// Keep only frequencies between 300 Hz and 3000 Hz
let sample_rate = 44100.0;
let freq_resolution = sample_rate / config.fft_size as f32;
let low_bin = (300.0 / freq_resolution) as usize;
let high_bin = (3000.0 / freq_resolution) as usize;
spectrum.zero_bins;
spectrum.zero_bins;
let filtered = istft.process(&spectrum);
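The bin arithmetic in the band-pass example can be checked in isolation. With a 44.1 kHz sample rate and a 4096-point FFT, each bin spans 44100 / 4096 ≈ 10.77 Hz, so 300 Hz falls in bin 27 and 3000 Hz in bin 278. The std-only sketch below uses hypothetical helpers to compute these indices and build a keep-mask over the fft_size / 2 + 1 non-negative-frequency bins. Treating the upper bin as inclusive is a choice made here, not something taken from the crate.

```rust
/// Bin index containing `hz` (truncating, as in the example above).
fn hz_to_bin(hz: f32, sample_rate: f32, fft_size: usize) -> usize {
    let freq_resolution = sample_rate / fft_size as f32; // Hz per bin
    (hz / freq_resolution) as usize
}

/// true = keep the bin; covers bins low..=high of the fft_size / 2 + 1 bins.
fn band_pass_mask(low_hz: f32, high_hz: f32, sample_rate: f32, fft_size: usize) -> Vec<bool> {
    let num_bins = fft_size / 2 + 1;
    let low = hz_to_bin(low_hz, sample_rate, fft_size);
    let high = hz_to_bin(high_hz, sample_rate, fft_size).min(num_bins - 1);
    (0..num_bins).map(|k| k >= low && k <= high).collect()
}
```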
Multi-Channel Audio
Process stereo, 5.1, or any channel count. Channels are processed in parallel with rayon (enabled by default):
// Planar: separate Vec per channel
let left = vec![0.0f32; 44100];
let right = vec![0.0f32; 44100];
let spectra = stft.process_multichannel; // Parallel with rayon
// Interleaved: L,R,L,R...
let interleaved = vec![0.0f32; 2 * 44100];
let spectra = stft.process_interleaved;
// Convert between formats
let channels = deinterleave;
let interleaved = interleave;
See examples/multichannel_stereo.rs and examples/multichannel_midside.rs for more.
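The planar and interleaved layouts differ only in sample order. The std-only sketch below converts between them. split_channels and merge_channels are hypothetical names chosen to avoid confusion with the crate's own interleave and deinterleave, whose exact signatures are not shown here.

```rust
/// Interleaved (L, R, L, R, ...) -> one Vec per channel.
fn split_channels(interleaved: &[f32], channels: usize) -> Vec<Vec<f32>> {
    assert!(channels > 0 && interleaved.len() % channels == 0);
    let frames = interleaved.len() / channels;
    let mut out: Vec<Vec<f32>> = (0..channels).map(|_| Vec::with_capacity(frames)).collect();
    // Each chunk holds one sample per channel.
    for frame in interleaved.chunks_exact(channels) {
        for (ch, &sample) in frame.iter().enumerate() {
            out[ch].push(sample);
        }
    }
    out
}

/// Equal-length planar channels -> one interleaved buffer.
fn merge_channels(planar: &[Vec<f32>]) -> Vec<f32> {
    let frames = planar.first().map_or(0, |c| c.len());
    assert!(planar.iter().all(|c| c.len() == frames), "channels must have equal length");
    let mut out = Vec::with_capacity(frames * planar.len());
    for i in 0..frames {
        for channel in planar {
            out.push(channel[i]);
        }
    }
    out
}
```

The planar form is the natural one for per-channel parallelism, because each channel is a contiguous slice that one worker can transform independently.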
Embedded / no_std Support
stft-rs can run on embedded systems without the standard library! Perfect for audio processing on microcontrollers, DSPs, and bare-metal environments.
Using the microfft Backend for no_std
[dependencies]
stft-rs = { version = "0.5.0", default-features = false, features = ["microfft-backend"] }
Important notes:
- microfft backend only supports f32 (not f64)
- FFT sizes must be power-of-2 from 2 to 4096
- Requires an allocator (uses the alloc crate)
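Because the microfft backend only accepts power-of-2 sizes from 2 to 4096, it can help to validate a size before building a configuration. The helpers below are hypothetical, std-only sketches that encode the constraint stated above. They are not part of stft-rs.

```rust
/// True if `n` satisfies the documented microfft constraint:
/// a power of two between 2 and 4096 inclusive.
fn is_microfft_size(n: usize) -> bool {
    n.is_power_of_two() && (2..=4096).contains(&n)
}

/// All sizes that satisfy the constraint: 2, 4, 8, ..., 4096.
fn valid_microfft_sizes() -> Vec<usize> {
    (1..=12).map(|p| 1usize << p).collect()
}
```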
Example no_std Configuration
extern crate alloc;
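use alloc::vec::Vec;
use stft_rs::prelude::*;
// Works great on embedded!
let config = StftConfigF32::builder()
    .fft_size(1024) // Must be power-of-2
    .hop_size(256)
    .build()
    .expect("valid STFT configuration");
let stft = BatchStftF32::new(config.clone());
let istft = BatchIstftF32::new(config.clone());
let signal: Vec<f32> = alloc::vec![0.0; 1024];
let spectrum = stft.process(&signal);
let reconstructed = istft.process(&spectrum);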
Feature Flags
- std (default): Standard library support with rustfft backend
- rustfft-backend: Use rustfft for FFT (supports f32/f64, any size)
- microfft-backend: Use microfft for no_std (f32 only, power-of-2 sizes)
- rayon: Enable parallel multi-channel processing (requires std)
Note: You cannot enable both rustfft-backend and microfft-backend at the same time.
Performance Characteristics
- Batch Mode: Optimized for throughput, minimal allocations
- Streaming Mode: Optimized for latency, incremental output
- Memory: Batch allocates once, streaming uses growing buffers
- Latency: Streaming introduces fft_size - hop_size samples of latency
Typical Performance (4096 FFT, 1024 hop)
- Reconstruction Quality: >138 dB SNR
- Algorithmic Latency: 3072 samples (69.7 ms @ 44.1kHz)
- Throughput: Depends on FFT implementation (rustfft)
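The latency figure follows directly from the fft_size - hop_size rule: 4096 - 1024 = 3072 samples, and 3072 / 44100 s ≈ 69.7 ms. A small std-only sketch with hypothetical helper names:

```rust
/// Algorithmic latency of the streaming path in samples: fft_size - hop_size.
fn latency_samples(fft_size: usize, hop_size: usize) -> usize {
    assert!(hop_size <= fft_size, "hop_size must not exceed fft_size");
    fft_size - hop_size
}

/// The same latency expressed in milliseconds at `sample_rate` Hz.
fn latency_ms(fft_size: usize, hop_size: usize, sample_rate: f64) -> f64 {
    latency_samples(fft_size, hop_size) as f64 * 1000.0 / sample_rate
}
```

At 48 kHz the same configuration gives exactly 64 ms. Shrinking fft_size is the main lever for lower latency, at the cost of frequency resolution.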
Examples
Run the included examples:
# Basic batch processing
# Streaming processing with chunks
# Spectral manipulation (filtering, time-varying processing)
# Multi-channel stereo processing
# Mid/side stereo width manipulation
# Advanced streaming with buffer reuse patterns
# Performance comparison of allocation strategies
# Type aliases usage demonstration
# Spectral operations (magnitude/phase, filtering)
Implementation Details
Critical Design Decisions
- Flat Data Layout: Spectrum stores data as [real_all, imag_all] for cache efficiency
- Padding: Batch mode uses reflection padding (fft_size/2 on each side)
- Normalization: Per-sample normalization by the accumulated window sum (OLA) or squared-window sum (WOLA)
- Conjugate Symmetry: Automatically handled in iSTFT for real signals
- Streaming Latency: Samples released only when fully reconstructed (all overlaps complete)
STFT Formula
X[k,n] = Σ x[n + m] * w[m] * e^(-j2πkm/N), summed for m from 0 to N-1
Where:
- x[n]: Input signal
- w[m]: Window function
- N: FFT size
- k: Frequency bin
- n: Frame position (the frame's first sample, a multiple of hop_size)
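As a direct transcription of this formula, the sketch below evaluates one frame by brute force, taking n as the sample where the frame starts (its hop position). It runs in O(N²) time and is meant for checking results, not for production use (the crate uses an FFT backend instead). stft_frame and the periodic hann helper are hypothetical names.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed definition).
fn hann(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// X[k, n] = sum over m of x[n + m] * w[m] * e^(-j 2 pi k m / N) for the frame
/// starting at sample n. Returns (re, im) for the N/2 + 1 non-negative bins,
/// which is all a real input needs (the rest are complex conjugates).
fn stft_frame(x: &[f64], w: &[f64], n: usize) -> Vec<(f64, f64)> {
    let big_n = w.len();
    assert!(n + big_n <= x.len(), "frame runs past the end of the signal");
    (0..=big_n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0, 0.0);
            for m in 0..big_n {
                let v = x[n + m] * w[m];
                let angle = -2.0 * PI * (k * m) as f64 / big_n as f64;
                re += v * angle.cos();
                im += v * angle.sin();
            }
            (re, im)
        })
        .collect()
}
```

For a constant input and a length-8 Hann window, the energy lands in bins 0 and 1 only: X[0] = sum(w) = 4 and X[1] = -2, with all higher bins zero.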
iSTFT Reconstruction
OLA Mode:
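x[n] = Σ_m IFFT(X[·,m])[n - m*hop] / Σ_m w[n - m*hop]
WOLA Mode:
x[n] = Σ_m IFFT(X[·,m])[n - m*hop] * w[n - m*hop] / Σ_m w²[n - m*hop]
Where m is the frame index and IFFT(X[·,m])[i] is sample i of the inverse transform of frame m.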
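To make the two normalization rules concrete, the std-only sketch below performs overlap-add resynthesis. It assumes an unmodified spectrum, so IFFT(X[·,m]) is simply the windowed analysis frame and the FFT round trip is skipped. analysis_frames and overlap_add are hypothetical names and do not describe stft-rs internals.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (assumed definition).
fn hann(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// Analysis side: frame m is x[m * hop + i] * w[i] for i in 0..N.
fn analysis_frames(x: &[f64], window: &[f64], hop: usize) -> Vec<Vec<f64>> {
    let n_fft = window.len();
    assert!(x.len() >= n_fft && hop > 0);
    (0..=(x.len() - n_fft) / hop)
        .map(|m| (0..n_fft).map(|i| x[m * hop + i] * window[i]).collect())
        .collect()
}

/// Overlap-add with per-sample normalization.
/// wola = false (OLA): add frames as-is, divide by sum_m w[n - m*hop].
/// wola = true (WOLA): re-window each frame, divide by sum_m w^2[n - m*hop].
fn overlap_add(frames: &[Vec<f64>], window: &[f64], hop: usize, wola: bool) -> Vec<f64> {
    let n_fft = window.len();
    assert!(!frames.is_empty());
    let len = (frames.len() - 1) * hop + n_fft;
    let mut out = vec![0.0; len];
    let mut norm = vec![0.0; len];
    for (m, frame) in frames.iter().enumerate() {
        for i in 0..n_fft {
            let t = m * hop + i;
            let w = window[i];
            if wola {
                out[t] += frame[i] * w; // synthesis window
                norm[t] += w * w;
            } else {
                out[t] += frame[i];
                norm[t] += w;
            }
        }
    }
    // Leave samples with a ~zero envelope untouched (e.g. where only a window's
    // zero-valued edge covers them); padding moves such samples off the signal.
    for (o, &d) in out.iter_mut().zip(&norm) {
        if d > 1e-10 {
            *o /= d;
        }
    }
    out
}
```

With a Hann window of length 8 and hop 2, both modes reproduce a 32-sample test signal to within floating-point error wherever the envelope is nonzero.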
Testing
Run the comprehensive test suite:
cargo test
# With output
cargo test -- --nocapture
Tests verify:
- NOLA/COLA condition validation
- Batch OLA roundtrip (>138 dB SNR)
- Batch WOLA roundtrip (>138 dB SNR)
- Streaming OLA roundtrip (>138 dB SNR)
- Streaming WOLA roundtrip (>138 dB SNR)
- Batch vs streaming consistency
- All window functions (Hann, Hamming, Blackman)
- Constant signal reconstruction
- Padding modes (reflect, zero, edge)
Dependencies
Core dependencies:
num-traits: Generic numeric traits (no_std compatible with libm)
FFT backends (mutually exclusive):
- rustfft (default): High-performance FFT for std environments
- microfft (optional): Lightweight FFT for no_std/embedded
Optional dependencies:
rayon: Parallel multi-channel processing (requires std)
License
MIT
Contributing
Contributions welcome! Areas for improvement:
- Additional window functions (Kaiser, Gaussian)
- SIMD optimizations
- GPU acceleration support
- More padding modes
- Overlap-save mode