Spectrograms
FFT-based computations for audio and image processing, written in Rust with Python bindings.
Spectrograms is a library for computing spectrograms and performing FFT-based operations on 1D signals (audio) and 2D signals (images).
Originally, the library was focused on computing spectrograms for audio analysis, but since I had to implement FFT backends anyway, I expanded the scope to include general 1D and 2D FFT operations for both audio and image processing.
If you want to learn more about the background of spectrograms and FFTs in audio/image processing, check out the manual (WIP).
Features
Core Features
- Plan-Based Computation: Reuse and cache FFT plans to speed up batch processing
- Two FFT Backends: pure-Rust RealFFT/RustFFT (default) or FFTW
- Type-Safe Rust API: Compile-time guarantees for FFT and spectrogram types
- Python Bindings: Fast computation with NumPy integration and GIL-free execution. Outperforms NumPy/SciPy implementations across a wide range of configurations, all while providing a type-safe and simple API.
Audio Processing
- Multiple Frequency Scales: Linear, Mel, ERB, and CQT
- Multiple Amplitude Scales: Power, Magnitude, and Decibels
- Advanced Audio Features: MFCC, Chromagram, and raw STFT
- Streaming Support: Frame-by-frame processing for real-time applications
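Frame-by-frame processing comes down to buffering incoming samples, emitting a frame whenever `n_fft` samples are available, and then advancing by the hop size. The sketch below shows that buffering logic with only the standard library. `Framer` and its methods are hypothetical names for illustration, not the crate's streaming API, and the sketch assumes frames start at sample 0 with no padding.

```rust
use std::collections::VecDeque;

/// Hypothetical streaming framer (not the crate's API): accepts chunks of any
/// size and emits every complete frame of `n_fft` samples, advancing `hop`
/// samples between consecutive frames.
struct Framer {
    n_fft: usize,
    hop: usize,
    buf: VecDeque<f64>,
}

impl Framer {
    fn new(n_fft: usize, hop: usize) -> Self {
        assert!(n_fft > 0 && hop > 0 && hop <= n_fft, "need 0 < hop <= n_fft");
        Framer { n_fft, hop, buf: VecDeque::with_capacity(2 * n_fft) }
    }

    /// Buffer a chunk and return all frames that became complete.
    fn push(&mut self, chunk: &[f64]) -> Vec<Vec<f64>> {
        self.buf.extend(chunk.iter().copied());
        let mut frames: Vec<Vec<f64>> = Vec::new();
        while self.buf.len() >= self.n_fft {
            // Copy the oldest n_fft samples out as one frame...
            frames.push(self.buf.iter().take(self.n_fft).copied().collect());
            // ...then drop `hop` samples so the next frame overlaps this one.
            self.buf.drain(..self.hop);
        }
        frames
    }
}

fn main() {
    let mut framer = Framer::new(512, 256);
    let signal: Vec<f64> = (0..16_000).map(|i| i as f64).collect();
    let mut total = 0;
    // Simulate audio arriving in 100-sample callbacks.
    for chunk in signal.chunks(100) {
        for frame in framer.push(chunk) {
            // A real application would window and FFT `frame` here.
            assert_eq!(frame.len(), 512);
            total += 1;
        }
    }
    println!("{total} frames");
}
```

Because the loop drains every complete frame, the chunk size does not change the result: 16,000 samples with `n_fft = 512` and `hop = 256` give the same 61 frames as framing the whole signal at once.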
Image Processing
- 2D FFT Operations: Fast 2D Fourier transforms for images
- Spatial Filtering: Low-pass, high-pass, and band-pass filters
- Convolution: FFT-based convolution (faster for large kernels)
- Edge Detection: Frequency-domain edge emphasis
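FFT-based convolution is faster for large kernels because of the convolution theorem: circular convolution in the signal domain equals pointwise multiplication of the DFTs. Direct 2D convolution of an N×N image with a K×K kernel needs about N²K² multiply-adds. The FFT route needs O(N² log N) work, largely independent of K. The 1D sketch below checks the identity numerically with a naive O(N²) DFT, so it demonstrates correctness rather than speed. All function names are hypothetical, and this is not the crate's `convolve_fft`.

```rust
use std::f64::consts::PI;

type Complex = (f64, f64); // (re, im)

/// Naive O(n^2) DFT; `inverse` flips the exponent sign and scales by 1/n.
fn dft(x: &[Complex], inverse: bool) -> Vec<Complex> {
    let n = x.len();
    let sign = if inverse { 1.0 } else { -1.0 };
    (0..n)
        .map(|k| {
            let mut acc = (0.0, 0.0);
            for (j, &(re, im)) in x.iter().enumerate() {
                let (s, c) = (sign * 2.0 * PI * (k * j) as f64 / n as f64).sin_cos();
                acc.0 += re * c - im * s;
                acc.1 += re * s + im * c;
            }
            if inverse { (acc.0 / n as f64, acc.1 / n as f64) } else { acc }
        })
        .collect()
}

/// Direct circular convolution: the cost grows with the kernel length.
fn convolve_direct(x: &[f64], h: &[f64]) -> Vec<f64> {
    let n = x.len();
    assert!(h.len() <= n);
    (0..n)
        .map(|i| h.iter().enumerate().map(|(j, &hj)| hj * x[(i + n - j) % n]).sum::<f64>())
        .collect()
}

/// Same result via the convolution theorem: DFT both, multiply, inverse DFT.
fn convolve_via_dft(x: &[f64], h: &[f64]) -> Vec<f64> {
    let n = x.len();
    // Zero-pad the kernel to the signal length and lift both to complex.
    let lift = |v: &[f64]| -> Vec<Complex> {
        (0..n).map(|i| (v.get(i).copied().unwrap_or(0.0), 0.0)).collect()
    };
    let (xf, hf) = (dft(&lift(x), false), dft(&lift(h), false));
    let product: Vec<Complex> = xf
        .iter()
        .zip(&hf)
        .map(|(&(a, b), &(c, d))| (a * c - b * d, a * d + b * c))
        .collect();
    dft(&product, true).into_iter().map(|(re, _)| re).collect()
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let h = [0.25, 0.5, 0.25]; // small smoothing kernel
    println!("{:?}", convolve_direct(&x, &h));
    println!("{:?}", convolve_via_dft(&x, &h));
}
```

For images, the image and kernel are normally zero-padded before transforming so that the circular wrap-around does not bleed across the borders.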
Why Choose Spectrograms?
- Multi-Domain: Unified API for audio (1D) and image (2D) FFT operations
- Cross-Language: Use from Rust or Python with consistent APIs
- High Performance: Rust implementation with minimal overhead, backed by benchmarks
- Batch/Stream Processing: Efficient batch processing and streaming support
- Well Documented: Comprehensive manual (WIP), lots of examples, and API documentation.
Installation
Quick Start
Generate a Test Signal
To avoid runtime checks before computation, the crate uses the non-empty-slice crate to guarantee non-empty input.
Alongside this, the crate uses NonZeroUsize from the standard library to ensure that parameters like FFT size and hop size are valid.
To avoid repeatedly calling NonZeroUsize::new(constant)?, the crate provides the nzu! macro to create NonZeroUsize values at compile time.
use std::f64::consts::PI;
use non_empty_vec;
// 1 second of 440 Hz sine wave
let sample_rate = 16000.0;
let samples: Vec<f64> = (0..16000)
    .map(|i| (2.0 * PI * 440.0 * i as f64 / sample_rate).sin())
    .collect();
let samples = non_empty_vec;
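import numpy as np

# 1 second of 440 Hz sine wave
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
samples = np.sin(2 * np.pi * 440 * t)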
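A macro like `nzu!` can move the zero check to compile time by evaluating `NonZeroUsize::new` inside a `const` item. The sketch below only illustrates that idea and is not the crate's actual definition. `nzu_sketch!` and `frames_for` are hypothetical names, and the frame-count formula assumes no padding.

```rust
use std::num::NonZeroUsize;

// Hypothetical stand-in for `nzu!`: the match runs in a const context, so a
// zero literal becomes a compile-time error instead of a runtime `None`.
macro_rules! nzu_sketch {
    ($n:expr) => {{
        const VALUE: NonZeroUsize = match NonZeroUsize::new($n) {
            Some(v) => v,
            None => panic!("value must be non-zero"),
        };
        VALUE
    }};
}

// Hypothetical helper: with NonZeroUsize in the signature, the body never
// needs to guard against a zero hop (no division by zero is possible).
fn frames_for(len: usize, n_fft: NonZeroUsize, hop: NonZeroUsize) -> usize {
    if len < n_fft.get() {
        0
    } else {
        1 + (len - n_fft.get()) / hop.get()
    }
}

fn main() {
    let n_fft = nzu_sketch!(512);
    let hop = nzu_sketch!(256);
    // `nzu_sketch!(0)` would be rejected by the compiler.
    println!("{} frames", frames_for(16_000, n_fft, hop));
}
```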
Compute a Basic Spectrogram
use *;
// Configure parameters
let stft = new?;
let params = new?;
// Compute power spectrogram
let spec = compute?;
println!;
# Configure parameters
=
=
# Compute power spectrogram
=
Because Python is interpreted, there is no compile-time guarantee of non-empty input or valid parameters, so the Python bindings MUST perform these checks internally. This incurs negligible overhead and ensures safety.
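Conceptually, a linear power spectrogram slices the signal into overlapping frames, applies a window, takes the DFT of each frame, and keeps |X[k]|² for the n_fft/2 + 1 non-negative frequency bins. The std-only sketch below spells that out with a naive DFT and a periodic Hann window. It assumes no centering or padding. It is meant to explain the (n_bins, n_frames) output layout, not to mirror the crate's FFT-backed implementation.

```rust
use std::f64::consts::PI;

/// Periodic Hann window (a common choice for STFTs).
fn hann(n: usize) -> Vec<f64> {
    (0..n).map(|i| 0.5 - 0.5 * (2.0 * PI * i as f64 / n as f64).cos()).collect()
}

/// Naive power spectrogram indexed as [bin][frame], i.e. shape (n_bins, n_frames).
/// No centering/padding: frames start at samples 0, hop, 2*hop, ...
fn power_spectrogram(signal: &[f64], n_fft: usize, hop: usize) -> Vec<Vec<f64>> {
    assert!(n_fft > 0 && hop > 0 && signal.len() >= n_fft);
    let window = hann(n_fft);
    let n_bins = n_fft / 2 + 1; // non-negative frequencies only
    let n_frames = 1 + (signal.len() - n_fft) / hop;
    let mut spec = vec![vec![0.0; n_frames]; n_bins];
    for t in 0..n_frames {
        let frame = &signal[t * hop..t * hop + n_fft];
        for k in 0..n_bins {
            let (mut re, mut im) = (0.0, 0.0);
            for (n, (&x, &w)) in frame.iter().zip(&window).enumerate() {
                let angle = -2.0 * PI * (k * n) as f64 / n_fft as f64;
                re += x * w * angle.cos();
                im += x * w * angle.sin();
            }
            spec[k][t] = re * re + im * im; // |X[k]|^2
        }
    }
    spec
}

fn main() {
    let sample_rate = 16_000.0;
    let signal: Vec<f64> = (0..16_000)
        .map(|i| (2.0 * PI * 440.0 * i as f64 / sample_rate).sin())
        .collect();
    let spec = power_spectrogram(&signal, 512, 256);
    println!("shape: ({}, {})", spec.len(), spec[0].len());
}
```

With n_fft = 512 at 16 kHz, each bin spans 31.25 Hz, so the 440 Hz tone peaks in bin 14 (437.5 Hz).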
Mel Spectrogram Example
use *;
let stft = new?;
let params = new?;
// Mel filterbank
let mel = new?;
// dB scaling
let db = new?;
// Compute mel spectrogram in dB
let spec = compute?;
// Access data
println!;
println!;
println!;
=
=
# Mel filterbank
=
# dB scaling
=
# Compute mel spectrogram in dB
=
# Access data
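The mel scale maps frequency onto an approximately perceptual pitch axis, and a mel filterbank places its filter centres evenly on that axis. The sketch below uses the common HTK formula, mel = 2595·log₁₀(1 + f/700). This README does not say whether the crate uses this variant or another one (such as Slaney's), so treat the numbers as illustrative. The function names are hypothetical.

```rust
/// HTK-style Hz -> mel conversion (one of several mel formulas in use).
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Centre frequencies (Hz) of `n_mels` filters spaced evenly in mel between
/// `f_min` and `f_max`, excluding the two edge points (as with triangular
/// filters whose outer edges sit at f_min and f_max).
fn mel_centres(n_mels: usize, f_min: f64, f_max: f64) -> Vec<f64> {
    let (m_min, m_max) = (hz_to_mel(f_min), hz_to_mel(f_max));
    let step = (m_max - m_min) / (n_mels + 1) as f64;
    (1..=n_mels).map(|i| mel_to_hz(m_min + step * i as f64)).collect()
}

fn main() {
    let centres = mel_centres(40, 0.0, 8000.0);
    println!("{:.1} Hz .. {:.1} Hz", centres[0], centres[39]);
}
```

Because the spacing is even in mel, the gaps between centres grow with frequency, which is what gives low frequencies finer resolution.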
Efficient Batch Processing
Reuse FFT plans when processing multiple signals:
use *;
use non_empty_vec;
let signals = vec!;
let stft = new?;
let params = new?;
let mel = new?;
let db = new?;
// Create plan once
let planner = new;
let mut plan = planner.?;
// Reuse for all signals (much faster!)
for signal in signals
=
=
=
=
=
# Create plan once
=
=
# Reuse for all signals (much faster!)
=
# Process spec...
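Plan reuse pays off because everything that depends only on the FFT size can be computed once. This includes window values, twiddle factors, and, in real FFT libraries, the chosen algorithm. The hypothetical `Planner` below caches that data by size and counts how often it actually builds a plan. It is a sketch of the idea under those assumptions, not the crate's `SpectrogramPlanner`.

```rust
use std::collections::HashMap;
use std::f64::consts::PI;
use std::rc::Rc;

/// Everything that depends only on the FFT size.
struct Plan {
    window: Vec<f64>,           // periodic Hann window
    twiddles: Vec<(f64, f64)>,  // (cos, sin) of -2*pi*m/n_fft
}

/// Hypothetical planner that caches plans by FFT size.
#[derive(Default)]
struct Planner {
    cache: HashMap<usize, Rc<Plan>>,
    builds: usize, // how many plans were actually constructed
}

impl Planner {
    fn get(&mut self, n_fft: usize) -> Rc<Plan> {
        if let Some(plan) = self.cache.get(&n_fft) {
            return Rc::clone(plan); // cache hit: no recomputation
        }
        self.builds += 1;
        let angle = |m: usize| 2.0 * PI * m as f64 / n_fft as f64;
        let plan = Rc::new(Plan {
            window: (0..n_fft).map(|m| 0.5 - 0.5 * angle(m).cos()).collect(),
            twiddles: (0..n_fft).map(|m| (angle(m).cos(), -angle(m).sin())).collect(),
        });
        self.cache.insert(n_fft, Rc::clone(&plan));
        plan
    }
}

fn main() {
    let mut planner = Planner::default();
    for _signal in 0..100 {
        let plan = planner.get(512); // the same plan serves every signal
        assert_eq!(plan.window.len(), 512);
    }
    println!("plans built: {}", planner.builds);
}
```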
2D FFT and Image Processing
Perform 2D FFTs, convolution, and spatial filtering on images:
use *;
use *;
use Array2;
// Create a 256x256 image
let image = from_shape_fn;
// Compute 2D FFT
let spectrum = fft2d?;
println!;
// Output: [256, 129] due to Hermitian symmetry
// Apply Gaussian blur via FFT
let kernel = gaussian_kernel_2d;
let blurred = convolve_fft?;
// Apply high-pass filter for edge detection
let edges = highpass_filter?;
// Compute power spectrum
let power = power_spectrum_2d?;
# Create a 256x256 image
=
=
# Compute 2D FFT
=
# Output: (256, 129) due to Hermitian symmetry
# Apply Gaussian blur via FFT
=
=
# Apply high-pass filter for edge detection
=
# Compute power spectrum
=
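The [256, 129] output shape comes from Hermitian symmetry. For real input, X[N−k] is the complex conjugate of X[k], so a real-to-complex transform only needs to store N/2 + 1 values along the last axis (256/2 + 1 = 129). In 2D, the same reasoning applies along the last axis. The naive 1D DFT below checks the symmetry numerically. `real_dft` and `rfft2_shape` are illustrative helpers, not crate functions.

```rust
use std::f64::consts::PI;

/// Naive DFT of a real signal, returning (re, im) for all N bins.
fn real_dft(x: &[f64]) -> Vec<(f64, f64)> {
    let n = x.len();
    (0..n)
        .map(|k| {
            x.iter().enumerate().fold((0.0, 0.0), |(re, im), (j, &v)| {
                let angle = -2.0 * PI * (k * j) as f64 / n as f64;
                (re + v * angle.cos(), im + v * angle.sin())
            })
        })
        .collect()
}

/// Output shape of a real-to-complex 2D FFT that stores only the
/// non-redundant half (plus one) of the last axis.
fn rfft2_shape(rows: usize, cols: usize) -> (usize, usize) {
    (rows, cols / 2 + 1)
}

fn main() {
    let x = [0.3, -1.2, 2.5, 0.0, 4.1, -0.7, 1.9, 3.3];
    let spec = real_dft(&x);
    let n = x.len();
    for k in 1..n {
        let (a, b) = (spec[k], spec[n - k]);
        // Hermitian symmetry: X[N-k] == conj(X[k]) for real input.
        assert!((a.0 - b.0).abs() < 1e-9 && (a.1 + b.1).abs() < 1e-9);
    }
    println!("{:?}", rfft2_shape(256, 256)); // (256, 129)
}
```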
Efficient Batch Image Processing
Reuse 2D FFT plans for faster processing:
use Fft2dPlanner;
use Array2;
let images = vec!;
// Create planner once
let mut planner = new;
// Reuse for all images (faster!)
for image in &images
=
# Create planner once
=
# Reuse for all images (faster!)
=
= ** 2
# Process power spectrum...
Advanced Features
MFCCs (Mel-Frequency Cepstral Coefficients)
use *;
let stft = new?;
let mfcc_params = new;
let mfccs = mfcc?;
// Shape: (13, n_frames)
println!;
=
=
=
# Shape: (13, n_frames)
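MFCCs are usually obtained by taking the log of each frame's mel band energies and applying a DCT-II, keeping the first few coefficients (13 in the example). The sketch below shows the DCT step with orthonormal scaling, applied to one frame. This README does not document the crate's DCT normalisation, log floor, or any liftering, so those details are assumptions here. `dct2_ortho` is a hypothetical name.

```rust
use std::f64::consts::PI;

/// Orthonormal DCT-II of one frame, keeping the first `n_mfcc` coefficients:
/// c[k] = s(k) * sum_i x[i] * cos(pi * k * (i + 0.5) / N),
/// with s(0) = sqrt(1/N) and s(k) = sqrt(2/N) otherwise.
fn dct2_ortho(x: &[f64], n_mfcc: usize) -> Vec<f64> {
    assert!(n_mfcc <= x.len());
    let n = x.len() as f64;
    (0..n_mfcc)
        .map(|k| {
            let sum: f64 = x
                .iter()
                .enumerate()
                .map(|(i, &v)| v * (PI * k as f64 * (i as f64 + 0.5) / n).cos())
                .sum();
            let scale = if k == 0 { (1.0 / n).sqrt() } else { (2.0 / n).sqrt() };
            scale * sum
        })
        .collect()
}

fn main() {
    // Hypothetical log-mel energies for one frame (40 bands).
    let log_mel: Vec<f64> = (0..40).map(|i| (1.0 + i as f64).ln()).collect();
    let mfcc = dct2_ortho(&log_mel, 13);
    println!("{} coefficients: {:?}", mfcc.len(), mfcc);
}
```

Repeating this for every frame produces the (13, n_frames) matrix. A flat log-mel frame puts all of its energy into coefficient 0, which is why c0 is often treated as a loudness term.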
Chromagram (Pitch Class Profiles)
use *;
let stft = new?;
let chroma_params = music_standard?;
let chroma = chromagram?;
// Shape: (12, n_frames) - one row per pitch class
println!;
=
=
=
# Shape: (12, n_frames)
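A chromagram folds spectral energy into 12 pitch classes regardless of octave. The simplified sketch below hard-assigns each FFT bin to its nearest equal-tempered pitch class, assuming A4 = 440 Hz. Real implementations (possibly including the `music_standard` preset) typically use smoother filterbanks, so this only illustrates the octave folding. The helper names are hypothetical.

```rust
/// Pitch class (0 = C, 1 = C#, ..., 9 = A, 11 = B) of a frequency in
/// 12-tone equal temperament, assuming A4 = 440 Hz (MIDI note 69).
fn pitch_class(freq_hz: f64) -> usize {
    let midi = 69.0 + 12.0 * (freq_hz / 440.0).log2();
    (midi.round() as i64).rem_euclid(12) as usize
}

/// Fold a linear-frequency power spectrum (n_fft/2 + 1 bins) into 12 chroma bins.
fn fold_to_chroma(power: &[f64], sample_rate: f64, n_fft: usize) -> [f64; 12] {
    let mut chroma = [0.0; 12];
    // Skip bin 0 (DC), which has no pitch.
    for (k, &p) in power.iter().enumerate().skip(1) {
        let freq = k as f64 * sample_rate / n_fft as f64;
        chroma[pitch_class(freq)] += p;
    }
    chroma
}

fn main() {
    const NAMES: [&str; 12] = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
    for f in [261.63, 440.0, 493.88] {
        println!("{f} Hz -> {}", NAMES[pitch_class(f)]);
    }
}
```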
Supported Spectrogram Types
Frequency Scales
- Linear (LinearHz): Standard FFT bins, evenly spaced in Hz
- Mel (Mel): Mel-frequency scale, perceptually motivated for speech/audio
- ERB (Erb): Equivalent Rectangular Bandwidth, models auditory perception
- CQT: Constant-Q Transform for music analysis
- Log (LogHz): Logarithmic frequency spacing
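For reference, the sketch below writes out the ERB bandwidth formula of Glasberg and Moore (1990) and the geometric spacing used by log-frequency and constant-Q layouts. These are textbook formulas chosen for illustration. The crate's exact ERB variant and CQT parameters are not given here, and the function names are hypothetical.

```rust
/// ERB bandwidth (Hz) at centre frequency `f_hz` (Glasberg & Moore, 1990).
fn erb_bandwidth(f_hz: f64) -> f64 {
    24.7 * (4.37 * f_hz / 1000.0 + 1.0)
}

/// ERB-rate scale: filters spaced evenly in these units mimic auditory filters.
fn hz_to_erb_rate(f_hz: f64) -> f64 {
    21.4 * (1.0 + 0.00437 * f_hz).log10()
}

/// Geometric (log-frequency / constant-Q) centre frequencies:
/// f_k = f_min * 2^(k / bins_per_octave).
fn log_spaced(f_min: f64, bins_per_octave: usize, n_bins: usize) -> Vec<f64> {
    (0..n_bins)
        .map(|k| f_min * 2f64.powf(k as f64 / bins_per_octave as f64))
        .collect()
}

/// Constant-Q quality factor: centre frequency divided by bin spacing.
fn q_factor(bins_per_octave: usize) -> f64 {
    1.0 / (2f64.powf(1.0 / bins_per_octave as f64) - 1.0)
}

fn main() {
    println!("ERB at 1 kHz: {:.1} Hz", erb_bandwidth(1000.0));
    println!("ERB-rate at 1 kHz: {:.2}", hz_to_erb_rate(1000.0));
    println!("Q for 12 bins/octave: {:.2}", q_factor(12));
    println!("{:?}", log_spaced(110.0, 12, 25));
}
```

In a constant-Q layout, Q is the same for every bin, so bandwidth grows in proportion to frequency. In a linear layout, bandwidth stays fixed in Hz.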
Amplitude Scales
| Scale | Formula | Use Case |
|---|---|---|
| Power | \|X\|² | Energy analysis, ML features |
| Magnitude | \|X\| | Spectral analysis, phase vocoder |
| Decibels | 10·log₁₀(power) | Visualization, perceptual analysis |
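The three amplitude scales are simple functions of each complex STFT value X = re + i·im. Note that 10·log₁₀(|X|²) equals 20·log₁₀(|X|). The floor used below to avoid log₁₀(0) is an assumption, since the crate's dB reference and floor settings are not shown in this README.

```rust
/// Power: |X|^2 = re^2 + im^2.
fn power(re: f64, im: f64) -> f64 {
    re * re + im * im
}

/// Magnitude: |X|, computed with hypot to avoid overflow for large values.
fn magnitude(re: f64, im: f64) -> f64 {
    re.hypot(im)
}

/// Decibels from power: 10 * log10(max(power, floor)). The floor keeps
/// silent bins finite (log10(0) is -inf); its value here is an assumption.
fn power_to_db(p: f64, floor: f64) -> f64 {
    10.0 * p.max(floor).log10()
}

fn main() {
    let (re, im) = (3.0, 4.0);
    println!(
        "power = {}, magnitude = {}, dB = {:.2}",
        power(re, im),
        magnitude(re, im),
        power_to_db(power(re, im), 1e-10)
    );
}
```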
Type Aliases (Rust)
// Linear frequency
type LinearPowerSpectrogram = ;
type LinearMagnitudeSpectrogram = ;
type LinearDbSpectrogram = ;
// Mel frequency
type MelPowerSpectrogram = ;
type MelMagnitudeSpectrogram = ;
type MelDbSpectrogram = ;
// ERB frequency
type ErbPowerSpectrogram = ;
type ErbMagnitudeSpectrogram = ;
type ErbDbSpectrogram = ;
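Judging by their names, these aliases pair a frequency scale (Linear, Mel, ERB) with an amplitude scale (Power, Magnitude, Db). One way to encode such a pairing is with marker types and `PhantomData`, so that an invalid conversion fails to compile. The sketch below assumes that design. Every name in it is hypothetical and not the crate's definition.

```rust
use std::marker::PhantomData;

// Hypothetical marker types for the two scale dimensions.
struct Mel;
struct Power;
struct Magnitude;

/// A spectrogram whose frequency scale `F` and amplitude scale `A` exist
/// only at the type level (PhantomData has zero size).
struct Spectrogram<F, A> {
    data: Vec<f64>, // row-major, shape (n_bins, n_frames)
    n_bins: usize,
    n_frames: usize,
    _scales: PhantomData<(F, A)>,
}

type MelPowerSketch = Spectrogram<Mel, Power>;
type MelMagnitudeSketch = Spectrogram<Mel, Magnitude>;

impl<F> Spectrogram<F, Power> {
    fn new(data: Vec<f64>, n_bins: usize, n_frames: usize) -> Self {
        assert_eq!(data.len(), n_bins * n_frames);
        Spectrogram { data, n_bins, n_frames, _scales: PhantomData }
    }

    /// Only defined for power spectrograms: calling it on a magnitude
    /// spectrogram is a compile-time error rather than a runtime check.
    fn into_magnitude(self) -> Spectrogram<F, Magnitude> {
        Spectrogram {
            data: self.data.into_iter().map(f64::sqrt).collect(),
            n_bins: self.n_bins,
            n_frames: self.n_frames,
            _scales: PhantomData,
        }
    }
}

fn main() {
    let power = MelPowerSketch::new(vec![1.0, 4.0, 9.0, 16.0], 2, 2);
    let mag: MelMagnitudeSketch = power.into_magnitude();
    println!("{:?} ({} x {})", mag.data, mag.n_bins, mag.n_frames);
    // mag.into_magnitude(); // would not compile: no such method for Magnitude
}
```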
Window Functions
Supported window functions with different frequency/time resolution trade-offs:
- rectangular: No windowing (best frequency resolution, high leakage)
- hanning: Good general-purpose window (default)
- hamming: Similar to Hanning with different coefficients
- blackman: Low sidelobes, wider main lobe
- kaiser=<beta>: Tunable trade-off (β controls shape, e.g., kaiser=5.0)
- gaussian=<std>: Smooth roll-off (e.g., gaussian=0.4)
// Parse from string
let window: WindowType = "hanning".parse?;
let kaiser: WindowType = "kaiser=8.0".parse?;
// Or use constructors
let hann = Hanning;
let gauss = Gaussian ;
// Generate windows
let hann_window = hanning_window;
let kaiser_window = kaiser;
// etc.
# Use class methods
=
=
=
# Or from string
=
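These window shapes follow standard formulas. Hann, Hamming, and Blackman are sums of cosines, and Kaiser uses the zeroth-order modified Bessel function I₀. The sketch below generates symmetric windows (denominator n − 1). STFT code often uses the periodic form (denominator n) instead, and this README does not say which form the crate uses. Gaussian is omitted because the units of its std parameter are not specified here.

```rust
use std::f64::consts::PI;

/// Symmetric generalised cosine window:
/// w[i] = sum_k (-1)^k * a_k * cos(2*pi*k*i / (n - 1)).
fn cosine_sum(n: usize, a: &[f64]) -> Vec<f64> {
    assert!(n >= 2);
    let m = (n - 1) as f64;
    (0..n)
        .map(|i| {
            a.iter()
                .enumerate()
                .map(|(k, &ak)| {
                    let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
                    sign * ak * (2.0 * PI * (k * i) as f64 / m).cos()
                })
                .sum::<f64>()
        })
        .collect()
}

fn hann(n: usize) -> Vec<f64> { cosine_sum(n, &[0.5, 0.5]) }
fn hamming(n: usize) -> Vec<f64> { cosine_sum(n, &[0.54, 0.46]) }
fn blackman(n: usize) -> Vec<f64> { cosine_sum(n, &[0.42, 0.5, 0.08]) }

/// Zeroth-order modified Bessel function I0(x) via its power series
/// sum_k ((x/2)^k / k!)^2.
fn bessel_i0(x: f64) -> f64 {
    let (mut sum, mut term) = (1.0, 1.0);
    for k in 1..50 {
        term *= (x / (2.0 * k as f64)).powi(2);
        sum += term;
    }
    sum
}

/// Kaiser window: larger beta gives lower sidelobes and a wider main lobe.
fn kaiser(n: usize, beta: f64) -> Vec<f64> {
    assert!(n >= 2);
    let m = (n - 1) as f64;
    (0..n)
        .map(|i| {
            let r = 2.0 * i as f64 / m - 1.0; // position in [-1, 1]
            bessel_i0(beta * (1.0 - r * r).max(0.0).sqrt()) / bessel_i0(beta)
        })
        .collect()
}

fn main() {
    println!("hann     {:?}", hann(9));
    println!("hamming  {:?}", hamming(9));
    println!("blackman {:?}", blackman(9));
    println!("kaiser8  {:?}", kaiser(9, 8.0));
}
```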
Default Presets
// Speech processing preset
// n_fft=512, hop_size=160
let params = speech_default?;
// Music processing preset
// n_fft=2048, hop_size=512
let params = music_default?;
# Speech processing preset
=
# Music processing preset
=
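The two presets trade time resolution against frequency resolution. The hypothetical helper below converts an (n_fft, hop) pair into window length, hop duration, and bin spacing. The sample rates in the usage (16 kHz for speech, 44.1 kHz for music) are assumptions, because the presets' sample rates are not stated here. Under those assumptions, the speech preset gives a 32 ms window, a 10 ms hop, and 31.25 Hz bins. The music preset gives roughly a 46.4 ms window, an 11.6 ms hop, and 21.5 Hz bins.

```rust
/// Window length (ms), hop (ms), and bin spacing (Hz) of an STFT setting.
fn resolution(n_fft: usize, hop: usize, sample_rate: f64) -> (f64, f64, f64) {
    let window_ms = 1000.0 * n_fft as f64 / sample_rate;
    let hop_ms = 1000.0 * hop as f64 / sample_rate;
    let bin_hz = sample_rate / n_fft as f64;
    (window_ms, hop_ms, bin_hz)
}

fn main() {
    // Sample rates are assumptions; the presets' rates are not stated.
    let (w, h, b) = resolution(512, 160, 16_000.0);
    println!("speech @ 16 kHz:  {w:.1} ms window, {h:.1} ms hop, {b:.2} Hz/bin");
    let (w, h, b) = resolution(2048, 512, 44_100.0);
    println!("music @ 44.1 kHz: {w:.1} ms window, {h:.1} ms hop, {b:.2} Hz/bin");
}
```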
Accessing Results
let spec = compute?;
// Dimensions
let n_bins = spec.n_bins;
let n_frames = spec.n_frames;
// Data (ndarray::Array2<f64>)
let data = spec.data;
// Axes
let freqs = spec.axes.frequencies;
let times = spec.axes.times;
let = spec.axes.frequency_range;
let duration = spec.axes.duration;
// Original parameters
let params = spec.params;
=
# Dimensions
=
=
# Data (numpy array)
= # shape: (n_bins, n_frames)
# Axes
=
=
, =
=
# Original parameters
=
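For a linear-frequency spectrogram, the axes follow directly from the parameters: bin k sits at k·sr/n_fft Hz, and frame t at t·hop/sr seconds. The frequency range then runs from 0 Hz to the Nyquist frequency sr/2. The helpers below are illustrative. Whether the crate reports frame start or frame centre times (for example with a centred STFT) is an assumption to check against its API documentation.

```rust
/// Centre frequency (Hz) of each bin of a linear spectrogram: k * sr / n_fft.
fn frequency_axis(n_fft: usize, sample_rate: f64) -> Vec<f64> {
    (0..=n_fft / 2).map(|k| k as f64 * sample_rate / n_fft as f64).collect()
}

/// Frame times (s), assuming frame t starts at sample t * hop.
fn time_axis(n_frames: usize, hop: usize, sample_rate: f64) -> Vec<f64> {
    (0..n_frames).map(|t| (t * hop) as f64 / sample_rate).collect()
}

fn main() {
    let freqs = frequency_axis(512, 16_000.0);
    let times = time_axis(61, 256, 16_000.0);
    let (f_min, f_max) = (freqs[0], freqs[freqs.len() - 1]);
    println!("{} bins from {f_min} Hz to {f_max} Hz", freqs.len());
    println!("{} frames, last starts at {:.3} s", times.len(), times[times.len() - 1]);
}
```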
Examples
Comprehensive examples in both languages:
Rust (examples/):
- basic_linear.rs - Simple linear spectrogram
- mel_spectrogram.rs - Mel spectrogram with dB scaling
- reuse_plan.rs - Batch processing with plan reuse
- compare_windows.rs - Window function comparison
- amplitude_scales.rs - Power, Magnitude, and dB
Python (python/examples/):
- basic_linear.py - Linear spectrogram basics
- mel_spectrogram.py - Mel spectrograms
- mfcc_example.py - MFCC computation
- chromagram_example.py - Pitch class profiles
- batch_processing.py - Efficient batch processing
- streaming.py - Real-time frame-by-frame processing
Documentation
- Manual: Comprehensive manual (WIP)
- API Documentation: Full Rust API reference
- Python Documentation: Python API reference and guides
- Contributing Guide: How to contribute to the project
Feature Flags (Rust)
The Rust library requires exactly one FFT backend:
- fftw: Uses FFTW for FFT computation
  - Requires system FFTW library (libfftw3-dev on Ubuntu/Debian)
  - Not pure Rust
- realfft (default): Pure-Rust FFT implementation
  - No system dependencies
  - Slightly slower than FFTW
  - Works everywhere
Additional flags:
- python: Enables Python bindings
- serde: Enables serialization support
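# Pure Rust, no Python
[dependencies]
spectrograms = { version = "0.2", default-features = false, features = ["realfft"] }
# FFTW backend with Python
[dependencies]
spectrograms = { version = "0.2", default-features = false, features = ["fftw", "python"] }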
Benchmarks
Performance benchmarks are available in the benches/ directory. Run them with:
For Python, a comprehensive benchmark notebook is available at python/examples/notebook.ipynb, with results comparing spectrograms against NumPy and scipy.fft.
| Operator | Rust (ms) | Rust Std (ms) | NumPy (ms) | NumPy Std (ms) | SciPy (ms) | SciPy Std (ms) | Avg Speedup vs NumPy (×) | Avg Speedup vs SciPy (×) |
|---|---|---|---|---|---|---|---|---|
| db | 0.257 | 0.165 | 0.350 | 0.251 | 0.451 | 0.366 | 1.363 | 1.755 |
| erb | 0.601 | 0.437 | 3.713 | 2.703 | 3.714 | 2.723 | 6.178 | 6.181 |
| loghz | 0.178 | 0.149 | 0.547 | 0.998 | 0.534 | 0.965 | 3.068 | 2.996 |
| magnitude | 0.140 | 0.089 | 0.198 | 0.133 | 0.319 | 0.277 | 1.419 | 2.287 |
| mel | 0.180 | 0.139 | 0.630 | 0.851 | 0.612 | 0.801 | 3.506 | 3.406 |
| power | 0.126 | 0.082 | 0.205 | 0.141 | 0.327 | 0.288 | 1.630 | 2.603 |
For the full benchmark results, see PYTHON_BENCHMARK.
Performance Tips
- Reuse plans: Use SpectrogramPlanner to speed up batch processing
- Choose power-of-2 FFT sizes: These are usually fastest (e.g., 512, 1024, 2048, 4096)
- Streaming: Use frame-by-frame processing for real-time applications
- FFT backend: Try both backends (realfft and fftw) to see which is faster for your use case
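RustFFT and FFTW both accept arbitrary transform sizes, but sizes with only small prime factors, powers of two above all, are typically the fastest. A common approach is to round the analysis window up to the next power of two and zero-pad the rest. The hypothetical helper below does that with the standard library's `usize::next_power_of_two`.

```rust
/// Smallest power-of-two FFT size that holds `window_len` samples;
/// the remaining samples would be zero-padded.
fn fft_size_for(window_len: usize) -> usize {
    window_len.max(1).next_power_of_two()
}

fn main() {
    // e.g. a 25 ms window at 16 kHz is 400 samples
    let window_len = 400;
    let n_fft = fft_size_for(window_len);
    println!("window {window_len} -> n_fft {n_fft}, padding {}", n_fft - window_len);
}
```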
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Citation
If you use this library in academic work, please cite:
Note: This library focuses on computing FFTs, spectrograms, and related transforms. For complete audio analysis pipelines, combine it with audio I/O libraries like audio_samples and your preferred plotting tools.
