Spectrograms
High-performance spectrogram computation with Rust and Python bindings.
Features
- Multiple Frequency Scales: Linear, Mel, ERB, and CQT
- Multiple Amplitude Scales: Power, Magnitude, and Decibels
- Advanced Audio Features: MFCC, Chromagram, and raw STFT
- Plan-Based Computation: Reuse FFT plans for 2-10x speedup on batch processing
- Two FFT Backends: FFTW (fastest) or pure-Rust RealFFT
- Streaming Support: Frame-by-frame processing for real-time applications
- Type-Safe Rust API: Compile-time guarantees for spectrogram types
- Python Bindings: Fast computation with NumPy integration and GIL-free execution
Why Choose Spectrograms?
- Cross-Language: Use from Rust or Python with consistent APIs
- High Performance: Rust implementation, Python bindings with minimal overhead
- Not Limited to One Type: Multiple frequency scales in a unified API
- Production Ready: Efficient batch processing and streaming support
- Well Documented: Comprehensive integration guide, examples, and API docs
Installation
[]
= "0.1"
For pure-Rust FFT (no system dependencies):
[]
= {
version = "0.1",
= false,
= ["realfft"]
}
For FFTW-accelerated version (requires system FFTW library):
Quick Start
Generate a Test Signal
use PI;
// 1 second of 440 Hz sine wave
let sample_rate = 16000.0;
let samples: =
.map
.collect;
# 1 second of 440 Hz sine wave
= 16000
=
=
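As a rough illustration of the test signal described above, the following standard-library sketch builds one second of a 440 Hz sine wave at 16 kHz. It deliberately avoids NumPy and this library's API; the variable names are illustrative only.

```python
import math

sample_rate = 16000       # Hz, matching the value used in this section
frequency = 440.0         # Hz
n_samples = sample_rate   # one second of audio

# x[n] = sin(2 * pi * f * n / sample_rate)
samples = [
    math.sin(2.0 * math.pi * frequency * n / sample_rate)
    for n in range(n_samples)
]
```

The resulting list can be converted to whatever array type the bindings expect.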
Compute a Basic Spectrogram
use *;
// Configure parameters
let stft = new?;
let params = new?;
// Compute power spectrogram
let spec = compute?;
println!;
# Configure parameters
=
=
# Compute power spectrogram
=
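For readers who want to see what a power spectrogram computes, here is a minimal pure-Python sketch: slice the signal into overlapping frames, apply a window, take a DFT, and square the magnitudes. It is not this library's implementation or API. The function names `hann` and `power_spectrogram` are hypothetical, and a naive DFT stands in for the FFT that a real backend would use.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrogram(samples, n_fft, hop_size):
    """Return a list of frames, each holding n_fft // 2 + 1 power values."""
    window = hann(n_fft)
    frames = []
    # Only full frames are used here; padding conventions are left out.
    for start in range(0, len(samples) - n_fft + 1, hop_size):
        frame = [samples[start + i] * window[i] for i in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Naive O(n^2) DFT bin; a real implementation would use an FFT.
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                for i in range(n_fft)
            )
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames

# A 200 Hz tone sampled at 1600 Hz falls exactly on bin 200 * 64 / 1600 = 8.
sample_rate = 1600
tone = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(256)]
spec = power_spectrogram(tone, n_fft=64, hop_size=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Note that this sketch stores data as frames by bins, whereas the library documents its data shape as `(n_bins, n_frames)`.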
Mel Spectrogram Example
use *;
let stft = new?;
let params = new?;
// Mel filterbank
let mel = new?;
// dB scaling
let db = new?;
// Compute mel spectrogram in dB
let spec = compute?;
// Access data
println!;
println!;
println!;
=
=
# Mel filterbank
=
# dB scaling
=
# Compute mel spectrogram in dB
=
# Access data
Efficient Batch Processing
Reuse FFT plans for 2-10x speedup when processing multiple signals:
use *;
let signals = vec!;
let stft = new?;
let params = new?;
let mel = new?;
let db = new?;
// Create plan once
let planner = new;
let mut plan = planner.mel_db_plan?;
// Reuse for all signals (much faster!)
for signal in signals
=
=
=
=
=
# Create plan once
=
=
# Reuse for all signals (much faster!)
=
# Process spec...
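The idea behind plan reuse can be sketched without the library: everything that depends only on the FFT size (window coefficients, twiddle factors) is computed once and shared across signals. The `DftPlan` class below is a hypothetical stand-in, not the library's planner, and it uses a naive DFT rather than an FFT.

```python
import cmath
import math

class DftPlan:
    """Hypothetical plan: caches the window and twiddle factors for one n_fft."""

    def __init__(self, n_fft):
        self.n_fft = n_fft
        self.window = [
            0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n_fft - 1))
            for i in range(n_fft)
        ]
        # twiddles[j] = exp(-2*pi*1j*j / n_fft); bin k, sample i uses (k*i) % n_fft
        self.twiddles = [cmath.exp(-2j * math.pi * j / n_fft) for j in range(n_fft)]

    def power_frame(self, frame):
        """Power spectrum of one frame, reusing the cached tables."""
        n = self.n_fft
        windowed = [x * w for x, w in zip(frame, self.window)]
        return [
            abs(sum(windowed[i] * self.twiddles[(k * i) % n] for i in range(n))) ** 2
            for k in range(n // 2 + 1)
        ]

# Build the plan once, then reuse it for every signal.
plan = DftPlan(64)
signals = [
    [math.sin(2.0 * math.pi * cycles * n / 64) for n in range(64)]
    for cycles in (4, 8, 12)
]
peaks = []
for signal in signals:
    spectrum = plan.power_frame(signal)
    peaks.append(max(range(len(spectrum)), key=lambda k: spectrum[k]))
```

The setup cost in `__init__` is paid once, so the per-signal work shrinks to the windowing and transform themselves.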
Advanced Features
MFCCs (Mel-Frequency Cepstral Coefficients)
use *;
let stft = new?;
let mfcc_params = new?;
let mfccs = compute_mfcc?;
// Shape: (13, n_frames)
println!;
=
=
=
# Shape: (13, n_frames)
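MFCCs are conventionally obtained by applying a DCT-II to the log mel energies of each frame and keeping the first few coefficients. The sketch below shows that final step in plain Python. It assumes an unnormalized DCT-II, which may differ from the scaling this library applies, and the input energies are made-up values for illustration.

```python
import math

def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II: c[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    n = len(values)
    return [
        sum(values[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_coeffs)
    ]

# Made-up mel band energies for a single frame (illustration only).
mel_energies = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 1.0]
log_mel_frame = [math.log(e) for e in mel_energies]

# Keep the first 4 cepstral coefficients of this frame.
mfcc_frame = dct_ii(log_mel_frame, n_coeffs=4)
```

Coefficient 0 is the sum of the log energies (overall level), while higher coefficients describe progressively finer spectral shape.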
Chromagram (Pitch Class Profiles)
use *;
let stft = new?;
let chroma_params = music_standard;
let chroma = compute_chromagram?;
// Shape: (12, n_frames) - one row per pitch class
println!;
=
=
=
# Shape: (12, n_frames)
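A chromagram folds spectral energy into 12 pitch classes. The following sketch shows one common way to map a frequency to its pitch class using equal temperament with A4 = 440 Hz. The tuning reference, the nearest-semitone rounding, and the helper names are assumptions for illustration, not this library's behavior.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Index 0..11 of the nearest equal-tempered pitch class (0 = C)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12

def fold_to_chroma(freqs_hz, powers):
    """Sum the power of each positive-frequency bin into its pitch class."""
    chroma = [0.0] * 12
    for f, p in zip(freqs_hz, powers):
        if f > 0.0:
            chroma[pitch_class(f)] += p
    return chroma

# A3 (220 Hz) and A4 (440 Hz) land in the same row; C5 (523.25 Hz) lands in C.
chroma = fold_to_chroma([220.0, 440.0, 523.25], [1.0, 2.0, 3.0])
```

Octave-equivalent notes share a row, which is why the result has 12 rows regardless of the FFT size.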
Supported Spectrogram Types
Frequency Scales
- Linear (`LinearHz`): Standard FFT bins, evenly spaced in Hz
- Mel (`Mel`): Mel-frequency scale, perceptually motivated for speech/audio
- ERB (`Erb`): Equivalent Rectangular Bandwidth, models auditory perception
- CQT: Constant-Q Transform for music analysis
- Log (`LogHz`): Logarithmic frequency spacing
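To give a feel for how a perceptual scale differs from linear spacing, here is a small sketch of the widely used HTK-style mel formula, mel = 2595 · log₁₀(1 + f / 700). Whether this library uses this variant or another (for example the Slaney form) is not stated here, so treat the formula and the helper names as assumptions.

```python
import math

def hz_to_mel(freq_hz):
    """HTK-style mel conversion (assumed variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_mels, f_min, f_max):
    """Band centers spaced evenly on the mel axis, returned in Hz."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

centers = mel_center_frequencies(n_mels=10, f_min=0.0, f_max=8000.0)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The gaps between successive centers grow with frequency, which is the point of the scale: fine resolution at low frequencies and coarser resolution higher up.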
Amplitude Scales
| Scale | Formula | Use Case |
|---|---|---|
| Power | \|X\|² | Energy analysis, ML features |
| Magnitude | \|X\| | Spectral analysis, phase vocoder |
| Decibels | 10·log₁₀(power) | Visualization, perceptual analysis |
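The three amplitude scales in the table are related by simple formulas, sketched below for a single complex STFT coefficient. The `floor` argument that guards against log(0) is an assumption added for this example; the library's own handling of zero power may differ.

```python
import math

def magnitude(x):
    """|X| for a complex STFT coefficient."""
    return abs(x)

def power(x):
    """|X|^2 for a complex STFT coefficient."""
    return abs(x) ** 2

def power_to_db(p, floor=1e-10):
    """10 * log10(power), clamped at `floor` to avoid log(0) (assumed guard)."""
    return 10.0 * math.log10(max(p, floor))

x = 3.0 + 4.0j
mag = magnitude(x)        # 5.0
pw = power(x)             # 25.0
db = power_to_db(pw)      # about 13.98 dB
```

Because power is the square of magnitude, 10·log₁₀(power) equals 20·log₁₀(magnitude).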
Type Aliases (Rust)
// Linear frequency
type LinearPowerSpectrogram = ;
type LinearMagnitudeSpectrogram = ;
type LinearDbSpectrogram = ;
// Mel frequency
type MelPowerSpectrogram = ;
type MelMagnitudeSpectrogram = ;
type MelDbSpectrogram = ;
// ERB frequency
type ErbPowerSpectrogram = ;
type ErbMagnitudeSpectrogram = ;
type ErbDbSpectrogram = ;
Window Functions
Supported window functions with different frequency/time resolution trade-offs:
- `rectangular`: No windowing (best frequency resolution, high leakage)
- `hanning`: Good general-purpose window (default)
- `hamming`: Similar to Hanning with different coefficients
- `blackman`: Low sidelobes, wider main lobe
- `bartlett`: Triangular window
- `kaiser=<beta>`: Tunable trade-off (β controls shape, e.g., `kaiser=5.0`)
- `gaussian=<std>`: Smooth roll-off (e.g., `gaussian=0.4`)
// Parse from string
let window: WindowType = "hanning".parse?;
let kaiser: WindowType = "kaiser=8.0".parse?;
// Or use constructors
let hann = Hanning;
let gauss = Gaussian ;
# Use class methods
=
=
=
# Or from string
=
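For reference, the textbook definitions of the fixed-shape windows can be written in a few lines. This sketch assumes the symmetric form (denominator N − 1) and requires a length of at least 2; the library may use the periodic form instead. Kaiser and Gaussian are omitted because their parameter conventions are not specified here.

```python
import math

def rectangular(n):
    return [1.0] * n

def hanning(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def hamming(n):
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def blackman(n):
    return [
        0.42
        - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))
        + 0.08 * math.cos(4.0 * math.pi * i / (n - 1))
        for i in range(n)
    ]

def bartlett(n):
    # Triangular window that reaches zero at both ends.
    return [1.0 - abs(2.0 * i / (n - 1) - 1.0) for i in range(n)]

windows = {f.__name__: f(5) for f in (rectangular, hanning, hamming, blackman, bartlett)}
```

Hanning and Blackman taper to zero at the edges, while Hamming stops at 0.08, which is the "different coefficients" trade-off mentioned above.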
Default Presets
// Speech processing preset
// n_fft=512, hop_size=160
let params = speech_default?;
// Music processing preset
// n_fft=2048, hop_size=512
let params = music_default?;
# Speech processing preset
=
# Music processing preset
=
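The preset numbers are easier to interpret once converted to time and frequency units. The sketch below does that arithmetic; the sample rates (16 kHz for speech, 44.1 kHz for music) are assumptions chosen for illustration, since the presets above only fix `n_fft` and `hop_size`.

```python
def describe(n_fft, hop_size, sample_rate):
    """Derive time/frequency resolution from STFT parameters."""
    return {
        "window_ms": 1000.0 * n_fft / sample_rate,   # analysis window length
        "hop_ms": 1000.0 * hop_size / sample_rate,   # time step between frames
        "bin_hz": sample_rate / n_fft,               # spacing of linear FFT bins
        "overlap": 1.0 - hop_size / n_fft,           # fraction shared by neighbors
        "n_bins": n_fft // 2 + 1,                    # one-sided spectrum size
    }

speech = describe(n_fft=512, hop_size=160, sample_rate=16000)   # assumed 16 kHz
music = describe(n_fft=2048, hop_size=512, sample_rate=44100)   # assumed 44.1 kHz
```

Under these assumptions the speech preset gives a 32 ms window with a 10 ms hop, and the music preset gives a roughly 46 ms window with 75% overlap.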
Accessing Results
let spec = compute?;
// Dimensions
let n_bins = spec.n_bins;
let n_frames = spec.n_frames;
// Data (ndarray::Array2<f64>)
let data = spec.data;
// Axes
let freqs = spec.axes.frequencies;
let times = spec.axes.times;
let = spec.axes.frequency_range;
let duration = spec.axes.duration;
// Original parameters
let params = spec.params;
=
# Dimensions
=
=
# Data (numpy array)
= # shape: (n_bins, n_frames)
# Axes
=
=
, =
=
# Original parameters
=
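The frequency and time axes of a linear spectrogram follow directly from the STFT parameters. This sketch shows the usual relationships. It assumes each frame is time-stamped at its start; a library may instead report frame centers, so the values need not match this library's axes exactly.

```python
def frequency_axis(n_fft, sample_rate):
    """Center frequency in Hz of each one-sided FFT bin."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]

def time_axis(n_frames, hop_size, sample_rate):
    """Start time in seconds of each frame (assumed convention)."""
    return [i * hop_size / sample_rate for i in range(n_frames)]

freqs = frequency_axis(n_fft=512, sample_rate=16000)
times = time_axis(n_frames=100, hop_size=160, sample_rate=16000)
f_min, f_max = freqs[0], freqs[-1]
```

The last bin sits at the Nyquist frequency, half the sample rate.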
Examples
Comprehensive examples in both languages:
Rust (examples/):
- `basic_linear.rs` - Simple linear spectrogram
- `mel_spectrogram.rs` - Mel spectrogram with dB scaling
- `reuse_plan.rs` - Batch processing with plan reuse
- `compare_windows.rs` - Window function comparison
- `amplitude_scales.rs` - Power, Magnitude, and dB
Python (python/examples/):
- `basic_linear.py` - Linear spectrogram basics
- `mel_spectrogram.py` - Mel spectrograms
- `mfcc_example.py` - MFCC computation
- `chromagram_example.py` - Pitch class profiles
- `batch_processing.py` - Efficient batch processing
- `streaming.py` - Real-time frame-by-frame processing
Documentation
- Integration Guide: Comprehensive walkthrough with side-by-side Rust/Python examples
- API Documentation: Full Rust API reference
- Python Documentation: Python API reference and guides
- Contributing Guide: How to contribute to the project
Feature Flags (Rust)
The Rust library requires exactly one FFT backend:
- `fftw`: Uses FFTW for FFT computation
  - Fastest performance
  - Requires system FFTW library (`libfftw3-dev` on Ubuntu/Debian)
  - Not pure Rust
- `realfft` (default): Pure-Rust FFT implementation
  - No system dependencies
  - Slightly slower than FFTW
  - Works everywhere
Additional flags:
- `python` (default): Enables Python bindings
- `serde`: Enables serialization support
# Pure Rust, no Python
[]
= { = "0.1", = false, = ["realfft"] }
# FFTW backend with Python
[]
= { = "0.1", = false, = ["fftw", "python"] }
Performance Tips
- Reuse plans: Use `SpectrogramPlanner` for 2-10x speedup on batch processing
- Choose power-of-2 FFT sizes: Best performance (512, 1024, 2048, 4096)
- Use FFTW backend: Maximum speed when system dependencies are acceptable
- Python GIL: All compute functions release the GIL for parallelism
- Streaming: Use frame-by-frame processing for real-time applications
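When picking an FFT size, a quick check can confirm that a candidate is a power of two, or round a desired window length up to the next one. The helpers below are illustrative and not part of this library.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses the single-set-bit trick."""
    return n > 0 and (n & (n - 1)) == 0

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

# A 25 ms window at 16 kHz is 400 samples; round up to get an FFT size.
window_samples = 400
n_fft = next_power_of_two(window_samples)
```

Rounding up and zero-padding the frame keeps the window length you want while letting the FFT run at a size it handles best.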
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Citation
If you use this library in academic work, please cite:
Note: This library focuses on spectrogram computation. For complete audio analysis pipelines, combine it with audio I/O libraries like audio_samples and your preferred plotting tools.