kofft
A high-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, wavelet, STFT, and more. It provides stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.
Features
- 🚀 Zero-allocation stack-only APIs for MCU/embedded systems
- ⚡ SIMD acceleration (x86_64 AVX2, AArch64 NEON, WebAssembly SIMD)
- 🔧 Multiple transform types: FFT, DCT, DST, Hartley, Wavelet, STFT, CZT, Goertzel
- 📊 Window functions: Hann, Hamming, Blackman, Kaiser
- 🔄 Batch and multi-channel processing
- 🌐 WebAssembly support
- 📱 Parallel processing (optional)
Quick Start
Add to Cargo.toml
[dependencies]
kofft = { version = "0.1.1", features = [
# "x86_64", # enable AVX2 on x86_64
# "aarch64", # enable NEON on AArch64
# "wasm", # enable WebAssembly SIMD
# "parallel", # enable Rayon-based parallel helpers
] }
Basic Usage
For an overview of the Fast Fourier Transform (FFT), see Wikipedia.
use ;
use ;
// Create FFT instance with planner (caches twiddle factors)
let planner = new;
let fft = with_planner;
// Prepare data
let mut data = vec!;
// Compute FFT
fft.fft?;
// Compute inverse FFT
fft.ifft?;
Embedded/MCU Usage (No Heap)
All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.
FFT (Stack-Only)
use ;
// 8-point FFT (power-of-two only for stack APIs)
let mut buf: = ;
fft_inplace_stack?;
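To illustrate why a stack-only FFT can be restricted to power-of-two lengths, here is a minimal, self-contained sketch of an in-place radix-2 transform over a fixed-size array. It is not the crate's implementation: `Cplx` and `fft_in_place` are hypothetical names introduced for this example, and the code relies on the standard library's `cos`/`sin` (a no_std build would need a math crate such as libm).

```rust
use core::f32::consts::PI;

/// Minimal complex type for this sketch (hypothetical, not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cplx {
    pub re: f32,
    pub im: f32,
}

impl Cplx {
    pub const fn new(re: f32, im: f32) -> Self {
        Cplx { re, im }
    }
    fn add(self, o: Cplx) -> Cplx {
        Cplx::new(self.re + o.re, self.im + o.im)
    }
    fn sub(self, o: Cplx) -> Cplx {
        Cplx::new(self.re - o.re, self.im - o.im)
    }
    fn mul(self, o: Cplx) -> Cplx {
        Cplx::new(
            self.re * o.re - self.im * o.im,
            self.re * o.im + self.im * o.re,
        )
    }
}

/// In-place iterative radix-2 FFT over a caller-provided fixed-size array.
/// No heap allocation; returns an error unless N is a nonzero power of two.
pub fn fft_in_place<const N: usize>(buf: &mut [Cplx; N]) -> Result<(), &'static str> {
    if N == 0 || !N.is_power_of_two() {
        return Err("length must be a nonzero power of two");
    }
    if N == 1 {
        return Ok(()); // a 1-point transform is the identity
    }

    // Bit-reversal permutation so butterflies can run in place.
    let bits = N.trailing_zeros();
    for i in 0..N {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            buf.swap(i, j);
        }
    }

    // Butterfly stages of length 2, 4, ..., N.
    let mut len = 2;
    while len <= N {
        let ang = -2.0 * PI / len as f32;
        let w_len = Cplx::new(ang.cos(), ang.sin());
        let mut start = 0;
        while start < N {
            let mut w = Cplx::new(1.0, 0.0);
            for k in 0..len / 2 {
                let u = buf[start + k];
                let v = buf[start + k + len / 2].mul(w);
                buf[start + k] = u.add(v);
                buf[start + k + len / 2] = u.sub(v);
                w = w.mul(w_len);
            }
            start += len;
        }
        len <<= 1;
    }
    Ok(())
}
```

The halving structure of the butterfly stages is what ties this approach to power-of-two lengths; other lengths need a different algorithm (mixed-radix or Bluestein, for example).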
DCT-II (Stack-Only)
use dct2_inplace_stack;
let input: = ;
let mut output: = ;
dct2_inplace_stack;
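For reference, the caller-provided-buffer pattern can be sketched with a direct O(N²) DCT-II. This is only an illustration of the definition, not the crate's algorithm: `dct2_direct` is a hypothetical name, and the unnormalized scaling used here is an assumption (libraries differ on normalization).

```rust
use core::f32::consts::PI;

/// Direct O(N^2) DCT-II written into a caller-provided output buffer:
///   out[k] = sum over n of input[n] * cos(pi / N * (n + 0.5) * k)
/// Unnormalized; scaling conventions vary between libraries.
pub fn dct2_direct<const N: usize>(input: &[f32; N], out: &mut [f32; N]) {
    for k in 0..N {
        let mut acc = 0.0f32;
        for n in 0..N {
            let angle = PI / N as f32 * (n as f32 + 0.5) * k as f32;
            acc += input[n] * angle.cos();
        }
        out[k] = acc;
    }
}
```

Because both arrays have a compile-time length, the whole computation lives on the stack and a mismatched buffer size is rejected by the compiler.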
DST-II (Stack-Only)
use dst2_inplace_stack;
let input: = ;
let mut output: = ;
dst2_inplace_stack;
Haar Wavelet (Stack-Only)
use ;
// Forward transform
let input: = ;
let mut avg = ;
let mut diff = ;
haar_forward_inplace_stack;
// Inverse transform
let mut out = ;
haar_inverse_inplace_stack;
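The forward and inverse steps above can be sketched as a single-level Haar transform on caller-provided slices. The function names are hypothetical, and the averaging convention (dividing by 2 rather than by √2) is an assumption made for this example; the crate may scale differently.

```rust
/// One level of the Haar transform: pairwise averages and differences.
/// `avg` and `diff` must each hold input.len() / 2 elements.
pub fn haar_forward_sketch(
    input: &[f32],
    avg: &mut [f32],
    diff: &mut [f32],
) -> Result<(), &'static str> {
    let half = input.len() / 2;
    if input.len() % 2 != 0 || avg.len() != half || diff.len() != half {
        return Err("input must have even length and outputs half that length");
    }
    for i in 0..half {
        let (a, b) = (input[2 * i], input[2 * i + 1]);
        avg[i] = (a + b) * 0.5;
        diff[i] = (a - b) * 0.5;
    }
    Ok(())
}

/// Inverse of `haar_forward_sketch`: rebuilds each sample pair.
pub fn haar_inverse_sketch(
    avg: &[f32],
    diff: &[f32],
    out: &mut [f32],
) -> Result<(), &'static str> {
    if avg.len() != diff.len() || out.len() != 2 * avg.len() {
        return Err("output must be twice as long as avg and diff");
    }
    for i in 0..avg.len() {
        out[2 * i] = avg[i] + diff[i];
        out[2 * i + 1] = avg[i] - diff[i];
    }
    Ok(())
}
```

With this convention, the input `[1.0, 3.0, 5.0, 7.0]` yields averages `[2.0, 6.0]` and differences `[-1.0, -1.0]`, and the inverse recovers the input exactly.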
Window Functions (Stack-Only)
use ;
let mut hann: = ;
hann_inplace_stack;
let mut hamming: = ;
hamming_inplace_stack;
let mut blackman: = ;
blackman_inplace_stack;
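As a rough illustration of what an in-place window fill computes, the sketch below writes Hann coefficients into a caller-provided buffer. `hann_fill` is a hypothetical name, and the symmetric form (denominator N − 1) is an assumption; some libraries use the periodic form (denominator N) instead.

```rust
use core::f32::consts::PI;

/// Fill `buf` with a symmetric Hann window:
///   w[n] = 0.5 - 0.5 * cos(2 * pi * n / (N - 1))
/// A single-element buffer is set to 1.0 to avoid dividing by zero.
pub fn hann_fill(buf: &mut [f32]) {
    let n = buf.len();
    if n == 1 {
        buf[0] = 1.0;
        return;
    }
    for (i, w) in buf.iter_mut().enumerate() {
        *w = 0.5 - 0.5 * (2.0 * PI * i as f32 / (n - 1) as f32).cos();
    }
}
```

For a 5-point buffer this produces approximately `[0.0, 0.5, 1.0, 0.5, 0.0]`. Multiplying a signal frame element-wise by these coefficients before an FFT reduces spectral leakage.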
Desktop/Standard Library Usage
With the std feature (enabled by default), you get heap-based APIs for more flexibility.
FFT with Standard Library
use ;
let fft = default;
// Heap-based FFT
let mut data = vec!;
fft.fft?;
// Or create new vector
let result = fft.fft_vec?;
Real FFT (Optimized for Real Input)
use ;
let fft = default;
let input = vec!;
let mut output = vec!;
fft.rfft?;
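The reason a real-input transform can be cheaper is that the spectrum of a real signal is conjugate-symmetric, so bins 0 through N/2 determine all the others. The sketch below demonstrates that property with a naive DFT; the function names are hypothetical and this is not how the crate computes its real FFT.

```rust
use core::f32::consts::PI;

/// Naive DFT bin `k` of a real signal, returned as (re, im).
pub fn real_dft_bin(x: &[f32], k: usize) -> (f32, f32) {
    let n = x.len() as f32;
    let mut re = 0.0f32;
    let mut im = 0.0f32;
    for (i, &v) in x.iter().enumerate() {
        let ang = -2.0 * PI * (k as f32) * (i as f32) / n;
        re += v * ang.cos();
        im += v * ang.sin();
    }
    (re, im)
}

/// Bins 0..=N/2 only. For real input, bin N-k is the complex conjugate
/// of bin k, so this half spectrum carries all the information.
pub fn half_spectrum(x: &[f32]) -> Vec<(f32, f32)> {
    (0..=x.len() / 2).map(|k| real_dft_bin(x, k)).collect()
}
```

For an 8-sample input, `half_spectrum` returns 5 bins, and bin 7 of the full spectrum equals the conjugate of bin 1, bin 6 the conjugate of bin 2, and so on.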
STFT (Short-Time Fourier Transform)
For background on STFT, see Wikipedia.
use ;
use hann;
let signal = vec!;
let window = hann;
let hop_size = 2;
let mut frames = vec!;
stft?;
let mut output = vec!;
istft?;
Batch Processing
use ;
let fft = default;
let mut batches = vec!;
fft.batch?;
Examples
Run the included examples with:
Advanced Features
Enable optional features in Cargo.toml:
[dependencies]
kofft = { version = "0.1.1", features = [
# "x86_64", # AVX2 on x86_64
# "aarch64", # NEON on AArch64
# "wasm", # WebAssembly SIMD
# "parallel", # Rayon-based parallel helpers
] }
SIMD Acceleration
Enable one of the CPU-specific SIMD features above for better performance.
Parallel Processing
Enable the parallel feature (using Rayon) as shown above:
use parallel;
let signal = vec!;
let window = vec!;
let hop_size = 2;
let mut frames = vec!;
parallel?;
Additional Transforms
- DCT – Discrete Cosine Transform (Wikipedia)
- DST – Discrete Sine Transform (Wikipedia)
- Hartley Transform – (Wikipedia)
- Wavelet Transform – (Wikipedia)
- Goertzel Algorithm – (Wikipedia)
- Chirp Z-Transform – (Wikipedia)
- Hilbert Transform – (Wikipedia)
- Cepstrum – (Wikipedia)
use ;
// DCT variants
let dct2_result = dct2;
let dct3_result = dct3;
let dct4_result = dct4;
// DST variants
let dst1_result = dst1;
let dst2_result = dst2;
let dst3_result = dst3;
// Hartley Transform
let hartley_result = dht;
// Wavelet Transform
let = haar_forward;
let reconstructed = haar_inverse;
// Goertzel Algorithm (single frequency detection)
let magnitude = goertzel_f32;
// Chirp Z-Transform
let czt_result = czt_f32;
// Hilbert Transform
let hilbert_result = hilbert_analytic;
// Cepstrum
let cepstrum_result = real_cepstrum;
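To make the "single frequency detection" idea concrete, here is a minimal sketch of the Goertzel recurrence. It is a textbook formulation, not the crate's code: `goertzel_magnitude` and its parameter order are assumptions for this example and may not match `goertzel_f32`.

```rust
use core::f32::consts::PI;

/// Magnitude of the spectral component at `target_hz`, computed with the
/// Goertzel recurrence in O(N) time and O(1) extra memory.
pub fn goertzel_magnitude(samples: &[f32], sample_rate: f32, target_hz: f32) -> f32 {
    let omega = 2.0 * PI * target_hz / sample_rate;
    let coeff = 2.0 * omega.cos();
    let (mut s1, mut s2) = (0.0f32, 0.0f32);
    for &x in samples {
        // Second-order resonator tuned to the target frequency.
        let s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // Squared magnitude from the final two state values; clamp tiny
    // negative rounding errors before taking the square root.
    (s1 * s1 + s2 * s2 - coeff * s1 * s2).max(0.0).sqrt()
}
```

When the target frequency falls exactly on a DFT bin, the result matches the magnitude of that bin, so a unit-amplitude sine of N samples gives roughly N/2. This is cheaper than a full FFT when only a handful of frequencies matter, as in tone detection.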
Complete MCU Example
use ;
use dct2_inplace_stack;
use hann_inplace_stack;
!
Performance Notes
- Stack-only APIs: No heap allocation, suitable for MCUs with limited RAM
- SIMD acceleration: 2-4x speedup on supported platforms
- Power-of-two sizes: Most efficient for FFT operations
- Memory usage: Stack usage scales with transform size (e.g., 8-point FFT uses ~64 bytes for Complex32)
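The 64-byte figure follows from a complex sample made of two `f32` values (8 bytes) times 8 points. The sketch below checks that arithmetic with a stand-in type; `Cplx32` and `fft_buffer_bytes` are hypothetical names, and the layout of the crate's own `Complex32` is assumed to be two packed `f32` fields.

```rust
use core::mem::size_of;

/// Stand-in for a single-precision complex sample (two f32 fields).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Cplx32 {
    pub re: f32,
    pub im: f32,
}

/// Bytes occupied by an N-point complex buffer, excluding any scratch
/// space or twiddle tables a particular algorithm might add.
pub const fn fft_buffer_bytes<const N: usize>() -> usize {
    size_of::<[Cplx32; N]>()
}
```

`fft_buffer_bytes::<8>()` evaluates to 64, and a 1024-point buffer needs 8192 bytes, which is worth comparing against the stack size configured for the target MCU.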
Platform Support
| Platform | SIMD Support | Features |
|---|---|---|
| x86_64 | AVX2/FMA | x86_64 feature |
| AArch64 | NEON | aarch64 feature |
| WebAssembly | SIMD128 | wasm feature |
| Generic | Scalar | Default fallback |
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.