kofft
High-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, Wavelet, STFT, and more. Stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.
Features
- Zero-allocation stack-only APIs for MCU/embedded systems
- SIMD acceleration (x86_64 AVX2 & SSE, AArch64 NEON, WebAssembly SIMD)
- Radix-4 and mixed-radix FFTs for power-of-two and composite sizes
- Multiple transform types: FFT, DCT (Types I-IV), DST (Types I-IV), Hartley, Wavelet, STFT, CZT, Goertzel
- Window functions: Hann, Hamming, Blackman, Kaiser
- Batch and multi-channel processing
- WebAssembly support
- Parallel processing (optional)
Benchmarks
Latest benchmarks on an Intel Xeon Platinum 8370C show:
- 1024-point complex FFT: ~81 µs
- 4096-point complex FFT: ~1.0 ms
- 1,048,576-point real FFT: ~67 ms
See benchmarks/latest.json for full results.
Quick Start
Add to Cargo.toml
[dependencies]
kofft = { version = "0.1.4", features = [
# "x86_64", # enable AVX2 on x86_64
# "sse", # enable SSE on x86_64 without AVX2
# "aarch64", # enable NEON on AArch64
# "wasm", # enable WebAssembly SIMD
# "parallel", # enable Rayon-based parallel helpers
] }
Basic Usage
For an overview of the Fast Fourier Transform (FFT), see Wikipedia.
use ;
use ;
// Create FFT instance with planner (caches twiddle factors)
let planner = new;
let fft = with_planner;
// Prepare data
let mut data = vec!;
// Compute FFT
fft.fft?;
// Compute inverse FFT
fft.ifft?;
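For readers who want to check results by hand, the following is a minimal reference sketch of what a forward and inverse transform compute. It is a direct O(N^2) DFT, not kofft's implementation, and the name `naive_dft`, the tuple complex type, and the 1/N scaling on the inverse are all assumptions made for illustration.

```rust
use std::f32::consts::PI;

/// Minimal complex number as a (re, im) pair; a stand-in, not the crate's type.
type C = (f32, f32);

/// Direct O(N^2) DFT. `inverse` flips the exponent sign and scales by 1/N
/// (an assumed convention for this sketch).
fn naive_dft(input: &[C], inverse: bool) -> Vec<C> {
    let n = input.len();
    let sign = if inverse { 1.0 } else { -1.0 };
    let mut out = vec![(0.0f32, 0.0f32); n];
    for (k, o) in out.iter_mut().enumerate() {
        for (t, &(re, im)) in input.iter().enumerate() {
            // Reduce k*t modulo n first so the angle stays small and accurate.
            let angle = sign * 2.0 * PI * ((k * t) % n) as f32 / n as f32;
            let (s, c) = angle.sin_cos();
            // (re + i*im) * (c + i*s)
            o.0 += re * c - im * s;
            o.1 += re * s + im * c;
        }
        if inverse {
            o.0 /= n as f32;
            o.1 /= n as f32;
        }
    }
    out
}
```

Running the forward transform followed by the inverse should reproduce the input to within floating-point error, which makes this a convenient cross-check for small sizes.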
Parallel FFT
Enable the parallel feature to automatically split large transforms across
threads via Rayon. Use the fft_parallel and
ifft_parallel helpers, which safely fall back to single-threaded execution when
Rayon is not available.
use ;
let mut data = vec!;
fft_parallel?;
ifft_parallel?;
Embedded/MCU Usage (No Heap)
All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.
FFT (Stack-Only)
use ;
// 8-point FFT (power-of-two only for stack APIs)
let mut buf: = ;
fft_inplace_stack?;
DCT-I (Stack-Only)
use dct1_inplace_stack;
let input: = ;
let mut output: = ;
dct1_inplace_stack;
DCT-II (Stack-Only)
use dct2_inplace_stack;
let input: = ;
let mut output: = ;
dct2_inplace_stack;
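As a point of reference, a DCT-II with caller-provided buffers can be sketched directly from its definition. This is not the crate's code: `dct2_naive` is a hypothetical name, and the unnormalized form shown here is an assumption, since the scaling kofft applies is not stated above.

```rust
use core::f32::consts::PI;

/// Unnormalized DCT-II (assumed scaling):
/// output[k] = sum over n of input[n] * cos(pi / N * (n + 0.5) * k).
/// Both buffers are fixed-size arrays, so nothing is heap-allocated.
fn dct2_naive<const N: usize>(input: &[f32; N], output: &mut [f32; N]) {
    for (k, out) in output.iter_mut().enumerate() {
        let mut acc = 0.0f32;
        for (n, &x) in input.iter().enumerate() {
            acc += x * (PI / N as f32 * (n as f32 + 0.5) * k as f32).cos();
        }
        *out = acc;
    }
}
```

With this scaling, a constant input puts all of its energy into `output[0]`, which is a quick sanity check.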
DST-II (Stack-Only)
use dst2_inplace_stack;
let input: = ;
let mut output: = ;
dst2_inplace_stack;
DST-IV (Stack-Only)
use dst4_inplace_stack;
let input: = ;
let mut output: = ;
dst4_inplace_stack;
Haar Wavelet (Stack-Only)
use ;
// Forward transform
let input: = ;
let mut avg = ;
let mut diff = ;
haar_forward_inplace_stack;
// Inverse transform
let mut out = ;
haar_inverse_inplace_stack;
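The forward/inverse pair above can be illustrated with a single Haar level written against caller-provided buffers. This is a sketch under assumptions: the function names are hypothetical, and the average/half-difference scaling is chosen for readability (an orthonormal variant would scale by 1/sqrt(2) instead); it is not necessarily what kofft does.

```rust
/// One Haar level: pairwise averages and half-differences (assumed scaling).
/// `input` must hold exactly 2 * H samples.
fn haar_forward<const H: usize>(input: &[f32], avg: &mut [f32; H], diff: &mut [f32; H]) {
    assert_eq!(input.len(), 2 * H, "input must hold exactly 2 * H samples");
    for i in 0..H {
        let (a, b) = (input[2 * i], input[2 * i + 1]);
        avg[i] = (a + b) / 2.0;
        diff[i] = (a - b) / 2.0;
    }
}

/// Inverse of `haar_forward`: rebuilds each sample pair from (avg, diff).
fn haar_inverse<const H: usize>(avg: &[f32; H], diff: &[f32; H], out: &mut [f32]) {
    assert_eq!(out.len(), 2 * H, "output must hold exactly 2 * H samples");
    for i in 0..H {
        out[2 * i] = avg[i] + diff[i];
        out[2 * i + 1] = avg[i] - diff[i];
    }
}
```

Because every buffer is a fixed-size array or a borrowed slice, the round trip needs no allocator.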
Window Functions (Stack-Only)
use ;
let mut hann: = ;
hann_inplace_stack;
let mut hamming: = ;
hamming_inplace_stack;
let mut blackman: = ;
blackman_inplace_stack;
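To make the window shape concrete, here is a small sketch that fills a stack array with Hann coefficients. The name `hann_fill` is hypothetical, and the symmetric definition (dividing by N - 1) is an assumption; a periodic variant dividing by N is also common, and the text above does not say which one the crate uses.

```rust
use core::f32::consts::PI;

/// Fill `buf` with a symmetric Hann window (assumed convention):
/// w[n] = 0.5 - 0.5 * cos(2 * pi * n / (N - 1)).
fn hann_fill<const N: usize>(buf: &mut [f32; N]) {
    if N < 2 {
        // Degenerate lengths: a one-sample window is just 1.0.
        for w in buf.iter_mut() {
            *w = 1.0;
        }
        return;
    }
    for (n, w) in buf.iter_mut().enumerate() {
        *w = 0.5 - 0.5 * (2.0 * PI * n as f32 / (N as f32 - 1.0)).cos();
    }
}
```

For an odd length such as 5 the window peaks at exactly the middle sample and tapers to zero at both ends.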
Desktop/Standard Library Usage
With the std feature (enabled by default), you get heap-based APIs for more flexibility.
FFT with Standard Library
use ;
let fft = default;
// Heap-based FFT
let mut data = vec!;
fft.fft?;
// Or create new vector
let result = fft.fft_vec?;
Real FFT (Optimized for Real Input)
use ;
use RealFftImpl;
let fft = default;
let mut input = vec!;
let mut output = vec!;
fft.rfft?;
Stack-only helpers avoid heap allocation:
use ;
use Complex32;
let input = ;
let mut freq = ;
rfft_stack?;
let mut time = ;
irfft_stack?;
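Real-input transforms are cheaper because the spectrum of a real signal is Hermitian-symmetric: X[N - k] is the complex conjugate of X[k], so only bins 0 through N/2 carry independent information. The sketch below computes just those bins with a direct DFT. The name `real_dft_half` and the N/2 + 1 output layout are assumptions for illustration; the buffer layout kofft's real FFT expects is not stated above.

```rust
use core::f32::consts::PI;

/// DFT of a real signal, keeping only bins 0..=N/2 as (re, im) pairs.
/// The remaining bins follow from X[N - k] = conj(X[k]).
fn real_dft_half(input: &[f32], out: &mut [(f32, f32)]) {
    let n = input.len();
    assert_eq!(out.len(), n / 2 + 1, "output must hold N/2 + 1 bins");
    for (k, bin) in out.iter_mut().enumerate() {
        let (mut re, mut im) = (0.0f32, 0.0f32);
        for (t, &x) in input.iter().enumerate() {
            // Forward transform: exponent -2*pi*k*t/N, reduced modulo N.
            let angle = -2.0 * PI * ((k * t) % n) as f32 / n as f32;
            re += x * angle.cos();
            im += x * angle.sin();
        }
        *bin = (re, im);
    }
}
```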
STFT (Short-Time Fourier Transform)
For background on STFT, see Wikipedia.
use ;
use hann;
let signal = vec!;
let window = hann;
let hop_size = 2;
let mut frames = vec!;
stft?;
let mut output = vec!;
istft?;
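The framing step that precedes each per-frame FFT can be sketched as follows. The helper names are hypothetical, and both the zero-padding of the final frames and the frame count of ceil(len / hop) are assumptions made for this example rather than documented kofft behavior.

```rust
/// Copy frame `index` of `signal` into `frame`, multiplied by `window`.
/// Samples past the end of the signal are treated as zero (assumed padding).
fn windowed_frame(signal: &[f32], window: &[f32], hop: usize, index: usize, frame: &mut [f32]) {
    assert_eq!(window.len(), frame.len(), "frame and window lengths must match");
    let start = index * hop;
    for (i, (out, &w)) in frame.iter_mut().zip(window.iter()).enumerate() {
        let sample = signal.get(start + i).copied().unwrap_or(0.0);
        *out = sample * w;
    }
}

/// Number of frames when a new frame starts every `hop` samples
/// (assumed convention: ceil(signal_len / hop)).
fn frame_count(signal_len: usize, hop: usize) -> usize {
    assert!(hop > 0, "hop size must be positive");
    (signal_len + hop - 1) / hop
}
```

Each windowed frame would then be passed to an FFT; the inverse STFT overlap-adds the inverse-transformed frames at the same hop spacing.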
Streaming STFT/ISTFT
use ;
use hann;
use Complex32;
let signal = vec!;
let window = hann;
let hop_size = 2;
let mut stream = new?;
let mut frames = Vec::new();
let mut frame = vec!;
while stream.next_frame?
let mut output = vec!;
istft?;
Batch Processing
use ;
let fft = default;
let mut batches = vec!;
fft.batch?;
Examples
Run the included examples with:
Advanced Features
Enable optional features in Cargo.toml:
[dependencies]
kofft = { version = "0.1.4", features = [
# "x86_64", # AVX2 on x86_64
# "aarch64", # NEON on AArch64
# "wasm", # WebAssembly SIMD
# "parallel", # Rayon-based parallel helpers
] }
SIMD Acceleration
Enable one of the CPU-specific SIMD features above for better performance.
SIMD backends are also enabled automatically when compiling with the
appropriate target-feature flags (e.g., RUSTFLAGS="-C target-feature=+avx2").
Parallel Processing
Enable the parallel feature (using Rayon) as shown above:
use parallel;
let signal = vec!;
let window = vec!;
let hop_size = 2;
let mut frames = vec!;
parallel?;
Additional Transforms
- DCT: Discrete Cosine Transform (Wikipedia)
- DST: Discrete Sine Transform (Wikipedia)
- Hartley Transform (Wikipedia)
- Wavelet Transform: multi-level Haar, Daubechies, Symlets, Coiflets (Wikipedia)
- Goertzel Algorithm (Wikipedia)
- Chirp Z-Transform (Wikipedia)
- Hilbert Transform (Wikipedia)
- Cepstrum (Wikipedia)
use ;
// DCT variants
let dct2_result = dct2;
let dct3_result = dct3;
let dct4_result = dct4;
// DST variants
let dst1_result = dst1;
let dst2_result = dst2;
let dst3_result = dst3;
// Hartley Transform
let hartley_result = dht;
// Wavelet Transform
use ;
let = haar_forward_multi;
let reconstructed = haar_inverse_multi;
// Additional families, e.g. Daubechies-4
let = db4_forward_multi;
let db4_recon = db4_inverse_multi;
// Goertzel Algorithm (single frequency detection)
let magnitude = goertzel_f32;
// Chirp Z-Transform
let czt_result = czt_f32;
// Hilbert Transform
let hilbert_result = hilbert_analytic;
// Cepstrum
let cepstrum_result = real_cepstrum;
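Of the transforms listed here, Goertzel is the easiest to show in full: it evaluates a single DFT bin with a two-term recurrence instead of computing a whole spectrum. The sketch below is a textbook version, not the crate's implementation; the function name, its parameters, and the choice to round the target frequency to the nearest bin are assumptions.

```rust
use core::f32::consts::PI;

/// Magnitude of the DFT bin nearest to `target_hz`, via the Goertzel recurrence.
fn goertzel_magnitude(samples: &[f32], sample_rate: f32, target_hz: f32) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let n = samples.len() as f32;
    // Snap the target frequency to the nearest integer bin index.
    let k = (n * target_hz / sample_rate).round();
    let coeff = 2.0 * (2.0 * PI * k / n).cos();
    let (mut s1, mut s2) = (0.0f32, 0.0f32);
    for &x in samples {
        // s[t] = x[t] + 2*cos(w)*s[t-1] - s[t-2]
        let s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // |X[k]|^2 = s1^2 + s2^2 - 2*cos(w)*s1*s2; clamp tiny negative rounding error.
    (s1 * s1 + s2 * s2 - coeff * s1 * s2).max(0.0).sqrt()
}
```

For a unit-amplitude sine that falls exactly on a bin, the returned magnitude is N/2, while bins far from the tone return values near zero.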
Complete MCU Example
use ;
use dct2_inplace_stack;
use hann_inplace_stack;
!
Performance Notes
- Stack-only APIs: No heap allocation, suitable for MCUs with limited RAM
- SIMD acceleration: 2-4x speedup on supported platforms
- Parallel FFT: Enable the parallel feature to scale across CPU cores
- Power-of-two sizes: Most efficient for FFT operations
- Memory usage: Stack usage scales with transform size (e.g., 8-point FFT uses ~64 bytes for Complex32)
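The memory figure can be reproduced with a few lines of Rust. The struct below is a hypothetical stand-in for a single-precision complex sample (two f32 fields, 8 bytes), not the crate's own type, and the helper name is made up for this sketch; it counts only the data buffer, not any scratch space a transform might use.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for a single-precision complex sample.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Default)]
struct ComplexF32 {
    re: f32,
    im: f32,
}

/// Bytes occupied by an N-point complex buffer held on the stack.
const fn fft_buffer_bytes<const N: usize>() -> usize {
    size_of::<[ComplexF32; N]>()
}
```

An 8-point buffer comes to 64 bytes and a 1024-point buffer to 8 KiB, which is worth comparing against the stack size available on the target MCU.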
Platform Support
| Platform | SIMD Support | Enable via |
|---|---|---|
| x86_64 | AVX2/FMA | x86_64 feature or -C target-feature=+avx2 |
| x86_64 (SSE) | SSE2 | sse feature or default sse2 target |
| AArch64 | NEON | aarch64 feature or -C target-feature=+neon |
| WebAssembly | SIMD128 | wasm feature or -C target-feature=+simd128 |
| Generic | Scalar | Default fallback |
Feature selection precedence: x86_64 (AVX2) → sse → scalar fallback.
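One way such a precedence can be expressed at compile time is with `cfg!` checks on the target architecture and target features. The sketch below is illustrative only: `selected_backend` is a hypothetical function, it looks solely at `target_feature` flags and ignores Cargo features, and it is not a description of how kofft dispatches internally.

```rust
/// Name of the backend this sketch would pick for the current build,
/// checking the widest SIMD option first and falling back to scalar code.
fn selected_backend() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "sse2")) {
        "sse"
    } else if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "neon"
    } else if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        "simd128"
    } else {
        "scalar"
    }
}
```

Building with `RUSTFLAGS="-C target-feature=+avx2"` on x86_64 changes which branch is taken, because `cfg!` is evaluated when the crate is compiled rather than at run time.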
Benchmark
Last run: 2025-08-14T12:04:00.734481659+00:00 on local (Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz; rustc 1.87.0 (17067e9ac 2025-05-09); flags: ``)
| Library | Transform | Size (N) | Mode | Time/op | Ops/sec | Allocations | Date/Runner |
|---|---|---|---|---|---|---|---|
| kofft | Complex | 1024 | Single | 0.081 ms | 12334.56 | 3 | 2025-08-14 local |
| kofft | Complex | 1024 | Parallel | 0.040 ms | 24846.57 | 3 | 2025-08-14 local |
| rustfft | Complex | 1024 | Single | 0.026 ms | 38292.17 | 1 | 2025-08-14 local |
| kofft | Real | 1024 | Single | 0.060 ms | 16627.59 | 4 | 2025-08-14 local |
| realfft | Real | 1024 | Single | 0.004 ms | 258933.20 | 0 | 2025-08-14 local |
| kofft | Complex | 2048 | Single | 0.075 ms | 13320.37 | 3 | 2025-08-14 local |
| kofft | Complex | 2048 | Parallel | 0.085 ms | 11782.59 | 3 | 2025-08-14 local |
| rustfft | Complex | 2048 | Single | 0.009 ms | 112917.80 | 1 | 2025-08-14 local |
| kofft | Real | 2048 | Single | 0.098 ms | 10237.41 | 4 | 2025-08-14 local |
| realfft | Real | 2048 | Single | 0.007 ms | 134102.19 | 0 | 2025-08-14 local |
| kofft | Complex | 4096 | Single | 1.046 ms | 955.65 | 3 | 2025-08-14 local |
| kofft | Complex | 4096 | Parallel | 2.799 ms | 357.33 | 4 | 2025-08-14 local |
| rustfft | Complex | 4096 | Single | 0.024 ms | 41382.16 | 1 | 2025-08-14 local |
| kofft | Real | 4096 | Single | 1.171 ms | 853.74 | 4 | 2025-08-14 local |
| realfft | Real | 4096 | Single | 0.015 ms | 64951.94 | 0 | 2025-08-14 local |
| kofft | Complex | 8192 | Single | 2.378 ms | 420.60 | 3 | 2025-08-14 local |
| kofft | Complex | 8192 | Parallel | 2.194 ms | 455.88 | 3 | 2025-08-14 local |
| rustfft | Complex | 8192 | Single | 0.036 ms | 28052.85 | 1 | 2025-08-14 local |
| kofft | Real | 8192 | Single | 1.225 ms | 816.26 | 4 | 2025-08-14 local |
| realfft | Real | 8192 | Single | 0.031 ms | 32217.53 | 0 | 2025-08-14 local |
| kofft | Complex | 16384 | Single | 3.120 ms | 320.51 | 3 | 2025-08-14 local |
| kofft | Complex | 16384 | Parallel | 3.706 ms | 269.86 | 3 | 2025-08-14 local |
| rustfft | Complex | 16384 | Single | 0.049 ms | 20264.66 | 1 | 2025-08-14 local |
| kofft | Real | 16384 | Single | 2.298 ms | 435.16 | 4 | 2025-08-14 local |
| realfft | Real | 16384 | Single | 0.027 ms | 36964.48 | 0 | 2025-08-14 local |
| kofft | Complex | 32768 | Single | 3.654 ms | 273.68 | 3 | 2025-08-14 local |
| kofft | Complex | 32768 | Parallel | 4.138 ms | 241.63 | 3 | 2025-08-14 local |
| rustfft | Complex | 32768 | Single | 0.105 ms | 9564.16 | 1 | 2025-08-14 local |
| kofft | Real | 32768 | Single | 3.871 ms | 258.33 | 4 | 2025-08-14 local |
| realfft | Real | 32768 | Single | 0.076 ms | 13146.31 | 0 | 2025-08-14 local |
| kofft | Complex | 65536 | Single | 4.232 ms | 236.32 | 3 | 2025-08-14 local |
| kofft | Complex | 65536 | Parallel | 2.834 ms | 352.88 | 3 | 2025-08-14 local |
| rustfft | Complex | 65536 | Single | 0.249 ms | 4015.79 | 1 | 2025-08-14 local |
| kofft | Real | 65536 | Single | 2.854 ms | 350.41 | 4 | 2025-08-14 local |
| realfft | Real | 65536 | Single | 0.140 ms | 7142.30 | 0 | 2025-08-14 local |
| kofft | Complex | 131072 | Single | 5.810 ms | 172.11 | 4 | 2025-08-14 local |
| kofft | Complex | 131072 | Parallel | 4.891 ms | 204.47 | 3 | 2025-08-14 local |
| rustfft | Complex | 131072 | Single | 1.374 ms | 728.06 | 1 | 2025-08-14 local |
| kofft | Real | 131072 | Single | 6.192 ms | 161.51 | 5 | 2025-08-14 local |
| realfft | Real | 131072 | Single | 0.554 ms | 1806.64 | 0 | 2025-08-14 local |
| kofft | Complex | 262144 | Single | 15.280 ms | 65.45 | 3 | 2025-08-14 local |
| kofft | Complex | 262144 | Parallel | 9.065 ms | 110.32 | 3 | 2025-08-14 local |
| rustfft | Complex | 262144 | Single | 7.525 ms | 132.88 | 1 | 2025-08-14 local |
| kofft | Real | 262144 | Single | 21.934 ms | 45.59 | 4 | 2025-08-14 local |
| realfft | Real | 262144 | Single | 0.733 ms | 1364.44 | 0 | 2025-08-14 local |
| kofft | Complex | 524288 | Single | 29.240 ms | 34.20 | 3 | 2025-08-14 local |
| kofft | Complex | 524288 | Parallel | 29.209 ms | 34.24 | 3 | 2025-08-14 local |
| rustfft | Complex | 524288 | Single | 14.538 ms | 68.78 | 1 | 2025-08-14 local |
| kofft | Real | 524288 | Single | 37.093 ms | 26.96 | 4 | 2025-08-14 local |
| realfft | Real | 524288 | Single | 1.982 ms | 504.63 | 0 | 2025-08-14 local |
| kofft | Complex | 1048576 | Single | 59.265 ms | 16.87 | 3 | 2025-08-14 local |
| kofft | Complex | 1048576 | Parallel | 60.507 ms | 16.53 | 4 | 2025-08-14 local |
| rustfft | Complex | 1048576 | Single | 30.361 ms | 32.94 | 1 | 2025-08-14 local |
| kofft | Real | 1048576 | Single | 66.946 ms | 14.94 | 4 | 2025-08-14 local |
| realfft | Real | 1048576 | Single | 4.361 ms | 229.31 | 0 | 2025-08-14 local |
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.