kofft

High-performance, no_std, MCU-friendly DSP library featuring FFT, DCT, DST, Hartley, Wavelet, STFT, and more. Stack-only, SIMD-optimized, and batch transforms for embedded and scientific Rust applications.

Features

🚀 Zero-allocation stack-only APIs for MCU/embedded systems
⚡ SIMD acceleration (x86_64 AVX2, AArch64 NEON, WebAssembly SIMD)
🔧 Multiple transform types: FFT, DCT, DST, Hartley, Wavelet, STFT, CZT, Goertzel
📊 Window functions: Hann, Hamming, Blackman, Kaiser
🔄 Batch and multi-channel processing
🌐 WebAssembly support
📱 Parallel processing (optional)

Quick Start

Add to Cargo.toml

[dependencies]
kofft = "0.1.0"

# For SIMD acceleration (optional)
kofft = { version = "0.1.0", features = ["x86_64"] }  # x86_64 AVX2
kofft = { version = "0.1.0", features = ["aarch64"] } # AArch64 NEON
kofft = { version = "0.1.0", features = ["wasm"] }    # WebAssembly SIMD

# For parallel processing
kofft = { version = "0.1.0", features = ["parallel"] }

Basic Usage

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};

// Create FFT instance
let fft = ScalarFftImpl::<f32>::default();

// Prepare data
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

// Compute FFT
fft.fft(&mut data)?;

// Compute inverse FFT
fft.ifft(&mut data)?;

Embedded/MCU Usage (No Heap)

All stack-only APIs require you to provide output buffers. This enables no_std operation without any heap allocation.

FFT (Stack-Only)

use kofft::fft::{Complex32, fft_inplace_stack};

// 8-point FFT (power-of-two only for stack APIs)
let mut buf: [Complex32; 8] = [
    Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0),
    Complex32::new(5.0, 0.0), Complex32::new(6.0, 0.0),
    Complex32::new(7.0, 0.0), Complex32::new(8.0, 0.0),
];

fft_inplace_stack(&mut buf)?;

DCT-II (Stack-Only)

use kofft::dct::dct2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dct2_inplace_stack(&input, &mut output);

DST-II (Stack-Only)

use kofft::dst::dst2_inplace_stack;

let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output: [f32; 8] = [0.0; 8];

dst2_inplace_stack(&input, &mut output);

Haar Wavelet (Stack-Only)

use kofft::wavelet::{haar_forward_inplace_stack, haar_inverse_inplace_stack};

// Forward transform
let input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut avg = [0.0; 4];
let mut diff = [0.0; 4];

haar_forward_inplace_stack(&input, &mut avg[..], &mut diff[..]);

// Inverse transform
let mut out = [0.0; 8];
haar_inverse_inplace_stack(&avg[..], &diff[..], &mut out[..]);

Window Functions (Stack-Only)

use kofft::window::{hann_inplace_stack, hamming_inplace_stack, blackman_inplace_stack};

let mut hann: [f32; 8] = [0.0; 8];
hann_inplace_stack(&mut hann);

let mut hamming: [f32; 8] = [0.0; 8];
hamming_inplace_stack(&mut hamming);

let mut blackman: [f32; 8] = [0.0; 8];
blackman_inplace_stack(&mut blackman);

Desktop/Standard Library Usage

With the std feature (enabled by default), you get heap-based APIs for more flexibility.

FFT with Standard Library

use kofft::fft::{Complex32, ScalarFftImpl, FftImpl};

let fft = ScalarFftImpl::<f32>::default();

// Heap-based FFT
let mut data = vec![
    Complex32::new(1.0, 0.0),
    Complex32::new(2.0, 0.0),
    Complex32::new(3.0, 0.0),
    Complex32::new(4.0, 0.0),
];

fft.fft(&mut data)?;

// Or create new vector
let result = fft.fft_vec(&data)?;

Real FFT (Optimized for Real Input)

use kofft::fft::{ScalarFftImpl, FftImpl};

let fft = ScalarFftImpl::<f32>::default();
let input = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let mut output = vec![Complex32::zero(); input.len() / 2 + 1];

fft.rfft(&input, &mut output)?;

STFT (Short-Time Fourier Transform)

use kofft::stft::{stft, istft};
use kofft::window::hann;

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = hann(4);
let hop_size = 2;

let mut frames = vec![vec![]; (signal.len() + hop_size - 1) / hop_size];
stft(&signal, &window, hop_size, &mut frames)?;

let mut output = vec![0.0; signal.len()];
istft(&frames, &window, hop_size, &mut output)?;

Batch Processing

use kofft::fft::{ScalarFftImpl, FftImpl};

let fft = ScalarFftImpl::<f32>::default();
let mut batches = vec![
    vec![Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0)],
    vec![Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0)],
];

fft.batch(&mut batches)?;

Advanced Features

SIMD Acceleration

Enable platform-specific SIMD features for better performance:

# x86_64 with AVX2
kofft = { version = "0.1.0", features = ["x86_64"] }

# AArch64 with NEON
kofft = { version = "0.1.0", features = ["aarch64"] }

# WebAssembly with SIMD
kofft = { version = "0.1.0", features = ["wasm"] }

Parallel Processing

kofft = { version = "0.1.0", features = ["parallel"] }

use kofft::stft::parallel;

let signal = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let window = vec![1.0, 1.0, 1.0, 1.0];
let hop_size = 2;

let mut frames = vec![vec![]; (signal.len() + hop_size - 1) / hop_size];
parallel(&signal, &window, hop_size, &mut frames)?;

Additional Transforms

use kofft::{dct, dst, hartley, wavelet, goertzel, czt, hilbert, cepstrum};

// DCT variants
let dct2_result = dct::dct2(&input);
let dct3_result = dct::dct3(&input);
let dct4_result = dct::dct4(&input);

// DST variants
let dst2_result = dst::dst2(&input);
let dst3_result = dst::dst3(&input);
let dst4_result = dst::dst4(&input);

// Hartley Transform
let hartley_result = hartley::dht(&input);

// Wavelet Transform
let (avg, diff) = wavelet::haar_forward(&input);
let reconstructed = wavelet::haar_inverse(&avg, &diff);

// Goertzel Algorithm (single frequency detection)
let magnitude = goertzel::goertzel_f32(&input, 44100.0, 1000.0);

// Chirp Z-Transform
let czt_result = czt::czt_f32(&input, 64, (0.5, 0.0), (1.0, 0.0));

// Hilbert Transform
let hilbert_result = hilbert::hilbert(&input);

// Cepstrum
let cepstrum_result = cepstrum::real_cepstrum(&input);

Complete MCU Example

#![no_std]
use kofft::fft::{Complex32, fft_inplace_stack};
use kofft::dct::dct2_inplace_stack;
use kofft::window::hann_inplace_stack;

#[entry]
fn main() -> ! {
    // FFT example
    let mut fft_buf: [Complex32; 8] = [
        Complex32::new(1.0, 0.0), Complex32::new(2.0, 0.0),
        Complex32::new(3.0, 0.0), Complex32::new(4.0, 0.0),
        Complex32::new(5.0, 0.0), Complex32::new(6.0, 0.0),
        Complex32::new(7.0, 0.0), Complex32::new(8.0, 0.0),
    ];
    fft_inplace_stack(&mut fft_buf).unwrap();

    // DCT example
    let dct_input: [f32; 8] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let mut dct_output: [f32; 8] = [0.0; 8];
    dct2_inplace_stack(&dct_input, &mut dct_output);

    // Window function example
    let mut window: [f32; 8] = [0.0; 8];
    hann_inplace_stack(&mut window);

    loop {
        // Your application logic here
    }
}

Performance Notes

Stack-only APIs: No heap allocation, suitable for MCUs with limited RAM
SIMD acceleration: 2-4x speedup on supported platforms
Power-of-two sizes: Most efficient for FFT operations
Memory usage: Stack usage scales with transform size (e.g., 8-point FFT uses ~64 bytes for Complex32)

Platform Support

Platform	SIMD Support	Features
x86_64	AVX2/FMA	`x86_64` feature
AArch64	NEON	`aarch64` feature
WebAssembly	SIMD128	`wasm` feature
Generic	Scalar	Default fallback

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)

at your option.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

kofft 0.1.0