circulant-rs 0.2.0

High-performance block-circulant matrix operations using FFT
Documentation

circulant-rs

Version: 0.1.0 | Updated: 2024-01-26 | Reading time: 5 min

A high-performance Rust library for block-circulant matrix operations and quantum walk simulations.

License Rust


⚡ 676× Faster

FFT-based O(N log N) vs naive O(N²)

Scales to 50,000× at N=1M

💾 99.9999% Less Memory

N=1M: Dense 7.5 TB → circulant-rs 8 MB

From impossible to trivial

🔬 1000× Larger Simulations

Quantum walks with N=1,000,000 positions

Previously limited to N≈1,000


Overview

circulant-rs exploits the mathematical structure of circulant matrices to achieve O(N log N) complexity instead of O(N²) for matrix-vector multiplication. This is accomplished through the FFT diagonalization property: every circulant matrix can be diagonalized by the Discrete Fourier Transform.

The library provides:

  • 1D Circulant matrices - For signal processing, convolution, and 1D quantum walks
  • 2D Block Circulant (BCCB) matrices - For image processing and 2D periodic systems
  • Quantum Walk simulation - Discrete-time coined quantum walks with FFT-accelerated evolution
  • Full serialization support - Save and restore quantum states and walk configurations

Installation

Add to your Cargo.toml:

[dependencies]
circulant-rs = "0.1"

Feature Flags

Feature Default Description
std Yes Standard library support
physics Yes Quantum walk simulation module
parallel Yes Rayon-based parallelization
serde Yes Serialization with serde/bincode

Quick Start

1D Circulant Matrix Multiplication

use circulant_rs::prelude::*;
use num_complex::Complex;

fn main() -> Result<(), CirculantError> {
    // Create a circulant matrix from its first row
    let generator = vec![1.0, 2.0, 3.0, 4.0];
    let matrix = Circulant::from_real(generator)?;

    // Multiply by a vector - O(N log N) via FFT!
    let x = vec![
        Complex::new(1.0, 0.0),
        Complex::new(0.0, 0.0),
        Complex::new(1.0, 0.0),
        Complex::new(0.0, 0.0),
    ];
    let result = matrix.mul_vec(&x)?;

    println!("Result: {:?}", result);
    Ok(())
}

Quantum Walk Simulation

use circulant_rs::physics::{CoinedWalk1D, Coin, QuantumState, QuantumWalk};

fn main() {
    // Create a quantum walk on a 256-node ring with Hadamard coin
    let walk = CoinedWalk1D::<f64>::new(256, Coin::Hadamard);

    // Initialize walker at position 128
    let initial_state = QuantumState::localized(128, 256, 2).unwrap();

    // Simulate 100 steps
    let final_state = walk.simulate(initial_state, 100);

    // Get probability distribution
    let probabilities = final_state.position_probabilities();

    // Quantum walks show ballistic spreading (linear in time)
    // compared to classical random walks (sqrt of time)
    println!("Norm preserved: {}", final_state.norm_squared());
}

2D Block Circulant Operations

use circulant_rs::core::BlockCirculant;
use ndarray::Array2;
use num_complex::Complex;

fn main() {
    // Create a 2D convolution kernel
    let kernel = Array2::from_shape_vec((3, 3), vec![
        Complex::new(1.0, 0.0), Complex::new(2.0, 0.0), Complex::new(1.0, 0.0),
        Complex::new(2.0, 0.0), Complex::new(4.0, 0.0), Complex::new(2.0, 0.0),
        Complex::new(1.0, 0.0), Complex::new(2.0, 0.0), Complex::new(1.0, 0.0),
    ]).unwrap();

    // Create BCCB filter (zero-padded to output size)
    let filter = BlockCirculant::from_kernel(kernel, 64, 64).unwrap();

    // Apply to 2D data - O(N log N) via 2D FFT!
    let input = Array2::zeros((64, 64));
    let output = filter.mul_array(&input).unwrap();
}

Performance

Measured Benchmarks: FFT vs Naive O(N²)

Size N circulant-rs (FFT) Naive O(N²) Speedup
64 573 ns 18.8 µs 33×
256 3.2 µs 335 µs 105×
1,024 12.3 µs 4.76 ms 387×
2,048 28.4 µs 19.2 ms 676×
65,536 ~800 µs ~68 min* ~5,000×
1,048,576 ~15 ms ~17 days* ~50,000×

*Extrapolated from measured scaling

Quantum Walk Simulation (100 steps, Hadamard coin)

Positions Total Time Per Step Throughput
256 578 µs 5.8 µs 44M pos/sec
1,024 2.44 ms 24.4 µs 42M pos/sec
4,096 11.9 ms 119 µs 34M pos/sec
16,384 71.3 ms 713 µs 23M pos/sec
65,536 505 ms 5.05 ms 13M pos/sec

Memory Scaling

Size N Generator Only Dense Matrix Reduction
1,000 16 KB 16 MB 1,000×
10,000 160 KB 1.6 GB 10,000×
100,000 1.6 MB 160 GB 100,000×
1,000,000 16 MB 16 TB 1,000,000×

Use Cases

Domain Application Why circulant-rs Helps
Quantum Computing Quantum walk simulations on rings Simulate N=1M positions vs N=1K with dense matrices
Signal Processing Real-time periodic FIR filtering O(N log N) convolution at audio sample rates
Image Processing 2D periodic convolution (blur, edge detection) 4K image filtering in milliseconds
Cryptography Lattice-based schemes (NTRU) Efficient polynomial multiplication
Machine Learning Circulant neural network layers Reduced parameter count with structured weights

See docs/OVERVIEW.md for detailed use case examples with code.

Mathematical Background

Circulant Matrix Structure

A circulant matrix is defined by its first row [c₀, c₁, ..., cₙ₋₁]:

| c₀    c₁    c₂    ...  cₙ₋₁ |
| cₙ₋₁  c₀    c₁    ...  cₙ₋₂ |
| cₙ₋₂  cₙ₋₁  c₀    ...  cₙ₋₃ |
| ...                         |
| c₁    c₂    c₃    ...  c₀   |

FFT Diagonalization

Every circulant matrix C can be diagonalized by the DFT matrix F:

C = F⁻¹ · D · F

Where D is diagonal with eigenvalues equal to the DFT of the generator. This enables O(N log N) multiplication:

y = C·x = F⁻¹ · D · F · x = IFFT(FFT(c) ⊙ FFT(x))

Quantum Walk Evolution

The discrete-time coined quantum walk evolves as:

  1. Coin flip: Apply coin operator C to internal degree of freedom
  2. Shift: Move walker based on coin state

The shift operator is a circulant matrix, enabling FFT-accelerated simulation.

Architecture

See ARCHITECTURE.md for detailed technical documentation including:

  • Module structure and dependencies
  • Type system and trait hierarchy
  • FFT backend abstraction
  • Quantum walk implementation details
  • Mathematical derivations

Modules

Module Description
core Circulant<T> and BlockCirculant<T> types
fft FFT backend trait and RustFFT implementation
traits Scalar, CirculantOps, BlockOps traits
physics Quantum state, coins, and walk simulation
error Error types and Result alias
prelude Convenient re-exports

Running Benchmarks

You can reproduce all benchmark results on your own machine.

Prerequisites

# Rust 1.70+ required
rustc --version

# For Python comparison (optional)
pip install numpy scipy

Rust Benchmarks (Criterion)

# Run all benchmarks (includes FFT vs naive, quantum walk, 2D BCCB)
cargo bench --bench scaling_benchmark --features "physics parallel serde"

# Run core FFT multiplication benchmark only
cargo bench --bench fft_multiply

# View results in browser (after running benchmarks)
open target/criterion/report/index.html   # macOS
xdg-open target/criterion/report/index.html  # Linux

Benchmark output location: target/criterion/

Each benchmark group generates:

  • report/index.html - Interactive HTML report with graphs
  • <group>/report/index.html - Detailed per-group analysis
  • Raw data in JSON format for custom analysis

Python Comparison Benchmarks

cd benchmarks/python

# Run all comparisons (NumPy FFT vs SciPy dense vs naive)
python compare_circulant.py

# Results saved to results_1d.json

Output includes:

  • 1D multiplication timing across sizes (64 to 262,144)
  • Accuracy verification (FFT vs naive)
  • Memory usage comparison
  • Quantum walk simulation timing

Quick Validation

# Run test suite to verify correctness
cargo test --all-features

# Run example to see library in action
cargo run --example quantum_walk_1d --features physics

Interpreting Results

  • Speedup = Naive time / FFT time (higher is better)
  • Throughput = Elements processed per second
  • Criterion reports include statistical analysis (mean, std dev, outliers)
  • Compare your results with the tables above; relative speedups should be consistent across hardware

For detailed methodology, see docs/BENCHMARKS.md.

Roadmap

v0.1.0 (Current)

  • 1D Circulant with FFT multiplication
  • 2D Block Circulant (BCCB)
  • Quantum walk simulation (1D coined walks)
  • Serde serialization
  • Rayon parallelization

v0.2.0 (Planned)

  • Vision module (image filtering with BCCB)
  • 2D quantum walks on torus
  • Hamiltonian module for physics
  • Visualization with plotters

v0.3.0 (Future)

  • GPU acceleration (wgpu/cuda)
  • N-dimensional circulant tensors
  • Circulant neural network layers

License

Licensed under either of:

at your option.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.