phastft 0.4.0

A high-performance Fast Fourier Transform (FFT) library written in pure and safe Rust
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
# Changelog

All notable changes to this project will be documented in this file.

## [0.4.0] - 2026-06-04

The largest release since the initial publish: a SIMD backend with runtime CPU
dispatch, a real-valued FFT, optional multi-threading, and a unified public API.
Treat this as a fresh start — nearly every entry point was renamed.

### Breaking Changes

- The public API was renamed and unified. Every entry point now spells the
  precision as the Rust float type and names the FFT algorithm explicitly:
  - Complex FFT (split real/imag arrays): `fft_64` / `fft_32` are now
    `fft_f64_dit` / `fft_f32_dit`. The full-control `fft_64_with_opts_and_plan`
    / `fft_32_with_opts_and_plan` are now `fft_f64_dit_with_planner_and_opts` /
    `fft_f32_dit_with_planner_and_opts`, and a middle
    `fft_f{64,32}_dit_with_planner` tier was added.
  - Interleaved `Complex` FFT (feature `complex-nums`): `fft_64_interleaved` /
    `fft_32_interleaved` are now `fft_f64_dit_interleaved` /
    `fft_f32_dit_interleaved` (plus `_with_planner` / `_with_planner_and_opts`
    tiers).
  - Planner types `Planner64` / `Planner32` are now `PlannerDit64` /
    `PlannerDit32`, and are direction-agnostic and reusable — one instance
    drives both forward and inverse transforms.
- `Direction::Reverse` was renamed to `Direction::Inverse`.
- The minimum supported Rust version is now declared (`rust-version = "1.88"`).

### Added

- Real-valued FFT — `r2c_fft_f32` / `r2c_fft_f64` and the inverse `c2r_fft_f32`
  / `c2r_fft_f64`, producing/consuming the compact `N/2 + 1` spectrum; roughly
  2x faster than a zero-imaginary complex FFT
  ([#105]https://github.com/smu160/PhastFT/pull/105).
- Optional multi-threading via the `parallel` feature (Rayon): threaded bit
  reversal and a cache-oblivious parallel recursive FFT.
- `Options::smallest_parallel_chunk_size` to tune the parallel split point.
- Fused multi-stage codelets — FFT-16 for `f64`, FFT-32 for `f32`
  ([#101]https://github.com/smu160/PhastFT/pull/101).
- `Debug` / `PartialEq` / `Eq` / `Hash` implementations across the public types.

### Changed

- SIMD backend migrated to `fearless_simd` with runtime CPU-feature dispatch.
  `-C target-cpu=native` is no longer needed, and the crate now builds on
  **stable** Rust — the nightly requirement is gone.
- The FFT core is now a recursive, cache-blocked decimation-in-time algorithm;
  CO-BRAVO provides cache-optimal SIMD bit reversal
  ([#106]https://github.com/smu160/PhastFT/pull/106).

### Removed

- The public `cobra` bit-reversal module, superseded by the internal CO-BRAVO
  implementation.
- The nightly-toolchain requirement.

### Performance

- Cache-blocked recursive DIT, fused first-stage codelets, FMA butterfly
  kernels, and SIMD-accelerated CO-BRAVO bit reversal.

### Fixed

- Inverse FFT output ordering and assorted correctness fixes.

## [0.3.0] - 2025-09-04

### Features

- Add DIT FFT algorithm & bit reversal control

### Bug Fixes

- Bump `bytemuck` to latest version

### Documentation

- Add example usage of interleaved fft

## [0.2.2] - 2025-09-03

### Features

- Add benchmark using criterion
- Add a more robust round-trip FFT test
- Add git hooks and contributing guide
- Add deinterleaving function
- Add criterion group for forward FFT f32
- Add new python benchmarking "framework"
- Add #![feature(doc_cfg)] to make rustdoc actually accept doc_cfg

### Bug Fixes

- Fix formatting
- Fix formatting
- Fix formatting
- Fix formatting
- Fix docsrs config according to Clippy

tbh not sure if Clippy is right in this case
- Avx512 is now stable, so placate clippy
- Update `criterion` benchmarks

### Other

- Vectorize deinterleaving of AoS --> SoA

Use bytemuck + SIMD::deinterleave to rearrange input data from a slice
of Complex values into 2 slices of f32 or f64 values
- Put macro definition begind feature flag
- Account for different signal sizes
- Forgot to add benchmark file
- Don't gate feature documentation on docs.rs, we're on nightly anyway
- Undo duplicate docs.rs all-features

### Refactor

- Make sure benchmark runs
- Make planner reusable

- Planner should be re-usable so it can be re-used for FFT's of the same
  size

- Add regression tests to make sure `fft_64`/`fft_32` gives the same
  results as `fft_64_with_opts_and_plan`/`fft_32_with_opts_and_plan`
- Move planner completely outside of bench function
- Avoid cloning twiddles
- Make examples output time elapsed in nanoseconds
- Use new de-interleaving function
- Make required cfg show up on docs.rs
- Make docs.rs build docs with all features enabled

### Documentation

- Update the normalization section

### Miscellaneous Tasks

- Remove index tracker from hot path
- Updates to benchmarks without planners
- Remove `array_chunks`
- Update pre-commit hook to run clippy everywhere
- Update dependencies and their usage

### Revert

- Revert "Make sure benchmark runs"

This reverts commit 7011dfc040b866cd773185667915a9920b9c5a80.
- Revert "Forgot to add benchmark file"

This reverts commit b70dd4b318f59af32740c4bd4c9e75a69bcd690b.

## [0.2.1] - 2024-05-03

### Features

- Add a function to separate AoS to SoA
- Add reverse separate fn and add test
- Add fft_*_interleaved impls and tests

### Bug Fixes

- Fix formatting
- Fix formatting
- Fix formatting
- Fixes inverse FFT ouput order issue

### Other

- Advertise runtime feature selection
- Transition `num-complex` dependency to optional

- Make num-complex optional and non-default, and all functions that take
  and test interleaved FFT are now under the same feature

- Bump num-complex version 0.4.5 --> 0.4.6

- Enable num-complex for docs/docs.rs

- Update github action workflow to run tests for all features

- Fix formatting in docs to fix links
- Placate clippy's complaints about assign op

### Refactor

- Simplify separate_re_im
- Move twiddles fwd * rev = identity test

- Testing that all the values in the forward twiddle factors pointwise
multiplied by all the values in the inverse twiddle factors is more of a
unit test for twiddles. It made no sense to keep it in the planner.

- The previous commit modifies the planner in a way that this test would
  no longer pass. We can get the same, good coverage by keeping this
  test under twiddles.

### Documentation

- Update docs for interleaved fft

## [0.2.0] - 2024-04-25

### Features

- Add badges for docs and latest published version
- Add more examples and fix typos in readme/plots

- Added rust bindings of FFTW as an example, which will be used for
  benchmarks

- Add fftw (rust bindings) crate as a dev-dependency

- Add an example of using pyphastft to reproduce an example use case of
  FFT from the FFT wikipedia page

- Fix typos in the README and distinguish pyphastft from phastft in the
  python benchmarks plots
- Add todo for improving twiddle generation perf
- Add normalization & output order info. closes #13
- Add tests for 32 bit planner
- Add audio visualization as example for pyphastft
- Add automatic CPU feature detection

### Bug Fixes

- Fix formatting with `cargo fmt`
- Fix formatting
- Fix lint issue brought up by CI

- Fixed `clippy::needless_doctest_main` issue in README

### Other

- Swapped out use of sincos() for more portability across platforms
- Implement SIMDized twiddle generation using macro
- Finish adding test macros for f32/f64

- Make float comparison generic for f32/f64

- Add f32/f64 tests for twiddles using macros

### Refactor

- Make cobra mod public for fast bit reversal
- Use 16 lanes for f32 in SIMD butterfly kernel
- Run `cargo fmt`

### Documentation

- Update README and docs

### Miscellaneous Tasks

- Updated benchmark instructions and fixed typos
- Update python benchmark plot title
- Update benchmark plots
- Update benchmark, plotting, and profiling scripts
- Remove duplicate public functions
- Bump black from 24.1.1 to 24.3.0 in /benches

Bumps [black](https://github.com/psf/black) from 24.1.1 to 24.3.0.
- [Release notes]https://github.com/psf/black/releases
- [Changelog]https://github.com/psf/black/blob/main/CHANGES.md
- [Commits]https://github.com/psf/black/compare/24.1.1...24.3.0

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
- Update example and demo video
- Cleanup docs and examples
- Update codecov uploader to v4
- Update benchmarking readme
- Update benchmark plots
- Update pyphastft benchmark plots

## [0.1.1] - 2024-02-13

### Other

- Release v0.1.1 and update name for python wrapper

### Documentation

- Update README.md

Just skip to features, right away

## [0.1.0] - 2024-02-12

### Features

- Add cargo config
- Add rayon back
- Add chunk size 4 kernel
- Add opt flags
- Add check for dups
- Add COBRA initial impl pre-build twiddle factors
- Add basic tests for bit reversal
- Add bash script for profiling and update readme
- Add a sequential bit reversal permutation
- Add iterator for roots of unity
- Add cobra implementation and integrate into FFT
- Add separate mod for kernels
- Add separate kernel for size 8 butterflies
- Add utils
- Add benchmarks for pyfftw, numpy, and PHFT
- Add test for the SIMD impl
- Add github workflow for CI
- Add status badge
- Add tests for bit rev; rand signal; touch-up docs
- Add codecov job and codecov config for 90% target
- Add code coverage badge
- Add benchmarking scripts
- Add dir with instructions & code to run benchmarks
- Add FFTW3 benchmark
- Add Options struct
- Add fftw3 to benchmark script
- Add a public function that accepts caller-provided `Options`
- Add benchmark plots
- Add plots to readme
- Add `Planner` and ammend examples, tests, etc.
- Add planner mod
- Add bench system config info
- Add license files
- Add asert to check input length is 2^n

- Add a regression test to make sure non-power-of-two FFTs are not
  allowed for the time being
- Add instructions for reproducing benchmarks
- Add tests and make API breaking changes

- Add a test to make sure inverse twiddle factors multiplied by forward
  twiddle factors always gives 1.0

- `fft` now takes the real/imaginary input and the `Direction` of the `fft`

- Pre-built planners can be used with the `fft_with_opts_and_plan`
  function
- Add regression test for planner/fft mismatch
- Add regression docs to explain regression test
- Add requirements.txt for benchmarks

### Bug Fixes

- Fixes for profiling
- Fix for prebuilt twiddles and portable SIMD
- Fix assertion
- Fix overhead from bit reversal threading
- Fix typo in benchmark logging
- Fix build
- Fix typos in benchmarking code
- Fix bug in phast benchmark & add fft direction
- Fix FFTW benchmark to use wall time
- Fix perf regression for small input sizes

- Increased block size increases performance for larger input sizes, but
  the increase caused a regression on a AMD 7950x machine, for smaller
  input sizes. The `Planner` needs to be updated to computes an ideal
  block width. In the interim, we will use 2 * CACHE_LINE_SIZE. On
  x86-64, this is usually 2 * 64 = 128.
- Fix assertion in `fft_with_opts_and_plan`

- Twiddles should always be half the size of input
- Fix typo

### Other

- Don't precompute twiddle factors for large chunks
- Split into two binaries for profiling
- Parallelize bit reverse permutation with 2 threads
- Experiments
- Pre-compute twiddle factors for chunk n
- Bug fix -- clear twiddle factor cache after use
- Abstract out twiddle factor generation
- Prebuild only half the required twiddles
- Try out generating twiddles lazily
- Restructure into lib, add examples, scripts, etc.
- Proof of concept of caching 1/4 of twiddle factors
- Unroll for half of twiddle factors
- Reuse generated twiddle factors by filtering
- Go back to only generating half of twiddles
- Faster twiddles hopefully workable concept this time
- Plug SIMD twiddle generation in
- Generate random signal for py benchmarks
- Increase `BLOCK_SIZE` to 256
- Rewrite README
- Tweaks to "Features" in README
- Wire up guessing the options to the main FFT function
- Placate Clippy
- Exponentially decrease # of iters in benchmark
- Change plot ordering so colors are consistent
- Print human readable bytes without decimal
- Adjust pybindings to work with new API
- Expand on the Python installation instructions
- Mention the license in the README
- Set repository and unset documentation field; it will default to docs.rs
- Mention lower memory use in the "vs" section
- Automate findings of latest benchmark results
- Format benchmark scripts
- Expose `cobra_apply` for fast bit reverseal

### Refactor

- Use std `reverse_bits` for performance improvement
- Use faster impl of bit reversal implementation
- Replace vec with array since it's < 512_000 bytes
- Switch back to pre-built cache using vec
- Use in-place bit reversal in COBRA for small N
- Move kernels to separate mod
- Simplify size 2 butterfly
- Use numpy crate to avoid overhead in pybindings
- Use scipy to generate random signal
- Move rustfft benchmark to examples
- Use random signal for benchmarking rustfft
- Use random signal generator to benchmark phastft
- Move utility functions to separate lib
- Simplify options down to a single multi-threading knob

### Documentation

- Update README
- Update README with basic skeleton
- Update READMEs
- Update README
- Update README
- Update README text to match latest benchmarks
- Update documentation and readme
- Update README.md
- Update README.md

### Performance

- Optimize twiddles pre-compute
- Improve recursive bit-reversal perf for testing

- Tests run much faster now with a bit of loop unrolling, loop fusion,
  and pre-allocating vectors with the right sizes

- Add documentation and cite references
- Improve assertion functions

### Miscellaneous Tasks

- Remove rayon for profiling
- Cleanup for profiling
- Remove rayon for profiling
- Remove unused mods
- Update for benchmarks
- Update profiling section in readme
- Update profile script, add profiling to examples
- Remove useless files
- Remove benchmark mod from lib
- Cleanup all warnings and unused code
- Remove rayon dep, add bar plot, rename to PhastFT
- Update workflow
- Remove old BRAVO implementation written in python
- Updates to examples and plotting script
- Update benchmark plotting
- Update benchmark scripts
- Remove workspace
- Cleanup x-axis of plots
- Update plots for latest benchmarks
- Update the license in Cargo.toml
- Drop unused author fields
- Update MIT license for future contributors
- Cleanup plots and split bar plot for py benches
- Update benchmark plots
- Update saved figure names in benches
- Update benchmark readme

### Revert

- Revert to creating twiddle factors for large n
- Revert "Prebuild only half the required twiddles"

This reverts commit e552fb1ea2b5202c47e4cf1e8392adcefe1d476b.
- Revert "Proof of concept of caching 1/4 of twiddle factors"

This reverts commit de2cca02d99c93b76e3b42bf09d29deeb2eda62f.
- Revert "Add separate kernel for size 8 butterflies"

This reverts commit 249be77548a8b2f23aca4739a5d3aad536939cba.

<!-- generated by git-cliff -->