#![cfg_attr(all(feature = "bench", test), feature(test))]

//! RustFFT is a high-performance FFT library written in pure Rust.
//!
//! RustFFT supports the AVX instruction set for increased performance. No special code is needed to activate AVX:
//! Simply plan a FFT using the FftPlanner on a machine that supports the `avx` and `fma` CPU features, and RustFFT
//! will automatically switch to faster AVX-accelerated algorithms.
//!
//! ### Usage
//!
//! The recommended way to use RustFFT is to create a [`FftPlanner`](crate::FftPlanner) instance and then call its
//! [`plan_fft`](crate::FftPlanner::plan_fft) method. This method will automatically choose which FFT algorithms are best
//! for a given size and initialize the required buffers and precomputed data.
//!
//! ```
//! // Perform a forward FFT of size 1234
//! use rustfft::{FftPlanner, num_complex::Complex};
//!
//! let mut planner = FftPlanner::new();
//! let fft = planner.plan_fft_forward(1234);
//!
//! let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 1234];
//! fft.process(&mut buffer);
//! ```
//! The planner returns trait objects of the [`Fft`](crate::Fft) trait, allowing for FFT sizes that aren't known
//! until runtime.
//!
//! RustFFT also exposes individual FFT algorithms. For example, if you know beforehand that you need a power-of-two FFT, you can
//! avoid the overhead of the planner and trait object by directly creating instances of the [`Radix4`](crate::algorithm::Radix4) algorithm:
//!
//! ```
//! // Computes a forward FFT of size 4096
//! use rustfft::{Fft, FftDirection, num_complex::Complex, algorithm::Radix4};
//!
//! let fft = Radix4::new(4096, FftDirection::Forward);
//!
//! let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 4096];
//! fft.process(&mut buffer);
//! ```
//!
//! For the vast majority of situations, simply using the [`FftPlanner`](crate::FftPlanner) will be enough, but
//! advanced users may have better insight than the planner into which algorithms are best for a specific size. See the
//! [`algorithm`](crate::algorithm) module for a complete list of scalar algorithms implemented by RustFFT.
//!
//! Users should beware, however, that bypassing the planner will disable all AVX optimizations.
//!
//! ### Feature Flags
//!
//! * `avx` (Enabled by default)
//!
//!     On x86_64, the `avx` feature enables compilation of AVX-accelerated code. Enabling it greatly improves performance if the
//!     client CPU supports AVX, while disabling it reduces compile time and binary size.
//!     On every other platform, this feature does nothing, and RustFFT will behave like it's not set.
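//!
//! To check at runtime whether the AVX-accelerated path is available, one option is to try constructing a
//! [`FftPlannerAvx`](crate::FftPlannerAvx) (a minimal sketch):
//!
//! ```
//! use rustfft::FftPlannerAvx;
//!
//! // Ok(_) means the `avx` feature is enabled and this CPU supports AVX + FMA;
//! // Err(_) means RustFFT will fall back to scalar algorithms.
//! let avx_available = FftPlannerAvx::<f32>::new().is_ok();
//! println!("AVX acceleration available: {}", avx_available);
//! ```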
//!
//! ### Normalization
//!
//! RustFFT does not normalize outputs. Callers must manually normalize the results by scaling each element by
//! `1/len().sqrt()`. Multiple normalization steps can be merged into one via pairwise multiplication, so when
//! doing a forward FFT followed by an inverse FFT, callers can normalize once by scaling each element by `1/len()`.
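//!
//! For example, a forward/inverse round trip can be normalized in a single pass (a minimal sketch using the
//! planner API shown above):
//!
//! ```
//! use rustfft::{FftPlanner, num_complex::Complex};
//!
//! let mut planner = FftPlanner::new();
//! let forward = planner.plan_fft_forward(1024);
//! let inverse = planner.plan_fft_inverse(1024);
//!
//! let mut buffer = vec![Complex{ re: 1.0f32, im: 0.0f32 }; 1024];
//! forward.process(&mut buffer);
//! inverse.process(&mut buffer);
//!
//! // Normalize once for the whole round trip by scaling each element by 1/len()
//! let scale = 1.0 / buffer.len() as f32;
//! for element in buffer.iter_mut() {
//!     *element = *element * scale;
//! }
//! ```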
//!
//! ### Output Order
//!
//! Elements in the output are ordered by ascending frequency, with the first element corresponding to frequency 0.
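//!
//! For example, index 0 of a forward FFT's output is the DC bin, i.e. the (unnormalized) sum of the inputs
//! (a minimal sketch):
//!
//! ```
//! use rustfft::{FftPlanner, num_complex::Complex};
//!
//! let mut planner = FftPlanner::new();
//! let fft = planner.plan_fft_forward(8);
//!
//! // A constant signal has all of its energy at frequency 0
//! let mut buffer = vec![Complex{ re: 1.0f32, im: 0.0f32 }; 8];
//! fft.process(&mut buffer);
//!
//! // buffer[0] is the DC bin: the (unnormalized) sum of the inputs
//! assert!((buffer[0].re - 8.0).abs() < 1e-5);
//! ```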
//!
//! ### AVX Performance Tips
//!
//! In any FFT computation, the time required to compute a FFT of size N depends heavily on the [prime factorization](https://en.wikipedia.org/wiki/Integer_factorization) of N.
//! If N's prime factors are all very small, computing a FFT of size N will be fast; it will be slow if N has large prime
//! factors, or if N is a prime number.
//!
//! In most FFT libraries (including RustFFT when using non-AVX code), power-of-two FFT sizes are the fastest, and users see a steep
//! falloff in performance when using non-power-of-two sizes. Thankfully, RustFFT with AVX acceleration is not as restrictive:
//!
//! - Any FFT whose size is of the form `2^n * 3^m` can be considered the "fastest" in RustFFT.
//! - Any FFT whose prime factors are all 11 or smaller will also be very fast, but the fewer the factors of 2 and 3, the slower it will be.
//!     For example, computing a FFT of size 13552 `(2^4 * 7 * 11 * 11)` takes 12% longer than computing one of size 13824 `(2^9 * 3^3)`,
//!     and computing a FFT of size 2541 `(3 * 7 * 11 * 11)` takes 65% longer than computing one of size 2592 `(2^5 * 3^4)`.
//! - Any other FFT size will be noticeably slower. A considerable amount of effort has been put into making these FFT sizes as fast as
//!     they can be, but some FFT sizes just take more work than others. For example, computing a FFT of size 5183 `(71 * 73)` takes about
//!     5x longer than computing a FFT of size 5184 `(2^6 * 3^4)`.
//!
//! In most cases, even prime-sized FFTs will be fast enough for your application. In the example of 5183 above, even that "slow" FFT
//! only takes a few tens of microseconds to compute.
//!
//! Our advice is to start by trying the size that's most convenient to your application.
//! If that's too slow, see if you can find a nearby size whose prime factors are all 11 or smaller, and you can expect a 2x-5x speedup.
//! If that's still too slow, find a nearby size whose prime factors are all 2 or 3, and you can expect a 1.1x-1.5x speedup.
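//!
//! For example, a caller could round a desired size up to the nearest "fast" size with a small helper like the
//! hypothetical `next_fast_len` below (not part of RustFFT's API):
//!
//! ```
//! /// Returns the smallest size >= `n` whose prime factors are all 11 or smaller
//! fn next_fast_len(mut n: usize) -> usize {
//!     assert!(n > 0);
//!     fn is_fast(mut x: usize) -> bool {
//!         for &p in &[2, 3, 5, 7, 11] {
//!             while x % p == 0 {
//!                 x /= p;
//!             }
//!         }
//!         x == 1
//!     }
//!     while !is_fast(n) {
//!         n += 1;
//!     }
//!     n
//! }
//!
//! // 5183 = 71 * 73 is slow to compute; 5184 = 2^6 * 3^4 is fast
//! assert_eq!(next_fast_len(5183), 5184);
//! ```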

use std::fmt::Display;

pub use num_complex;
pub use num_traits;

#[macro_use]
mod common;

/// Individual FFT algorithms
pub mod algorithm;
mod array_utils;
mod fft_cache;
mod math_utils;
mod plan;
mod twiddles;

use num_complex::Complex;
use num_traits::Zero;

pub use crate::common::FftNum;
pub use crate::plan::{FftPlanner, FftPlannerScalar};

/// A trait that allows FFT algorithms to report their expected input/output size
pub trait Length {
    /// The FFT size that this algorithm can process
    fn len(&self) -> usize;
}

/// Represents a FFT direction, i.e. a forward FFT or an inverse FFT
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub enum FftDirection {
    Forward,
    Inverse,
}
impl FftDirection {
    /// Returns the opposite direction of `self`.
    ///
    ///  - If `self` is `FftDirection::Forward`, returns `FftDirection::Inverse`
    ///  - If `self` is `FftDirection::Inverse`, returns `FftDirection::Forward`
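    ///
    /// A quick round-trip check:
    ///
    /// ```
    /// use rustfft::FftDirection;
    ///
    /// assert_eq!(FftDirection::Forward.opposite_direction(), FftDirection::Inverse);
    /// assert_eq!(FftDirection::Inverse.opposite_direction(), FftDirection::Forward);
    /// ```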
    pub fn opposite_direction(&self) -> FftDirection {
        match self {
            Self::Forward => Self::Inverse,
            Self::Inverse => Self::Forward,
        }
    }
}
impl Display for FftDirection {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> Result<(), ::std::fmt::Error> {
        match self {
            Self::Forward => f.write_str("Forward"),
            Self::Inverse => f.write_str("Inverse"),
        }
    }
}

/// A trait that allows FFT algorithms to report whether they compute forward FFTs or inverse FFTs
pub trait Direction {
    /// Returns FftDirection::Forward if this instance computes forward FFTs, or FftDirection::Inverse for inverse FFTs
    fn fft_direction(&self) -> FftDirection;
}

/// Trait for algorithms that compute FFTs.
///
/// This trait has a few methods for computing FFTs. Its most convenient method is [`process(slice)`](crate::Fft::process).
/// It takes in a slice of `Complex<T>` and computes a FFT on that slice, in-place. It may copy the data over to internal scratch buffers
/// if that speeds up the computation, but the output will always end up in the same slice as the input.
pub trait Fft<T: FftNum>: Length + Direction + Sync + Send {
    /// Computes a FFT in-place.
    ///
    /// Convenience method that allocates a `Vec` with the required scratch space and calls `self.process_with_scratch`.
    /// If you want to re-use that allocation across multiple FFT computations, consider calling `process_with_scratch` instead.
    ///
    /// # Panics
    ///
    /// This method panics if:
    /// - `buffer.len() % self.len() > 0`
    /// - `buffer.len() < self.len()`
    fn process(&self, buffer: &mut [Complex<T>]) {
        let mut scratch = vec![Complex::zero(); self.get_inplace_scratch_len()];
        self.process_with_scratch(buffer, &mut scratch);
    }

    /// Divides `buffer` into chunks of size `self.len()`, and computes a FFT on each chunk.
    ///
    /// Uses the `scratch` buffer as scratch space, so the contents of `scratch` should be considered garbage
    /// after calling.
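    ///
    /// For example, one scratch allocation can be re-used across several FFT calls (a minimal sketch built on
    /// the planner API from the crate-level docs):
    ///
    /// ```
    /// use rustfft::{FftPlanner, num_complex::Complex};
    ///
    /// let mut planner = FftPlanner::new();
    /// let fft = planner.plan_fft_forward(256);
    ///
    /// // Allocate the scratch buffer once, then re-use it for every call
    /// let mut scratch = vec![Complex{ re: 0.0f32, im: 0.0f32 }; fft.get_inplace_scratch_len()];
    /// for _ in 0..10 {
    ///     let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 256];
    ///     fft.process_with_scratch(&mut buffer, &mut scratch);
    /// }
    /// ```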
    ///
    /// # Panics
    ///
    /// This method panics if:
    /// - `buffer.len() % self.len() > 0`
    /// - `buffer.len() < self.len()`
    /// - `scratch.len() < self.get_inplace_scratch_len()`
    fn process_with_scratch(&self, buffer: &mut [Complex<T>], scratch: &mut [Complex<T>]);

    /// Divides `input` and `output` into chunks of size `self.len()`, and computes a FFT on each chunk.
    ///
    /// This method uses both the `input` buffer and `scratch` buffer as scratch space, so the contents of both should be
    /// considered garbage after calling.
    ///
    /// This is a more niche way of computing a FFT. It's useful to avoid a `copy_from_slice()` if you need the output
    /// in a different buffer than the input for some reason. This happens frequently in RustFFT internals, but is probably
    /// less common among RustFFT users.
    ///
    /// For many FFT sizes, `self.get_outofplace_scratch_len()` returns 0.
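    ///
    /// A minimal sketch of out-of-place use with a planner-created FFT:
    ///
    /// ```
    /// use rustfft::{FftPlanner, num_complex::Complex};
    ///
    /// let mut planner = FftPlanner::new();
    /// let fft = planner.plan_fft_forward(256);
    ///
    /// let mut input = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 256];
    /// let mut output = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 256];
    /// let mut scratch = vec![Complex{ re: 0.0f32, im: 0.0f32 }; fft.get_outofplace_scratch_len()];
    ///
    /// // After this call, the FFT of `input` is in `output`, and `input` holds garbage
    /// fft.process_outofplace_with_scratch(&mut input, &mut output, &mut scratch);
    /// ```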
    ///
    /// # Panics
    ///
    /// This method panics if:
    /// - `output.len() != input.len()`
    /// - `input.len() % self.len() > 0`
    /// - `input.len() < self.len()`
    /// - `scratch.len() < self.get_outofplace_scratch_len()`
    fn process_outofplace_with_scratch(
        &self,
        input: &mut [Complex<T>],
        output: &mut [Complex<T>],
        scratch: &mut [Complex<T>],
    );

    /// Returns the size of the scratch buffer required by `process_with_scratch`
    fn get_inplace_scratch_len(&self) -> usize;

    /// Returns the size of the scratch buffer required by `process_outofplace_with_scratch`
    ///
    /// For many FFT sizes, out-of-place FFTs require zero scratch, and this method will return zero - although that may change from one RustFFT version to the next.
    fn get_outofplace_scratch_len(&self) -> usize;
}

// Algorithms implemented to use AVX instructions. Only compiled on x86_64, and only compiled if the "avx" feature flag is set.
#[cfg(all(target_arch = "x86_64", feature = "avx"))]
mod avx;

// If we're not on x86_64, or if the avx feature was disabled, keep a stub implementation around that has the same API, but does nothing
// That way, users can write code using the AVX planner and compile it on any platform
#[cfg(not(all(target_arch = "x86_64", feature = "avx")))]
mod avx {
    pub mod avx_planner {
        use crate::{Fft, FftDirection, FftNum};
        use std::sync::Arc;

        /// The AVX FFT planner creates new FFT algorithm instances which take advantage of the AVX instruction set.
        ///
        /// Creating an instance of `FftPlannerAvx` requires the `avx` and `fma` instructions to be available on the current machine, and it requires RustFFT's
        ///  `avx` feature flag to be set. A few algorithms will use `avx2` if it's available, but it isn't required.
        ///
        /// For the time being, AVX acceleration is a black box, and AVX-accelerated algorithms are not available without a planner. This may change in the future.
        ///
        /// ~~~
        /// // Perform a forward FFT of size 1234, accelerated by AVX
        /// use std::sync::Arc;
        /// use rustfft::{FftPlannerAvx, num_complex::Complex};
        ///
        /// // If FftPlannerAvx::new() returns Ok(), we'll know AVX algorithms are available
        /// // on this machine, and that RustFFT was compiled with the `avx` feature flag
        /// if let Ok(mut planner) = FftPlannerAvx::new() {
        ///     let fft = planner.plan_fft_forward(1234);
        ///
        ///     let mut buffer = vec![Complex{ re: 0.0f32, im: 0.0f32 }; 1234];
        ///     fft.process(&mut buffer);
        ///
        ///     // The FFT instance returned by the planner has the type `Arc<dyn Fft<T>>`,
        ///     // where T is the numeric type, ie f32 or f64, so it's cheap to clone
        ///     let fft_clone = Arc::clone(&fft);
        /// }
        /// ~~~
        ///
        /// If you plan on creating multiple FFT instances, it is recommended to reuse the same planner for all of them. This
        /// is because the planner re-uses internal data across FFT instances wherever possible, saving memory and reducing
        /// setup time. (FFT instances created with one planner will never re-use data and buffers with FFT instances created
        /// by a different planner)
        ///
        /// Each FFT instance owns [`Arc`s](std::sync::Arc) to its internal data, rather than borrowing it from the planner, so it's perfectly
        /// safe to drop the planner after creating Fft instances.
        pub struct FftPlannerAvx<T: FftNum> {
            _phantom: std::marker::PhantomData<T>,
        }
        impl<T: FftNum> FftPlannerAvx<T> {
            /// Constructs a new `FftPlannerAvx` instance.
            ///
            /// Returns `Ok(planner_instance)` if this machine has the required instruction sets and the `avx` feature flag is set.
            /// Returns `Err(())` if some instruction sets are missing, or if the `avx` feature flag is not set.
            pub fn new() -> Result<Self, ()> {
                Err(())
            }
            /// Returns a `Fft` instance which uses AVX instructions to compute FFTs of size `len`.
            ///
            /// If the provided `direction` is `FftDirection::Forward`, the returned instance will compute forward FFTs. If it's `FftDirection::Inverse`, it will compute inverse FFTs.
            ///
            /// If this is called multiple times, the planner will attempt to re-use internal data between calls, reducing memory usage and FFT initialization time.
            pub fn plan_fft(&mut self, _len: usize, _direction: FftDirection) -> Arc<dyn Fft<T>> {
                unreachable!()
            }
            /// Returns a `Fft` instance which uses AVX instructions to compute forward FFTs of size `len`.
            ///
            /// If this is called multiple times, the planner will attempt to re-use internal data between calls, reducing memory usage and FFT initialization time.
            pub fn plan_fft_forward(&mut self, _len: usize) -> Arc<dyn Fft<T>> {
                unreachable!()
            }
            /// Returns a `Fft` instance which uses AVX instructions to compute inverse FFTs of size `len`.
            ///
            /// If this is called multiple times, the planner will attempt to re-use internal data between calls, reducing memory usage and FFT initialization time.
            pub fn plan_fft_inverse(&mut self, _len: usize) -> Arc<dyn Fft<T>> {
                unreachable!()
            }
        }
    }
}

pub use self::avx::avx_planner::FftPlannerAvx;

#[cfg(test)]
mod test_utils;