Function preprocess 

pub fn preprocess(
    data: Array2<f32>,
    chan_pos: Array2<f32>,
    src_sfreq: f32,
    cfg: &PipelineConfig,
) -> Result<Vec<(Array2<f32>, Array2<f32>)>>

Run the full EEG preprocessing pipeline on a single continuous recording.

This is the main entry point for the exg library. It chains all preprocessing steps in the exact order used to train the model and matches the MNE-Python reference implementation to within floating-point rounding error (< 4 × 10⁻⁶ on typical EEG data).

§Pipeline steps

  1. Zero-fill channels listed in PipelineConfig::bad_channels.
  2. Resample from src_sfreq to PipelineConfig::target_sfreq (FFT polyphase).
  3. Apply a zero-phase highpass FIR filter at PipelineConfig::hp_freq.
  4. Subtract the per-timepoint channel mean (average reference).
  5. Apply global z-score normalisation (ddof = 0).
  6. Split into non-overlapping epochs of PipelineConfig::epoch_samples() samples and apply per-epoch per-channel baseline correction.
  7. Divide each epoch by PipelineConfig::data_norm.
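
Steps 4 to 7 can be sketched on a plain channel-major buffer, using only the standard library. This is a minimal illustration, not the crate's implementation: the helper name `reference_normalise_epoch` is hypothetical, the z-score is assumed to use a single mean and standard deviation over all channels and samples, the baseline is assumed to be the per-channel mean over the whole epoch, and constant input is assumed to be left unscaled to avoid dividing by zero.

```rust
/// Sketch of steps 4 to 7 on `data[channel][time]`.
/// Hypothetical helper for illustration; not part of the exg API.
fn reference_normalise_epoch(
    mut data: Vec<Vec<f32>>,
    epoch_samples: usize,
    data_norm: f32,
) -> Vec<Vec<Vec<f32>>> {
    let n_chan = data.len();
    let n_time = data.first().map_or(0, |row| row.len());
    if n_chan == 0 || n_time == 0 || epoch_samples == 0 {
        return Vec::new();
    }

    // Step 4: average reference. Subtract the channel mean at each timepoint.
    for t in 0..n_time {
        let mean = data.iter().map(|row| row[t]).sum::<f32>() / n_chan as f32;
        for row in data.iter_mut() {
            row[t] -= mean;
        }
    }

    // Step 5: global z-score with ddof = 0 (population standard deviation).
    // Assumption: one mean and one std over all channels and samples.
    let n = (n_chan * n_time) as f32;
    let mean = data.iter().flatten().sum::<f32>() / n;
    let var = data.iter().flatten().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt();
    if std > 0.0 {
        // Assumption: constant input (std == 0) is left unscaled.
        for v in data.iter_mut().flatten() {
            *v = (*v - mean) / std;
        }
    }

    // Step 6: non-overlapping epochs; trailing samples are dropped.
    let n_epochs = n_time / epoch_samples;
    let mut epochs = Vec::with_capacity(n_epochs);
    for e in 0..n_epochs {
        let start = e * epoch_samples;
        let mut epoch: Vec<Vec<f32>> = data
            .iter()
            .map(|row| row[start..start + epoch_samples].to_vec())
            .collect();
        for row in epoch.iter_mut() {
            // Assumption: the baseline is the channel's mean over the epoch.
            let baseline = row.iter().sum::<f32>() / epoch_samples as f32;
            for v in row.iter_mut() {
                // Step 7: divide by data_norm after baseline correction.
                *v = (*v - baseline) / data_norm;
            }
        }
        epochs.push(epoch);
    }
    epochs
}
```

Steps 1 to 3 (bad-channel zeroing, resampling and FIR filtering) are omitted because they depend on filter design details that this page does not specify.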

§Arguments

  • data – Raw EEG signal, shape [C, T], in original units (volts). The recording needs at least cfg.epoch_samples() samples after resampling to yield any epochs; shorter recordings produce zero epochs rather than an error.
  • chan_pos – Channel positions in metres, shape [C, 3]. Returned unchanged alongside each epoch so downstream code has direct access to spatial layout.
  • src_sfreq – Sampling rate of data in Hz.
  • cfg – Pipeline configuration (see PipelineConfig).
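
The shape requirements above can be checked on the caller's side before invoking the pipeline. The sketch below is a hypothetical helper, not part of exg, and this page does not say whether preprocess performs such a check itself. It takes shapes as (rows, columns) pairs so that it stays independent of any array crate.

```rust
/// Caller-side sanity check for the documented shapes.
/// Hypothetical helper for illustration; not part of the exg API.
fn check_shapes(data: (usize, usize), chan_pos: (usize, usize)) -> Result<(), String> {
    let (n_chan, _n_time) = data;
    // chan_pos is documented as [C, 3]: one (x, y, z) row per channel.
    if chan_pos.1 != 3 {
        return Err(format!("chan_pos must have 3 columns, got {}", chan_pos.1));
    }
    if chan_pos.0 != n_chan {
        return Err(format!(
            "data has {} channels but chan_pos has {} rows",
            n_chan, chan_pos.0
        ));
    }
    Ok(())
}
```

With ndarray, the two pairs can be obtained from data.dim() and chan_pos.dim(), which return (rows, columns) for an Array2.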

§Returns

A Vec of (epoch_data, chan_pos) tuples:

  • epoch_data – shape [C, cfg.epoch_samples()], f32.
  • chan_pos – the original chan_pos argument (cloned, f32).

The length of the Vec is floor(T_resampled / cfg.epoch_samples()). Trailing samples that do not fill a complete epoch are discarded.
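
The epoch count and the sample range covered by each epoch follow directly from that formula. The helper below is a hypothetical illustration, not an exg function; it takes the resampled length as given because the exact rounding used by the resampler is not stated here.

```rust
use std::ops::Range;

/// Sample ranges of the complete epochs in a recording of `t_resampled` samples.
/// Hypothetical helper for illustration; not part of the exg API.
fn epoch_ranges(t_resampled: usize, epoch_samples: usize) -> Vec<Range<usize>> {
    if epoch_samples == 0 {
        return Vec::new();
    }
    // Integer division is the floor; the remainder is the discarded tail.
    (0..t_resampled / epoch_samples)
        .map(|e| e * epoch_samples..(e + 1) * epoch_samples)
        .collect()
}
```

For example, 3000 resampled samples with 1280-sample epochs give the ranges 0..1280 and 1280..2560, and the last 440 samples are dropped.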

§Errors

Returns an error if:

  • The resampler fails (e.g. zero-length input).
  • The FIR convolution fails (internal FFT planner error, extremely rare).

§Examples

use exg::{preprocess, PipelineConfig};
use ndarray::Array2;

// 12-channel, 15-second recording at 256 Hz
let data: Array2<f32> = Array2::zeros((12, 3840));
let chan_pos: Array2<f32> = Array2::zeros((12, 3));

let cfg    = PipelineConfig::default();
let epochs = preprocess(data, chan_pos, 256.0, &cfg).unwrap();
assert_eq!(epochs.len(), 2); // 3840 / 1280 = 3 windows, minus 1 epoch's worth used for the baseline