omni-ffi
Zero-cost C++/CUDA FFI bridge for the OmniPulse Wavelet Scattering Transform engine
omni-ffi provides a type-safe, zero-overhead Rust FFI bridge into the OmniPulse WST math engine — a production-grade C++/CUDA library for computing the Wavelet Scattering Transform (WST) and Joint Time-Frequency Scattering (JTFS) on high-frequency time-series data.
Built on top of cxx, this crate compiles the real C++ math engine (Radix-2 Cooley-Tukey FFT + analytic Morlet filter bank + depth-m scattering cascade) directly into your Rust binary — no mocks, no stubs, no runtime binding resolution.
Why This Exists
The WST/JTFS pipeline is a mathematically rigorous alternative to deep-learning feature extractors. It produces deformation-stable, Lipschitz-continuous fingerprints whose sensitivity to input perturbations, adversarial ones included, is formally bounded. Generic trained neural networks do not provide such a bound by construction.
omni-ffi lets Rust applications call this pipeline with:
- Zero-copy memory transfer: raw `f32` buffers pass through the FFI without serialization or heap duplication.
- Compile-time dispatch: the `cuda` feature flag selects the GPU code path at build time; no runtime branching.
- Deterministic ownership: `WSTResult` makes memory ownership explicit. The C++ side allocates, the Rust side frees via `free_wst_result` / `free_fingerprint`.
Table of Contents
- Quick Start
- Build Modes
- API Overview
- Architecture
- Mathematical Background
- Use Cases
- Safety Contract
- Environment Variables
- Building from Source
- License
Quick Start
Add to your Cargo.toml:
[dependencies]
omni-ffi = "0.1"
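Then, in your Rust code, call the pipeline through `execute_fingerprint_pass` and release the result with `free_fingerprint` (both are described under API Overview).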
Build Modes
CPU Build (Default)
The default build compiles wst_bridge_cpu.cpp, which links the full C++ math engine:
- Radix-2 Cooley-Tukey FFT for spectral decomposition
- Analytic Morlet filter bank construction via `build_cpu_morlet_bank()`
- Depth-m scattering cascade with modulus nonlinearity
No CUDA toolkit required. Runs anywhere Rust + a C++17 compiler are available.
Note: The CPU build silently ignores the `use_jtfs` flag; it always computes a plain WST pass. For JTFS phase recovery, use the CUDA build.
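To make the first stage of the CPU path concrete, here is a minimal pure-Rust sketch of an iterative radix-2 Cooley-Tukey FFT. It is not the engine's C++ code; the `Complex` type and the `fft_radix2` name are hypothetical and exist only for this illustration.

```rust
use std::f64::consts::PI;

/// Minimal complex number, hypothetical and only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex {
    pub re: f64,
    pub im: f64,
}

impl Complex {
    pub fn new(re: f64, im: f64) -> Self {
        Complex { re, im }
    }
    fn add(self, o: Complex) -> Complex {
        Complex::new(self.re + o.re, self.im + o.im)
    }
    fn sub(self, o: Complex) -> Complex {
        Complex::new(self.re - o.re, self.im - o.im)
    }
    fn mul(self, o: Complex) -> Complex {
        Complex::new(self.re * o.re - self.im * o.im, self.re * o.im + self.im * o.re)
    }
}

/// In-place forward FFT. Panics unless the length is a power of two.
pub fn fft_radix2(buf: &mut [Complex]) {
    let n = buf.len();
    assert!(n.is_power_of_two(), "radix-2 FFT needs a power-of-two length");

    // Bit-reversal permutation so the butterflies can run in place.
    let mut j = 0usize;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
        if i < j {
            buf.swap(i, j);
        }
    }

    // Butterfly stages of size 2, 4, ..., n.
    let mut len = 2;
    while len <= n {
        let ang = -2.0 * PI / len as f64;
        let wlen = Complex::new(ang.cos(), ang.sin());
        for start in (0..n).step_by(len) {
            let mut w = Complex::new(1.0, 0.0);
            for k in 0..len / 2 {
                let u = buf[start + k];
                let v = buf[start + k + len / 2].mul(w);
                buf[start + k] = u.add(v);
                buf[start + k + len / 2] = u.sub(v);
                w = w.mul(wlen);
            }
        }
        len <<= 1;
    }
}
```

Transforming a unit impulse should give a flat spectrum, and a constant signal should put all of its energy in bin 0, which makes for a quick sanity check.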
CUDA GPU Build
When the cuda feature is enabled, the build system:
- Compiles `wst_bridge_cuda.cpp` (with `-DOMNI_FFI_HAS_CUDA`)
- Links `cudart` and `cufft`
- Dispatches to the templated `WSTEngine<HopperTag, J, Q>` kernel
Requirements:
| Dependency | Version |
|---|---|
| NVIDIA CUDA Toolkit | 11.x or 12.x |
| C++17 Compiler | GCC / Clang |
| GPU Architecture | Ampere+ (sm_80) |
Set CUDA_LIB_DIR if your CUDA libraries are in a non-standard location:
CUDA_LIB_DIR=/usr/local/cuda/lib64
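As a rough sketch of how a build script could turn the `cuda` feature and `CUDA_LIB_DIR` into linker settings, the helper below assembles the Cargo directives as strings. The function names are hypothetical and the crate's actual `build.rs` may differ; only the `cargo:rustc-link-search` and `cargo:rustc-link-lib` directive formats are standard Cargo.

```rust
/// Builds the Cargo directives for the selected backend.
/// `cuda_enabled` mirrors the presence of CARGO_FEATURE_CUDA and
/// `cuda_lib_dir` mirrors CUDA_LIB_DIR (both passed in for testability).
pub fn link_directives(cuda_enabled: bool, cuda_lib_dir: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    if !cuda_enabled {
        // CPU build: no CUDA libraries are linked.
        return out;
    }
    if let Some(dir) = cuda_lib_dir {
        out.push(format!("cargo:rustc-link-search=native={dir}"));
    }
    for lib in ["cudart", "cufft"] {
        out.push(format!("cargo:rustc-link-lib={lib}"));
    }
    out
}

/// How a build script might call the helper with the real environment.
pub fn directives_from_env() -> Vec<String> {
    let cuda = std::env::var_os("CARGO_FEATURE_CUDA").is_some();
    let dir = std::env::var("CUDA_LIB_DIR").ok();
    link_directives(cuda, dir.as_deref())
}
```

A build script would print each returned line to stdout; when `CUDA_LIB_DIR` is unset, the linker's default search path is used.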
API Overview
run_wst_pipeline
The low-level FFI function. Full control over all parameters.
unsafe
execute_fingerprint_pass
Convenience wrapper: single-batch, depth-2, plain-WST.
pub unsafe
free_fingerprint / free_wst_result
Release the C++ tensor allocation. Must be called exactly once per successful pipeline invocation.
pub unsafe ;
WSTResult
- CPU build: `fingerprint_ptr` is a `float*` from `new float[]`, released via `delete[]`.
- CUDA build: `fingerprint_ptr` is a `CUdeviceptr` from `cudaMalloc`, released via `cudaFree`.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                        Rust Application                         │
│                                                                 │
│  execute_fingerprint_pass()  ──►  ffi::run_wst_pipeline()       │
│  free_fingerprint()          ──►  ffi::free_wst_result()        │
└────────────────────────┬────────────────────────────────────────┘
                         │ cxx bridge (zero-cost)
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                     wst_bridge.h (C++ FFI)                      │
├─────────────────────────┬───────────────────────────────────────┤
│      CPU (default)      │        CUDA (--features cuda)         │
│   wst_bridge_cpu.cpp    │          wst_bridge_cuda.cpp          │
│  ┌───────────────────┐  │ ┌──────────────────────────────────┐  │
│  │ cpu_wst_engine.h  │  │ │ WSTEngine<HopperTag, J, Q>       │  │
│  │ • Radix-2 FFT     │  │ │ • cuFFT spectral decomposition   │  │
│  │ • Morlet bank     │  │ │ • Template-specialized tiles     │  │
│  │ • Scattering      │  │ │ • Double-buffered pinned memory  │  │
│  │   cascade         │  │ │ • Dual-stream JTFS dispatch      │  │
│  └───────────────────┘  │ └──────────────────────────────────┘  │
└─────────────────────────┴───────────────────────────────────────┘
Memory Ownership Model
- Allocation: C++ allocates the output tensor (`new float[]` on CPU, `cudaMalloc` on GPU).
- Transfer: The pointer is returned to Rust as `u64` inside `WSTResult`.
- Deallocation: Rust calls `free_wst_result` / `free_fingerprint`, which invokes `delete[]` (CPU) or `cudaFree` (GPU).
- Sentinel: Freeing a result whose `fingerprint_ptr` is `0` is a no-op, so the free functions are safe to call unconditionally.
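One way to enforce the single-free rule on the Rust side is an RAII guard that frees in `Drop`. The sketch below is self-contained and does not call the real bridge: `MockResult`, `mock_alloc`, `mock_free`, and the `len` field are hypothetical stand-ins, with a Rust heap buffer playing the role of the C++ allocation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for WSTResult. Only `fingerprint_ptr` comes from
/// the text above; `len` is assumed so the mock can release its buffer.
pub struct MockResult {
    pub fingerprint_ptr: u64,
    pub len: usize,
}

/// Counts real deallocations so the single-free rule can be observed.
pub static FREES: AtomicUsize = AtomicUsize::new(0);

/// Mock of the C++ allocation: a heap buffer whose address travels as u64.
pub fn mock_alloc(len: usize) -> MockResult {
    let buf: Box<[f32]> = vec![0.0f32; len].into_boxed_slice();
    let ptr = Box::into_raw(buf) as *mut f32;
    MockResult { fingerprint_ptr: ptr as u64, len }
}

/// Mock of the free function: a zero pointer is a no-op.
///
/// Safety: `res` must come from `mock_alloc` and must not be freed twice.
pub unsafe fn mock_free(res: &MockResult) {
    if res.fingerprint_ptr == 0 {
        return;
    }
    let slice = std::ptr::slice_from_raw_parts_mut(res.fingerprint_ptr as *mut f32, res.len);
    drop(unsafe { Box::from_raw(slice) });
    FREES.fetch_add(1, Ordering::SeqCst);
}

/// RAII guard: the free runs exactly once, when the guard is dropped.
pub struct Fingerprint(MockResult);

impl Fingerprint {
    pub fn new(res: MockResult) -> Self {
        Fingerprint(res)
    }
    pub fn is_null(&self) -> bool {
        self.0.fingerprint_ptr == 0
    }
}

impl Drop for Fingerprint {
    fn drop(&mut self) {
        // SAFETY: the guard owns the result, so no other free can have run.
        unsafe { mock_free(&self.0) };
        self.0.fingerprint_ptr = 0;
    }
}
```

Because the guard owns the result and is not `Clone`, a double free cannot be expressed in safe code; the same pattern would apply to the real `WSTResult` with `free_fingerprint` in place of `mock_free`.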
Mathematical Background
The Wavelet Scattering Transform constructs deformation-stable signal representations through a cascade of wavelet convolutions and modulus nonlinearities.
Scattering Cascade
Given input signal x(u), a Morlet wavelet filter bank ψ_λ, and low-pass scaling function φ_J:
- Zero-order: `S[0]x = x ∗ φ_J`
- First-order: `S[1]x(u, λ₁) = |x ∗ ψ_λ₁| ∗ φ_J`
- Depth-m: `S[m]x = || ... |x ∗ ψ_λ₁| ∗ ... | ∗ ψ_λₘ| ∗ φ_J`
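The cascade above can be sketched in a few lines of Rust. This is an illustration under simplifying assumptions, not the engine's implementation: it uses direct circular convolution instead of the FFT, a Gabor filter (a Morlet without the DC-correction term) for ψ, and a box filter for φ_J. All function names are hypothetical.

```rust
/// Circular convolution of a real signal with a complex filter given as
/// separate real and imaginary taps; returns (re, im) per sample.
fn circ_conv(x: &[f64], h_re: &[f64], h_im: &[f64]) -> Vec<(f64, f64)> {
    let n = x.len();
    (0..n)
        .map(|u| {
            let mut acc = (0.0f64, 0.0f64);
            for k in 0..h_re.len() {
                let xv = x[(u + n - k % n) % n];
                acc.0 += xv * h_re[k];
                acc.1 += xv * h_im[k];
            }
            acc
        })
        .collect()
}

/// Gabor band-pass taps with center frequency `xi` (radians per sample).
pub fn gabor(len: usize, xi: f64, sigma: f64) -> (Vec<f64>, Vec<f64>) {
    let c = (len as f64 - 1.0) / 2.0;
    let mut re = Vec::with_capacity(len);
    let mut im = Vec::with_capacity(len);
    for k in 0..len {
        let t = k as f64 - c;
        let g = (-(t * t) / (2.0 * sigma * sigma)).exp();
        re.push(g * (xi * t).cos());
        im.push(g * (xi * t).sin());
    }
    (re, im)
}

/// Box low-pass standing in for phi_J, normalized to unit L1 norm.
pub fn lowpass(len: usize) -> Vec<f64> {
    vec![1.0 / len as f64; len]
}

/// Wavelet modulus: |x * psi|.
pub fn modulus_conv(x: &[f64], psi: &(Vec<f64>, Vec<f64>)) -> Vec<f64> {
    circ_conv(x, &psi.0, &psi.1)
        .into_iter()
        .map(|(re, im)| re.hypot(im))
        .collect()
}

/// Low-pass averaging: x * phi.
pub fn average(x: &[f64], phi: &[f64]) -> Vec<f64> {
    let zeros = vec![0.0; phi.len()];
    circ_conv(x, phi, &zeros).into_iter().map(|(re, _)| re).collect()
}

/// S[0]x = x * phi
pub fn scatter0(x: &[f64], phi: &[f64]) -> Vec<f64> {
    average(x, phi)
}

/// S[1]x = |x * psi1| * phi
pub fn scatter1(x: &[f64], psi1: &(Vec<f64>, Vec<f64>), phi: &[f64]) -> Vec<f64> {
    average(&modulus_conv(x, psi1), phi)
}

/// S[2]x = ||x * psi1| * psi2| * phi
pub fn scatter2(
    x: &[f64],
    psi1: &(Vec<f64>, Vec<f64>),
    psi2: &(Vec<f64>, Vec<f64>),
    phi: &[f64],
) -> Vec<f64> {
    average(&modulus_conv(&modulus_conv(x, psi1), psi2), phi)
}
```

Every stage is a circular convolution or a pointwise modulus, so shifting the input shifts each coefficient sequence by the same amount; widening the low-pass then turns that covariance into approximate translation invariance.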
Key Properties
| Property | Guarantee |
|---|---|
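| Parseval energy conservation | Σ_p ‖S[p]x‖² = ‖x‖², summed over all scattering paths p: signal energy is preserved across depth (exact in the infinite-depth limit for an admissible filter bank; a truncated cascade satisfies ≤) |
| Lipschitz continuity | ‖S[p]x − S[p]y‖ ≤ (‖ψ‖₁)^m · ‖x − y‖ for a path p of length m: bounded adversarial sensitivity |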
| Translation invariance | Controlled by scale 2^J of the low-pass filter |
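The Lipschitz row can be checked numerically for a single layer (m = 1). The sketch below computes both sides of ‖|x ∗ ψ| − |y ∗ ψ|‖₂ ≤ ‖ψ‖₁ · ‖x − y‖₂, which follows from the reverse triangle inequality and Young's inequality. It uses direct circular convolution, and the function names are hypothetical.

```rust
/// Euclidean (L2) norm of a real vector.
pub fn l2(v: &[f64]) -> f64 {
    v.iter().map(|a| a * a).sum::<f64>().sqrt()
}

/// L1 norm of complex filter taps: the sum of |psi[k]|.
pub fn l1_complex(psi: &[(f64, f64)]) -> f64 {
    psi.iter().map(|(re, im)| re.hypot(*im)).sum::<f64>()
}

/// One scattering layer, |x * psi|, with circular convolution.
pub fn layer(x: &[f64], psi: &[(f64, f64)]) -> Vec<f64> {
    let n = x.len();
    (0..n)
        .map(|u| {
            let (mut re, mut im) = (0.0f64, 0.0f64);
            for (k, (pr, pi)) in psi.iter().enumerate() {
                let xv = x[(u + n - k % n) % n];
                re += xv * pr;
                im += xv * pi;
            }
            re.hypot(im)
        })
        .collect()
}

/// Returns (lhs, rhs) of the one-layer bound
/// || |x*psi| - |y*psi| ||_2 <= ||psi||_1 * ||x - y||_2.
pub fn lipschitz_sides(x: &[f64], y: &[f64], psi: &[(f64, f64)]) -> (f64, f64) {
    let (ux, uy) = (layer(x, psi), layer(y, psi));
    let out_diff: Vec<f64> = ux.iter().zip(&uy).map(|(a, b)| a - b).collect();
    let in_diff: Vec<f64> = x.iter().zip(y).map(|(a, b)| a - b).collect();
    (l2(&out_diff), l1_complex(psi) * l2(&in_diff))
}
```

Chaining m such layers multiplies the constants, which gives the (‖ψ‖₁)^m factor in the table when every layer's filter has the same L1 norm and the final low-pass has L1 norm at most 1.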
Joint Time-Frequency Scattering (JTFS)
JTFS recovers phase-coupling information discarded by the modulus operator, applying separable 2D convolution across both time and log-frequency:
Ψ_μ,l,s(t, λ) = ψ_μ(t) · ψ_l,s(λ)
On the CUDA build, this is computed via dual parallel streams for maximum GPU utilization.
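Separability is what keeps the 2D step cheap: convolving with the product kernel equals one 1D pass along time followed by one along log-frequency. The sketch below verifies that identity on a small scalogram `s[l][t]`. It assumes real-valued filters and circular boundaries for brevity (actual JTFS wavelets are complex and carry a spin parameter), and all names are hypothetical.

```rust
/// 1D circular convolution: y[u] = sum over k of x[(u - k) mod n] * h[k].
fn conv1(x: &[f64], h: &[f64]) -> Vec<f64> {
    let n = x.len();
    (0..n)
        .map(|u| {
            h.iter()
                .enumerate()
                .map(|(k, hk)| x[(u + n - k % n) % n] * hk)
                .sum::<f64>()
        })
        .collect()
}

/// Separable kernel: kernel[kt][kl] = psi_time[kt] * psi_freq[kl].
pub fn separable_kernel(psi_time: &[f64], psi_freq: &[f64]) -> Vec<Vec<f64>> {
    psi_time
        .iter()
        .map(|a| psi_freq.iter().map(|b| a * b).collect())
        .collect()
}

/// Two 1D passes over a non-empty rectangular scalogram s[l][t]:
/// first along time (within each row), then along log-frequency.
pub fn conv_separable(s: &[Vec<f64>], psi_time: &[f64], psi_freq: &[f64]) -> Vec<Vec<f64>> {
    let rows: Vec<Vec<f64>> = s.iter().map(|row| conv1(row, psi_time)).collect();
    let (n_l, n_t) = (rows.len(), rows[0].len());
    let mut out = vec![vec![0.0; n_t]; n_l];
    for t in 0..n_t {
        let col: Vec<f64> = rows.iter().map(|r| r[t]).collect();
        for (l, v) in conv1(&col, psi_freq).into_iter().enumerate() {
            out[l][t] = v;
        }
    }
    out
}

/// Direct 2D circular convolution with the full kernel, for comparison.
pub fn conv2_direct(s: &[Vec<f64>], kernel: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let (n_l, n_t) = (s.len(), s[0].len());
    let mut out = vec![vec![0.0; n_t]; n_l];
    for l in 0..n_l {
        for t in 0..n_t {
            for (kt, krow) in kernel.iter().enumerate() {
                for (kl, k) in krow.iter().enumerate() {
                    let sl = (l + n_l - kl % n_l) % n_l;
                    let st = (t + n_t - kt % n_t) % n_t;
                    out[l][t] += s[sl][st] * k;
                }
            }
        }
    }
    out
}
```

For filters of length `a` and `b`, the separable form costs on the order of `a + b` multiplications per output sample instead of `a · b`.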
Use Cases
| Domain | Application |
|---|---|
| Audio forensics | Perceptual fingerprinting robust to adversarial phase-shifting attacks |
| Astrophysics | Gravitational wave chirp detection in broadband noise |
| Neuroscience | Deformation-stable EEG/ECG feature extraction |
| Genomics | Translation-invariant ChIP-seq peak calling |
| MLOps | Deterministic feature extraction for reproducible ML pipelines |
Safety Contract
All FFI functions are unsafe. The caller must guarantee:
1. Valid pointer: `input_plasma_ptr` points to a contiguous, host-readable `f32` array of exactly `signal_len × batch_size` elements.
2. Liveness: The backing memory must remain live for the entire duration of the call.
3. Positive parameters: `signal_len`, `batch_size`, `J`, `Q`, and `depth` must all be `> 0`.
4. Single free: Each `WSTResult` must be freed exactly once via `free_wst_result` / `free_fingerprint`. Double-free is undefined behavior.
5. CUDA registration (GPU build): The input pointer must be registered via `cudaHostRegister` for UVA access.
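Violations of invariants 1–3 that the bridge is able to check (for example a non-positive parameter) surface as cxx::Exception (C++ throws std::runtime_error). A dangling or undersized buffer cannot be detected from inside the call and is undefined behavior, as are violations of 4–5.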
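A safe wrapper can discharge most of this contract before the unsafe call is made. The sketch below is hypothetical: `WstParams`, `ContractError`, and both functions are invented for illustration, and the field types are assumptions. Taking a slice covers invariants 1 and 2 through the borrow checker, and the explicit checks cover invariant 3.

```rust
/// Reasons a call would violate the safety contract (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ContractError {
    ZeroParameter(&'static str),
    LengthMismatch { expected: usize, actual: usize },
    Overflow,
}

/// Hypothetical parameter bundle; names follow the safety contract,
/// while the integer types are assumptions.
pub struct WstParams {
    pub signal_len: usize,
    pub batch_size: usize,
    pub j: u32,
    pub q: u32,
    pub depth: u32,
}

/// Checks invariant 3 (all parameters positive) and the length half of
/// invariant 1 (exactly signal_len * batch_size elements).
pub fn check_contract(input: &[f32], p: &WstParams) -> Result<(), ContractError> {
    let named = [
        ("signal_len", p.signal_len),
        ("batch_size", p.batch_size),
        ("J", p.j as usize),
        ("Q", p.q as usize),
        ("depth", p.depth as usize),
    ];
    for (name, value) in named {
        if value == 0 {
            return Err(ContractError::ZeroParameter(name));
        }
    }
    let expected = p
        .signal_len
        .checked_mul(p.batch_size)
        .ok_or(ContractError::Overflow)?;
    if input.len() != expected {
        return Err(ContractError::LengthMismatch { expected, actual: input.len() });
    }
    Ok(())
}

/// Runs `call` with a pointer that is valid and live for the whole call,
/// because `input` stays borrowed until `call` returns.
pub fn with_checked_input<R>(
    input: &[f32],
    p: &WstParams,
    call: impl FnOnce(*const f32, &WstParams) -> R,
) -> Result<R, ContractError> {
    check_contract(input, p)?;
    Ok(call(input.as_ptr(), p))
}
```

In real use the closure would wrap the unsafe call into the bridge; invariants 4 and 5 still need separate handling, for example an owning guard type for the result and explicit host-memory registration on the GPU build.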
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| OMNI_WST_CORE_CPP | Path to Module-I-omni-wst-core C++ sources | ../Module-I-omni-wst-core/cpp |
| CUDA_LIB_DIR | Path to CUDA shared libraries | System default |
| CARGO_FEATURE_CUDA | Set automatically by --features cuda | n/a |
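For illustration, a build script could resolve the source directory as follows. This is a sketch with hypothetical function names; treating an empty value as unset is an assumption, not documented behavior.

```rust
use std::path::PathBuf;

/// Default location of the C++ sources, as listed in the table above.
pub const DEFAULT_CORE_CPP: &str = "../Module-I-omni-wst-core/cpp";

/// Resolves the source directory from a lookup function, so the logic
/// can be exercised without touching the process environment.
pub fn core_cpp_dir<F>(lookup: F) -> PathBuf
where
    F: Fn(&str) -> Option<String>,
{
    match lookup("OMNI_WST_CORE_CPP") {
        // Assumption: an empty value is treated like an unset variable.
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => PathBuf::from(DEFAULT_CORE_CPP),
    }
}

/// Variant that reads the real environment.
pub fn core_cpp_dir_from_env() -> PathBuf {
    core_cpp_dir(|key| std::env::var(key).ok())
}
```

Passing the lookup as a closure keeps the default-handling logic testable in isolation.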
Building from Source
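# Ensure the WST engine sources are available (set OMNI_WST_CORE_CPP if they are not at the default path)
# CPU build
cargo build
# GPU build (Linux with CUDA toolkit)
cargo build --features cuda
# Run tests
cargo test
# Generate documentation
cargo doc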
License
Licensed under the Apache License, Version 2.0.
Copyright 2025 Samvardhan Singh
Part of the OmniPulse signal intelligence platform.