par-validator

Parallel validation for payment / trading-style payloads: run string and identifier checks on the CPU with Rayon, and pack fixed-point numerics (ranges, GST, FX conversion, simple interest) into one WebGPU compute dispatch per batch.

The crate is built around rust-key-paths for type-safe field access and wgpu for portable GPU execution (Vulkan, Metal, DX12, etc.).

Beginner’s guide

This section is for you if you are new to the project or to Rust crates that mix CPU parallelism and GPU compute.

What you should already know

Rust basics: cargo, struct, enum, impl, and reading example binaries under examples/.
Optional: A little exposure to shaders or WebGPU helps for the numeric path, but you can treat GpuNumericEngine as a black box at first.

What this crate does (in one minute)

String-style validation — You point at a field on a struct (via key paths from #[derive(Kp)]), attach rule functions, and call apply. Mandatory rules run one after another; if one fails, the rest of that builder’s mandatory chain is skipped. Other rules for the same field run in parallel with Rayon.
Numeric validation and math — You fill a Vec<NumericRule> with fixed-point integers (amounts scaled ×100). The GPU runs many rules in one dispatch and returns a Vec<NumericOutput> (error code + optional calculated value).

So: texty things → CPU + Rayon, lots of fixed-point number rules → GPU batch.

Prerequisites

Rust (stable, recent; edition 2024 is declared in Cargo.toml — use a toolchain that supports it).
A working GPU stack for examples that call GpuNumericEngine (e.g. Metal on Apple Silicon, Vulkan on many Linux setups, DX12 on Windows). If the GPU path fails, check drivers / OS support for wgpu.

First run (step by step)

Clone the repo and enter the crate directory (the one that contains Cargo.toml):
```
cd par-validator
```
Optional — smallest sample (CPU only, no GPU):
```
cargo run --example basics
```
Run the small end-to-end demo (strings on the CPU, numbers on the GPU):
```
cargo run --example hybrid_gpu
```

When that works, try a CPU-only large batch:

cargo run --release --example fintech_rayon_nested

Then a GPU-heavy batch:

cargo run --release --example fintech_gpu_batch

Where to read the code (suggested order)

Order	File	Why
1	`examples/basics.rs`	Minimal `Rule` / `builder` only (no GPU)
2	`examples/hybrid_gpu.rs`	Short story: `Rule` + `GpuNumericEngine::run`
3	`src/builder.rs`	`Rule`: key path + bool predicates + `apply`
4	`src/gpu_numeric.rs`	Rules, outputs, WGSL shader
5	`examples/fintech_hybrid_batch.rs`	Bigger payload, both layers

Concepts cheat sheet

Term	Meaning
Key path (`Kp`)	Compile-time accessor to a field, e.g. `MyStruct::field_name()`, from `key-paths-derive`.
`Rule`	Holds a key path + mandatory / parallel `fn(Option<&V>) -> bool` rules, each with an error `E`; `apply()` returns `Vec<E>` for failures.
Mandatory rule	Runs first, in order; first failure stops the rest of the mandatory list for that builder.
`NumericRule`	One row: value ×100, rule kind, two integer parameters (meaning depends on kind).
`GpuNumericEngine::run`	Uploads all rules, runs compute shader once, downloads results (blocking).

Flattening nested structs

Your business model can be deeply nested (parties, legs, charges). This crate’s Rule::new expects a KpType from plain #[derive(Kp)] fields. A common pattern (see fintech_rayon_nested) is a flat “view” struct that mirrors nested data for validation only.

If something goes wrong

cd to par-validator — Commands assume the crate root next to src/ and examples/.
GPU panic / “no adapter” — Environment may lack a supported backend; try another machine or update OS/GPU drivers.
Slower than expected — Use cargo run --release for large examples and benchmarks.

Features

Layer	What	How
Strings / IDs	BIC-shaped data, IBAN-like lengths, UETR, product codes	`Rule` + `#[derive(Kp)]` + Rayon
Numerics	Range, %, tax in bps, FX ×1000 rate, interest	`GpuNumericEngine` + WGSL compute

Mandatory rules run in order and short-circuit on the first failure.
Other rules for the same key path run in parallel inside Rule::apply.
GPU: all values are i32 scaled ×100 end-to-end in the shader (no f32/f64 on the boundary), avoiding nondeterministic floats in validation math.

Quick start

cd par-validator
cargo run --example hybrid_gpu

Basic example (CPU only)

The smallest program uses Rule on a #[derive(Kp)] struct — no GPU, no async, so it is easy to copy into your own crate.

cargo run --example basics

Source: examples/basics.rs. Core idea:

use key_paths_derive::Kp;
use par_validator::Rule;

#[derive(Kp)]
struct Payment {
    reference: String,
}

fn not_empty_ok(r: Option<&String>) -> bool {
    r.map(|s| !s.trim().is_empty()).unwrap_or(false)
}

fn max_len_16_ok(r: Option<&String>) -> bool {
    r.map(|s| s.len() <= 16).unwrap_or(false)
}

let p = Payment { reference: "REF-001".into() };

let results = Rule::<Payment, String, String>::new(Payment::reference())
    .with_root(&p)
    .mandatory_rule(not_empty_ok, "missing or blank".into())
    .rule(max_len_16_ok, "too long".into())
    .apply();

mandatory_rule runs first; if it fails, you get a single-element Vec and parallel rules are skipped.
rule adds checks that run in parallel with each other inside apply().

Release builds are recommended for large batches:

cargo run --release --example fintech_rayon_nested
cargo run --release --example fintech_gpu_batch
cargo run --release --example fintech_hybrid_batch

Examples

Example	Focus
`basics`	Starter: one field, mandatory + parallel rules, no GPU
`hybrid_gpu`	Small walkthrough: Rayon + GPU together
`fintech_rayon_nested`	~4k nested ISO-20022–style transfers → flatten to key paths → CPU-only validation
`fintech_gpu_batch`	~12k numeric rules (2k legs × 6 checks) in one GPU `run`
`fintech_hybrid_batch`	~1k trade legs: Rayon on strings, single GPU batch for notionals / tax / FX

The nested Rayon example keeps a deep domain struct (BankParty, charges, FX block) and projects it to a flat #[derive(Kp)] view for validation. That matches how rust-key-paths KpType integrates with this crate today; composed .then() chains use a different internal representation and are better handled via flattening or dynamic key paths upstream.

Benchmarks

Criterion benchmarks live in par-validator/benches/throughput.rs.

cd par-validator
cargo bench --bench throughput

Run only the large GPU case (handy on a Windows/Linux NVIDIA box with Vulkan or DX12):

cargo bench --bench throughput -- nvidia_gpu

Results (reference machine)

Benchmark	What it measures	Typical time (M1 Air)
`rayon_cpu_4096_transfers_x12_rules`	4 096 flat transfers × 12 `Rule` runs (Rayon over rows + inner `par_iter` in `apply`)	~2.86 ms
`wgpu_gpu/12288_numeric_rules_one_dispatch`	12 288 `NumericRule` rows in one `GpuNumericEngine::run`	~1.53 ms
`wgpu_gpu_nvidia/nvidia_gpu_98304_numeric_rules_one_dispatch`	98 304 rules (16 384 legs × 6), one dispatch — stress size for discrete GPUs	~3.44 ms

Testing machine: MacBook Air M1, Apple Silicon (arm64), macOS, Rust 1.85, cargo bench release profile. Figures are Criterion medians from a single run; variance and outliers are normal—re-run locally for your hardware.

NVIDIA: On a PC with an NVIDIA GPU, install current drivers, use the same cargo bench --bench throughput -- nvidia_gpu command, and compare the reported time to the M1 row above (wgpu will typically use Vulkan or DX12).

Library layout

builder::Rule — CPU validation keyed by KpType (pub use as par_validator::Rule).
gpu_numeric — NumericRule / NumericOutput, GpuNumericEngine::run, WGSL shader (portable wide integer math for Metal).

Full API docs:

cargo doc --open -p par-validator

Dependencies

rayon, rust-key-paths, wgpu, bytemuck, pollster (examples / blocking on async GPU init).

Dev-only: key-paths-derive for #[derive(Kp)] in examples.

License

See the repository LICENSE file.

par-validator 0.2.0