vyre-conform 0.1.0

Conformance suite for vyre backends — proves byte-identical output to CPU reference
# Determinism Verification

Determinism verification runs the same valid input through the same backend multiple times and requires byte-identical output on every run.

## Why Determinism Is Not Free

GPU compute is parallel by default. Bugs that appear only under specific thread interleavings are common:
- Uninitialized workgroup-shared memory
- Missing barriers
- Data races on non-atomic memory accesses, and atomic updates whose result depends on arrival order
- Non-associative reductions performed in arbitrary order

A single-run parity pass can miss all of these, because one run may happen to hit an interleaving that matches the reference.
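
To make the last cause concrete, here is a minimal CPU-side sketch in Rust of how accumulation order alone changes the bits of an `f32` sum. The function names are illustrative and not part of vyre-conform. The two arrays stand in for the order in which parallel invocations might contribute to a reduction.

```rust
/// Sums the values left to right, in exactly the order given.
fn sum_in_order(values: &[f32]) -> f32 {
    values.iter().fold(0.0_f32, |acc, &v| acc + v)
}

/// Returns both sums as raw bit patterns, which is how a byte-for-byte
/// comparison would see them.
fn reduction_bits_for_two_orders() -> (u32, u32) {
    // The same three values, contributed in two different orders.
    let order_a = [1.0e8_f32, -1.0e8, 1.0];
    let order_b = [1.0_f32, -1.0e8, 1.0e8];
    (
        sum_in_order(&order_a).to_bits(),
        sum_in_order(&order_b).to_bits(),
    )
}
```

In `order_a` the large values cancel first and the `1.0` survives. In `order_b` the `1.0` is absorbed by rounding when it is added to `-1.0e8`, so the sum is `0.0`. Both results are valid floating-point sums of the same inputs, but they are not byte-identical.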

## How The Check Works

For each operation (or engine), the suite:
1. Selects a representative input subset (edge cases + pathological values).
2. Dispatches it `N` times with the same configuration.
3. Compares all `N` outputs byte-for-byte.
4. Reports any mismatch as a nondeterminism failure.
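
The steps above can be sketched as a small Rust harness. This is an assumption-laden illustration, not the suite's actual API: `check_determinism`, `Mismatch`, and the closure-based `dispatch` parameter are hypothetical names, and the closure stands in for whatever submits work to the backend and reads the output buffer back.

```rust
/// Describes the first run whose output differs from run 0 (hypothetical type).
#[derive(Debug, PartialEq)]
struct Mismatch {
    /// Zero-based index of the run that diverged from the baseline.
    run: usize,
    /// Offset of the first differing byte, or the shorter length when one
    /// output is a strict prefix of the other.
    offset: usize,
}

/// Dispatches `input` `runs` times and compares every output to the first,
/// byte for byte. `dispatch` is a stand-in for a real backend call.
fn check_determinism<F>(input: &[u8], runs: usize, mut dispatch: F) -> Result<(), Mismatch>
where
    F: FnMut(&[u8]) -> Vec<u8>,
{
    if runs == 0 {
        return Ok(());
    }
    let baseline = dispatch(input);
    for run in 1..runs {
        let output = dispatch(input);
        if output != baseline {
            let offset = baseline
                .iter()
                .zip(output.iter())
                .position(|(a, b)| a != b)
                .unwrap_or_else(|| baseline.len().min(output.len()));
            return Err(Mismatch { run, offset });
        }
    }
    Ok(())
}
```

Reporting the run index and the first differing offset, rather than a bare pass/fail, is a design choice in this sketch: the offset often points directly at the output region written by the racy code path.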

## When To Run

Determinism checks are expensive because every selected input is dispatched `N` times, so they are typically run:
- Nightly, on the full primitive and engine suite
- Before any certification claim
- After any backend change that touches memory barriers, atomics, or workgroup scheduling

## Fix Direction

A nondeterminism failure means the backend's output depends on something other than the input and the WGSL shader. Common fixes:
- Zero-initialize shared memory
- Add missing `workgroupBarrier()` or `storageBarrier()`
- Make atomic reduction order-independent
- Eliminate uninitialized output bytes
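
One way to make a reduction order-independent is to give each contributor its own zero-initialized slot and combine the slots in a fixed order afterwards. The following Rust sketch models that idea on the CPU. The function names and the `(slot, value)` representation of arriving partial results are assumptions made for illustration, not vyre code.

```rust
/// Accumulates partial results in whatever order they happen to arrive.
/// The result depends on scheduling.
fn reduce_in_arrival_order(arrivals: &[(usize, f32)]) -> f32 {
    arrivals.iter().fold(0.0_f32, |acc, &(_, v)| acc + v)
}

/// Stores each partial result in its own slot, then combines the slots in
/// index order. The result no longer depends on arrival order.
fn reduce_in_slot_order(arrivals: &[(usize, f32)], slots: usize) -> f32 {
    // Zero-initialize so that a slot nobody writes still has a defined value.
    let mut partials = vec![0.0_f32; slots];
    for &(slot, value) in arrivals {
        partials[slot] = value;
    }
    partials.iter().fold(0.0_f32, |acc, &v| acc + v)
}
```

With the partials `1.0e8`, `-1.0e8`, and `1.0` in slots 0, 1, and 2, reversing the arrival order changes the result of `reduce_in_arrival_order` but not of `reduce_in_slot_order`. On a GPU the analogous pattern is to write per-invocation partials to shared memory, synchronize with a barrier, and then combine them in a fixed index order.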