vyre-conform 0.1.0

Conformance suite for vyre backends — proves byte-identical output to CPU reference
# Performance Verification

Performance verification records whether a backend meets minimum throughput requirements under the conformance configuration.

## Why Performance Is Part of Conformance

A backend that is mathematically correct but too slow to process real workloads is not production-ready. Performance baselines ensure that correctness and usability are verified together.

## How It Works

Each `OpSpec` may declare `min_throughput_bytes_per_sec`. When set, the conformance suite:
1. Dispatches the operation with an input large enough to amortize fixed per-dispatch overhead.
2. Measures wall-clock time from the dispatch call until the result is returned.
3. Computes throughput as `input_bytes / elapsed_seconds`.
4. Reports a failure if the measured throughput is below `min_throughput_bytes_per_sec`.
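
The steps above can be sketched roughly as follows. This is an illustrative sketch only, not the suite's actual implementation: `ThroughputResult`, `throughput_bytes_per_sec`, and `check_throughput` are hypothetical names, and the dispatch is modeled as a plain closure over a byte slice rather than a real backend call.

```rust
use std::time::{Duration, Instant};

/// Outcome of one throughput check (hypothetical type, not part of vyre-conform).
#[derive(Debug, PartialEq)]
pub enum ThroughputResult {
    Pass { bytes_per_sec: f64 },
    Fail { bytes_per_sec: f64, min_bytes_per_sec: f64 },
}

/// Step 3: throughput = input_bytes / elapsed_seconds.
/// A zero-length measurement is reported as infinite throughput
/// instead of dividing by zero.
pub fn throughput_bytes_per_sec(input_bytes: usize, elapsed: Duration) -> f64 {
    if elapsed.is_zero() {
        return f64::INFINITY;
    }
    input_bytes as f64 / elapsed.as_secs_f64()
}

/// Steps 2 to 4: time `dispatch` with a wall clock and compare the
/// resulting throughput against the declared minimum.
pub fn check_throughput<F, R>(
    input: &[u8],
    min_bytes_per_sec: f64,
    dispatch: F,
) -> ThroughputResult
where
    F: FnOnce(&[u8]) -> R,
{
    let start = Instant::now();
    // black_box discourages the compiler from optimizing the dispatch away.
    let result = std::hint::black_box(dispatch(input));
    let elapsed = start.elapsed();
    drop(result);

    let bytes_per_sec = throughput_bytes_per_sec(input.len(), elapsed);
    if bytes_per_sec < min_bytes_per_sec {
        ThroughputResult::Fail { bytes_per_sec, min_bytes_per_sec }
    } else {
        ThroughputResult::Pass { bytes_per_sec }
    }
}
```

Because the measurement covers the whole call, any buffer copies or synchronization performed inside `dispatch` count against the backend, which matches the wall-clock definition in step 2.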

## Configuration

Performance verification runs under the **conformance configuration**, not an optimized production configuration. A backend that meets its throughput thresholds at `workgroup_size=1` and `workgroup_size=64` through the generic harness wrapper is likely to be fast enough in real use.
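
One way to picture this is a check that runs once per workgroup size and collects the sizes that fall short. The sketch below assumes that the threshold must be met at both sizes named above, which the text implies but does not state outright. `CONFORMANCE_WORKGROUP_SIZES` and `failing_workgroup_sizes` are hypothetical names, and `measure` stands in for whatever the harness uses to obtain a throughput figure.

```rust
/// Workgroup sizes named in the text for the conformance configuration.
pub const CONFORMANCE_WORKGROUP_SIZES: [u32; 2] = [1, 64];

/// Returns the workgroup sizes whose measured throughput (bytes per second)
/// fell below the minimum. An empty vector means every size passed.
pub fn failing_workgroup_sizes<F>(min_bytes_per_sec: f64, mut measure: F) -> Vec<u32>
where
    F: FnMut(u32) -> f64,
{
    CONFORMANCE_WORKGROUP_SIZES
        .iter()
        .copied()
        .filter(|&size| measure(size) < min_bytes_per_sec)
        .collect()
}
```

Returning the failing sizes rather than a single boolean makes it easier to see whether a failure is specific to the `workgroup_size=1` case.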

## Relationship to Levels

Performance baselines are required for L4 (Full) conformance. They are not required for L1, L2, or L3.
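
As a rough illustration of this gating, the sketch below models the four levels as an enum and applies the throughput result only at L4. The `Level` type and both functions are assumptions made for this example and are not taken from the vyre-conform API.

```rust
/// Conformance levels as named in the text (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    L1,
    L2,
    L3,
    /// "Full" conformance.
    L4,
}

impl Level {
    /// Only L4 requires performance baselines.
    pub fn requires_performance_baseline(self) -> bool {
        matches!(self, Level::L4)
    }
}

/// A throughput result only affects the verdict at levels that require it.
pub fn performance_gate_passes(level: Level, throughput_ok: bool) -> bool {
    !level.requires_performance_baseline() || throughput_ok
}
```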

## Fix Direction

A performance failure usually means the backend or runtime has avoidable overhead in its dispatch path. Fixes include:
- Reducing buffer copies
- Batching dispatches
- Using larger workgroup sizes where appropriate
- Avoiding synchronous round-trips inside the dispatch path