vyre 0.4.0

GPU compute intermediate representation with a standard operation library
Documentation
# Adding Your First Op

By the end of this tutorial, you will have created `primitive.math.gcd` as a fully conformant Category A operation. You will have written a CPU reference oracle, a matching GPU kernel, declared its algebraic laws, added known-answer and adversarial test vectors, and verified the op end-to-end with `cargo test -p vyre-conform --lib`. The entire change requires no manual `mod.rs` edits and no hand-written central registry updates.

## Step 1: Pick an op

Operation IDs are dot-separated paths: `primitive.math.gcd`.

- `primitive` — Layer 1 building block.
- `math` — domain category.
- `gcd` — the operation itself.

Signature inference for `gcd` is straightforward: two unsigned 32-bit inputs, one unsigned 32-bit output. In code this is `(U32, U32) -> U32`.

**Category A vs Category C**

- **Category A** (compositional): the op can be expressed entirely from existing vyre IR primitives. The generic WGSL lowerer handles it. `gcd` is Category A because it composes from `Expr::rem`, `Node::loop_for`, and `Node::if_then`.
- **Category C** (custom lowerer): the op needs hand-written WGSL emission logic in `core/src/lower/wgsl/`. Only choose Category C when the generic lowerer cannot express the semantics.

## Step 2: Copy the template

Create the core op file from the template:

```bash
cp core/src/ops/template_op.rs core/src/ops/primitive/math/gcd.rs
```

vyre uses `automod` for discovery. `core/src/ops/primitive/math/mod.rs` contains:

```rust
automod::dir!(pub "src/ops/primitive/math");
```

Drop `gcd.rs` into that directory and it is automatically discovered. No `mod.rs` edits needed.

## Step 3: Fill in identity

Open `core/src/ops/primitive/math/gcd.rs` and replace the template with the op identity and IR program:

```rust
use crate::ir::{BufferDecl, DataType, Expr, Node, Program};
use crate::ops::primitive::WORKGROUP_SIZE;
use crate::ops::{AlgebraicLaw, OpSpec, U32_U32_INPUTS, U32_OUTPUTS};

const LAWS: &[AlgebraicLaw] = &[
    AlgebraicLaw::Commutative,
    AlgebraicLaw::Associative,
    AlgebraicLaw::Identity { element: 0 },
    AlgebraicLaw::Idempotent,
];

/// Greatest common divisor operation.
#[derive(Debug, Clone, Copy, Default)]
pub struct Gcd;

impl Gcd {
    pub const SPEC: OpSpec = OpSpec::composition_inlinable(
        "primitive.math.gcd",
        U32_U32_INPUTS,
        U32_OUTPUTS,
        LAWS,
        Self::program,
    );

    #[must_use]
    pub fn program() -> Program {
        let idx = Expr::var("idx");
        Program::new(
            vec![
                BufferDecl::read("a", 0, DataType::U32),
                BufferDecl::read("b", 1, DataType::U32),
                BufferDecl::output("out", 2, DataType::U32),
            ],
            WORKGROUP_SIZE,
            vec![
                Node::let_bind("idx", Expr::gid_x()),
                Node::if_then(
                    Expr::lt(idx.clone(), Expr::buf_len("out")),
                    vec![
                        Node::let_bind("x", Expr::load("a", idx.clone())),
                        Node::let_bind("y", Expr::load("b", idx.clone())),
                        Node::loop_for(
                            "i",
                            Expr::u32(0),
                            Expr::u32(64),
                            vec![Node::if_then(
                                Expr::ne(Expr::var("y"), Expr::u32(0)),
                                vec![
                                    Node::let_bind("t", Expr::var("y")),
                                    Node::assign(
                                        "y",
                                        Expr::rem(Expr::var("x"), Expr::var("y")),
                                    ),
                                    Node::assign("x", Expr::var("t")),
                                ],
                            )],
                        ),
                        Node::store("out", idx, Expr::var("x")),
                    ],
                ),
            ],
        )
    }
}
```

The loop is bounded to 64 iterations. That is sufficient for all 32-bit inputs; the worst-case Euclidean steps on `u32` is 48 for consecutive Fibonacci numbers.

## Step 4: Write cpu_fn

The CPU reference is the ground truth. It lives in the conformance crate, not in `core`. Create `conform/src/specs/primitive/math/gcd.rs` and add the oracle:

```rust
fn cpu(input: &[u8]) -> Vec<u8> {
    if input.len() < 8 {
        return vec![0; 4];
    }
    let mut a = u32::from_le_bytes([input[0], input[1], input[2], input[3]]);
    let mut b = u32::from_le_bytes([input[4], input[5], input[6], input[7]]);
    while b != 0 {
        let t = b;
        b = a % b;
        a = t;
    }
    a.to_le_bytes().to_vec()
}
```

Every byte matters. The GPU must produce identical output for every input.

## Step 5: Write wgsl_fn

In the same conform spec file, add the hand-written WGSL that mirrors `cpu_fn` byte-for-byte:

```rust
fn wgsl() -> String {
    r#"fn vyre_op(index: u32, input_len: u32) -> u32 {
    var x = input.data[0u];
    var y = input.data[1u];
    for (var i = 0u; i < 64u; i = i + 1u) {
        if (y != 0u) {
            let t = y;
            y = x % y;
            x = t;
        }
    }
    return x;
}"#
    .to_string()
}
```

This WGSL is what the conformance suite uses for byte-exact comparison against the lowered output of `Gcd::program()`.

## Step 6: Declare algebraic laws

In `core/src/ops/primitive/math/gcd.rs`, keep the `LAWS` array:

```rust
const LAWS: &[AlgebraicLaw] = &[
    AlgebraicLaw::Commutative,
    AlgebraicLaw::Associative,
    AlgebraicLaw::Identity { element: 0 },
    AlgebraicLaw::Idempotent,
];
```

In `conform/src/specs/primitive/math/gcd.rs`, wire them into the spec builder:

```rust
pub fn vyre_op() -> OpSpec {
    let id = "primitive.math.gcd";
    crate::specs::primitive::make_spec(id, crate::specs::primitive::binary_u32_sig(), cpu, wgsl)
        .category(crate::specs::primitive::category_a_self(id))
        .laws(vec![
            crate::specs::primitive::AlgebraicLaw::Commutative,
            crate::specs::primitive::AlgebraicLaw::Associative,
            crate::specs::primitive::AlgebraicLaw::Identity { element: 0 },
            crate::specs::primitive::AlgebraicLaw::Idempotent,
        ])
        .strictness(crate::Strictness::Strict)
        .version(1)
        .equivalence_classes(vec![EquivalenceClass::universal("all u32 pairs")])
        .boundary_values(crate::specs::primitive::binary_common_boundaries())
        .build()
        .expect("registry invariant violated")
}

/// Compatibility alias for older tests and callers.
pub fn spec() -> OpSpec {
    vyre_op()
}
```

## Step 7: Add test vectors

Still in the conform spec file, add golden samples, KAT vectors, and adversarial inputs:

```rust
use crate::specs::primitive::EquivalenceClass;
use crate::OpSpec;
use crate::verify::golden_samples::GoldenSample;

pub const VYRE_OP_METADATA: vyre_spec::OpMetadata = vyre_spec::OpMetadata {
    id: "primitive.math.gcd",
    layer: vyre_spec::Layer::L1,
    category: vyre_spec::MetadataCategory::A,
    version: 1,
    description: "primitive math gcd",
    signature: "(U32, U32) -> U32",
    strictness: "strict",
    archetype_signature: "(U32, U32) -> U32",
};

pub const GOLDEN: &[GoldenSample] = &[
    GoldenSample {
        op_id: "primitive.math.gcd",
        input: &[0x30, 0x00, 0x00, 0x00, 0x12, 0x00, 0x00, 0x00],
        expected: &[0x06, 0x00, 0x00, 0x00],
        reason: "gcd(48, 18) = 6",
    },
];

pub const KAT: &[vyre_spec::KatVector] = &[
    vyre_spec::KatVector {
        input: &[0x30, 0x00, 0x00, 0x00, 0x12, 0x00, 0x00, 0x00],
        expected: &[0x06, 0x00, 0x00, 0x00],
        source: "gcd(48, 18) = 6",
    },
    vyre_spec::KatVector {
        input: &[0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00],
        expected: &[0x05, 0x00, 0x00, 0x00],
        source: "gcd(0, 5) = 5 (identity)",
    },
    vyre_spec::KatVector {
        input: &[0x11, 0x00, 0x00, 0x00, 0x0D, 0x00, 0x00, 0x00],
        expected: &[0x01, 0x00, 0x00, 0x00],
        source: "gcd(17, 13) = 1 (coprime)",
    },
    vyre_spec::KatVector {
        input: &[0xFF, 0xFF, 0xFF, 0xFF, 0x01, 0x00, 0x00, 0x00],
        expected: &[0x01, 0x00, 0x00, 0x00],
        source: "gcd(u32::MAX, 1) = 1",
    },
];

pub const ADVERSARIAL: &[vyre_spec::AdversarialInput] = &[
    vyre_spec::AdversarialInput {
        input: &[],
        reason: "empty input exercises validation and boundary handling",
    },
];

#[cfg(test)]
mod proptests {
    #[test]
    fn coverage_artifacts_are_registered() {
        assert!(!super::KAT.is_empty());
        assert!(!super::ADVERSARIAL.is_empty());
    }
}
```

## Step 8: Verify

Regenerate the conform discovery manifest so the suite sees your new file:

```bash
cd conform/codegen && cargo run --offline -- regenerate
```

Now run the conformance tests:

```bash
cargo test -p vyre-conform --lib
```

On a clean pass you will see output similar to:

```
running 3 tests
test primitive::math::gcd::proptests::coverage_artifacts_are_registered ... ok
test golden_samples_cpu_reference ... ok
test registry_consistency ... ok
```

If something fails, the output is actionable. For example, a law failure looks like:

```
FAIL primitive.math.gcd
  generator: Commutativity
  input: (12345, 67890)
  gpu: [0x03, 0x00, 0x00, 0x00]
  cpu: [0x09, 0x00, 0x00, 0x00]
  message: law violation or byte mismatch
```

Fix the `cpu_fn` or the IR `program()` until the bytes match exactly.

## Common gotchas

- **Stubbed cpu_fn.** A `todo!()` or `panic!()` in the CPU reference will crash the conformance suite. Implement the real algorithm.
- **Wrong byte order.** vyre uses little-endian for all scalar types. Always use `u32::from_le_bytes` and `to_le_bytes`.
- **Over-claiming laws.** If you declare `Associative` but `gcd(a, gcd(b, c))` does not equal `gcd(gcd(a, b), c)` for your implementation, the suite will catch it. Only declare laws you have verified.
- **Forgetting the empty-input case.** Every binary op must return `vec![0; 4]` when `input.len() < 8`. The conformance suite sends empty inputs to verify this guard clause.

## What NOT to do

- **Do not edit `mod.rs` files.** `automod` discovers ops in `core`; the conform codegen discovers specs in `conform`. Manual `mod` declarations are legacy.
- **Do not edit central registry lists by hand.** Run `conform/codegen` instead.
- **Do not use `panic!`, `todo!`, or `unimplemented!`.** At internet scale, a panicking op is a denial-of-service vector. Return defined outputs for all inputs.
- **Do not write the conform spec without the core op, or vice versa.** They are a matched pair. The `conform_ids_match_vyre_ids` test enforces this.