tokitai-operator 0.1.0

Verified DL kernel compiler: formally-checked GEMM, p-adic, sheaf, contract-carrying ops. Paper-artifact grade.
Documentation
# Contributing to Tokitai Operator

Thank you for your interest in contributing. This document explains the
process for adding a new operator, the architecture you need to understand
before you start, and the gates your PR must pass.

## Quick links

- `ARCHITECTURE.md` — module map, the IR / planner / executor split, and
  the role of the support matrix.
- `ARTIFACT.md` — what the paper claim depends on; what counts as a
  release-gate violation.
- `CLAIMS.md` — the 8 `*_claim_allowed` flags. Do not edit the flags
  without a P-number trace entry.
- `docs/operators.md` — the public operator surface index.
- `src/op/registry.rs` — the registry where every op must be listed.
- `tests/` — one file per surface, one test per contract.
- `todo.json` — the live P-numbered roadmap.

## The 7-step process for adding a new op

The codebase is built around a uniform pipeline. Whether you are adding a
pointwise arithmetic op (like `SubOp`), a shape op (`ReshapeOp`), an NN op
(`GeluOp`), an index op (`GatherOp`), or a reduction (`ReduceOp`), the
sequence is the same.

### Step 1 — Define the op struct in `src/op/<family>.rs`

Pick the existing family that matches your op:

- `arithmetic.rs` for elementwise and reduction-like ops that consume
  one or two tensors and return one tensor of the same shape (or scalar).
- `shape.rs` for ops that change the tensor shape (reshape, transpose,
  slice, concat, broadcast, flatten, squeeze, unsqueeze, permute).
- `nn.rs` for activation, normalization, and softmax-family ops.
- `index.rs` for gather, scatter, index-select, nonzero, and related
  gather/scatter patterns.
- `reductions.rs` for sum/mean/min/max/argmin/argmax.

If none of these fit, open a new file under `src/op/` and re-export it
from `src/op/mod.rs`. Existing template ops to copy:

| Family      | Template op           | Lines |
|-------------|-----------------------|-------|
| arithmetic  | `AddOp`               | ~30   |
| shape       | `ReshapeOp`           | ~50   |
| nn          | `ReluOp`              | ~35   |
| index       | `GatherOp`            | ~60   |
| reductions  | `ReduceOp`            | ~30   |

The struct itself is typically a unit struct (`pub struct SubOp;`); some
ops carry config (e.g. `axis: i64`) — see the existing ones for the
convention.

### Step 2 — Implement `Operator` (signature / infer / contracts)

Every op must implement the `Operator` trait from `src/op/mod.rs`. The
required surface is:

```rust
impl Operator for SubOp {
    fn name(&self) -> &'static str { "sub" }
    fn signature(&self) -> OpSignature { ... }
    fn infer(&self, inputs: &[ObjectMeta]) -> Result<Vec<ObjectMeta>> { ... }
    fn required_contracts(&self) -> ContractSet { ... }
    fn provided_contracts(&self) -> ContractSet { ... }
    fn layer_behavior(&self) -> Vec<LayerBehavior> { ... }
}
```

Conventions:

- `name` must match the `match` arm string in `src/backend/cpu.rs`. If
  they differ, the op silently never executes. Use the same name
  everywhere.
- `signature` declares input and output names. `tensor_input_check` is
  the standard way to enforce input arity in `infer`.
- `infer` is called by `SemanticGraph::add_op` to compute the output
  shape. It runs at plan time (no tensor data), so it must not read the
  data vector — only the meta (domain, shape, representation).
- `required_contracts` / `provided_contracts` are the theorem-side
  contract annotations. For a new op, starting with `ContractSet::new()`
  on both sides is fine; tighten them as you go.
- `layer_behavior` declares whether the op is `Pointwise`, `Global`,
  `Local`, or `Reductive`. The planner uses this to decide whether the
  op is a candidate for fusing.

### Step 3 — Implement the CPU helper in `src/backend/cpu.rs`

Add a public function `dense_<op>_i64` (or `shape_<op>_i64` /
`nn_<op>_i64` / `index_<op>_i64`) near the related helpers. The
function takes `&Tensor<i64>` inputs and returns `Result<Tensor<i64>>`.

Required properties:

- **Copy the data.** Never share the input buffer with the output; the
  caller will not expect a re-shape to alias the input. See
  `shape_reshape_i64` for the canonical pattern.
- **Validate dimensions** (rank, axis range, element count). Use
  `Error::backend(...)` for shape/range errors and
  `Error::verification(...)` for contract violations.
- **Preserve the input domain.** Copy `inputs[0].domain` into
  `output.domain` unless the op explicitly changes the domain.
- **Document the op's semantics** in the doc comment above the
  function. State whether the op is associative, commutative, and
  whether it can be lowered into a fused kernel.

### Step 4 — Add 2 match arms in `src/backend/cpu.rs`

Add one arm in `execute_i64_node` and one in the imperative plan
executor. Both arms have the same `node.op_name.as_str()` discriminator
and the same `dense_<op>_i64` / `shape_<op>_i64` call.

```rust
"sub" => {
    let lhs = store.get(node.inputs[0])?.clone();
    let rhs = store.get(node.inputs[1])?.clone();
    let output = dense_sub_i64(&lhs, &rhs)?;
    store.insert(node.output_ids[0], output);
}
```

If your op needs the meta of the output (for matmul and friends), pass
`node.outputs[0].clone()` as the third argument.

### Step 5 — Register the op in `src/op/registry.rs`

Two entries per op:

1. An `OperatorEntry` (or a per-family helper like
   `binary_pointwise_entry` / `scalar_binary_entry`) that exposes the
   `OpSignature` and contracts to the registry walker.
2. A `LoweringRule` with `required_evidence` and `obligations` strings
   that document what must be true at lowering time (e.g.
   "matmul inputs must be rank-2 tensors with matching inner
   dimensions").

The pattern is uniform across families; copy the closest existing op's
registry entry and adjust the names and obligations. If your op's
preconditions are exotic, add a new `*_entry` helper and document the
motivation in a comment.

### Step 6 — Add a facade builder in `src/facade.rs`

Add a `pub fn <op>(&self) -> <OpType>` method to `Tokitai`. The builder
is intentionally thin — it returns the op struct so the caller can use
it in `add_op` chains. See the existing `add`, `mul`, `reshape`,
`relu`, `gather` builders for the pattern.

### Step 7 — Add tests in `tests/<family>_ops.rs`

Each new op needs at least one test. The convention is:

- `tests/arithmetic_ops.rs` for arithmetic ops.
- `tests/shape_ops.rs` for shape ops.
- `tests/nn_ops.rs` for NN ops.
- `tests/index_ops.rs` for index ops.
- `tests/reductions.rs` for reductions.

If you add a new family file, register it in
`src/verify/support_matrix.rs::tests_for()` so the support matrix
references it.

Each test should:

1. Build a small `SemanticGraph` through the public facade.
2. Plan it with `HeuristicPlanner`.
3. Execute it with `CpuScalarBackend::execute_i64`.
4. Assert the output data is what you expect.

A worked example (sub):

```rust
#[test]
fn sub_wraps_and_preserves_shape() {
    let mut g = SemanticGraph::new();
    let lhs = g.add_input(int_meta(vec![3]));
    let rhs = g.add_input(int_meta(vec![3]));
    let out = g.add_op(SubOp, &[lhs, rhs]).unwrap();

    let mut store = TensorStore::new();
    store.insert(lhs, Tensor::dense_cpu(DomainId::new("integer"), Shape::from(vec![3]), vec![10, 0, i64::MIN]));
    store.insert(rhs, Tensor::dense_cpu(DomainId::new("integer"), Shape::from(vec![3]), vec![3, 1, 1]));

    let cpu = CpuScalarBackend;
    let plan = HeuristicPlanner::new(cpu.capabilities()).plan(&g).unwrap();
    cpu.execute_i64(&g, &plan, &mut store).unwrap();
    assert_eq!(store.get(out[0]).unwrap().data, vec![7, -1, i64::MIN.wrapping_sub(1)]);
}
```

## Worked example: adding `FooOp` end to end

Let's trace adding a hypothetical `FooOp` (multiplies a tensor by a
sigmoid-shaped polynomial of the input).

1. **Struct**: in `src/op/arithmetic.rs`, `pub struct FooOp;` plus the
   `Operator` impl with name `"foo"`.
2. **CPU helper**: in `src/backend/cpu.rs`, `pub fn dense_foo_i64(input:
   &Tensor<i64>) -> Result<Tensor<i64>>`. Copy `dense_*_i64` patterns.
3. **Match arms**: 2 arms in `src/backend/cpu.rs` (`execute_i64_node`
   and the imperative executor).
4. **Registry**: one `OperatorEntry` and one `LoweringRule` in
   `src/op/registry.rs`. Use `binary_pointwise_entry` (or
   `unary_pointwise_entry` for unary) as a starting point.
5. **Facade**: in `src/facade.rs`,
   `pub fn foo(&self) -> crate::op::arithmetic::FooOp { ... }`.
6. **Tests**: at least one test in `tests/arithmetic_ops.rs`. Pattern as
   in the sub example above.
7. **Support matrix**: add the new test file to
   `src/verify/support_matrix.rs::tests_for()` for the new lowering
   rule.

## PR checklist

Before opening a PR, confirm:

- [ ] `cargo build --offline` is clean (no new warnings).
- [ ] `cargo fmt --check` is clean.
- [ ] `cargo test --offline` passes (the 4 pre-existing
  `tests/paper_artifacts.rs` failures are tracked separately; do not
  weaken them).
- [ ] `cargo test --offline --test support_matrix_coverage` passes
  (this is the "no orphan evidence" guard for your new op).
- [ ] `cargo test --offline --test <your_new_test_file>` passes.
- [ ] `tests/witness_proof_replay.rs` still passes (the proof-replay
  witness section is generated from the registry; your new
  `LoweringRule` does not need to be referenced there).
- [ ] `CLAIMS.md` is unchanged. You must not edit the
  `*_claim_allowed` flags.
- [ ] `src/verify/{release_gate,semantic_conformance,hip_audit,theorems,claim_status}.rs`
  and `docs/paper/*` are unchanged. `src/verify/support_matrix.rs` is
  the one exception — adding a new row helper is the standard way to
  register evidence.
- [ ] Your new op has a P-number trace entry. If you are doing the work
  in scope of a `todo.json` phase, the P-number is already there. If
  you are starting a new task, add a phase to `todo.json` first.
- [ ] If your op changes the post-P353 surface (P335-P348), the support
  matrix has a new row referencing your test file.

## The 5-step process for adding a new mathematical domain

A "domain" in Tokitai is the mathematical structure an operator is
allowed to operate over (e.g. `integer`, `padic:Q_5`, `finite_field:F_5`).
Adding a new domain is rarer than adding a new op, but follows a
similar pipeline. The 0.7B MoE training project added 2 domains
(`integer`, `finite_field`) following the steps below.

### Step 1 — Define the abstract types in `src/domain/<name>.rs`

The domain module tree (`src/domain/`) holds the abstract contract
primitives:

- `src/domain/contract.rs``Contract`, `ContractId`, `ContractSet`,
  `Claim`, `Condition`, `Evidence`, `Scope` (read by the op layer).
- `src/domain/structure.rs``DomainKind` (`Boolean`, `Integer`,
  `Field`, ...). Add a new variant if your domain is a new kind.
- `src/domain/precision.rs``PrecisionModel` (`Exact`,
  `FixedDigits`, ...). Add a new variant if your domain has its own
  precision story.
- `src/domain/valued.rs``ValuationBound` (the p-adic valuation
  bound). Add a new variant if your domain has a valuation.

For most new domains you only need to touch `structure.rs` (add a
`DomainKind` variant) and one of the others. Do not edit
`src/domain/mod.rs` unless you are adding a new submodule.

### Step 2 — Define the concrete domain in `src/domain/<name>.rs`

Create a new file (e.g. `src/domain/padic_domain.rs`) that defines:

- `pub struct <Name>Domain` (carries the per-domain config: precision
  cutoff, valuation bound, etc.).
- `impl Domain for <Name>Domain` (the `kind()`, `id()`, and
  `contracts()` methods).
- The value type (`pub struct <Name> { ... }`).

The existing `PadicDomain` (in `src/domain/padic.rs`) and
`FiniteFieldDomain` (in `src/domain/finite_field.rs`) are the two
canonical examples. The 5-step shape is the same; only the per-domain
config and the value type differ.

### Step 3 — Register the domain in the planner

The planner reads the domain's contract set when discharging
obligations. To make the planner aware of the new domain:

- Add the domain's `id()` value to the `domains` set in
  `src/op/registry.rs::cpu_scalar_builtins()` (or the equivalent
  builder if you are adding a non-cpu domain).
- If the new domain has a custom contract, add a `Contract` to the
  `ContractSet` returned by `<Name>Domain::contracts()`.

### Step 4 — Add a `tests/<name>.rs` integration test

The integration test follows the standard pattern: build a small
`SemanticGraph` with a single op on a tensor of the new domain,
plan it, execute it, and assert the output. The `tests/finite_field.rs`
test is a good template.

### Step 5 — Update the support matrix

The support matrix (`src/verify/support_matrix.rs`) auto-extends
when a new domain is added: the registry-walk loop in
`generated_theory_support_matrix()` produces a new row for each
(domain, operator, backend) triple that the new domain participates
in. No manual edit is needed. The
`tests/support_matrix_coverage.rs` test will fail-closed if the
new row is missing a test file or public API path.

### Domain-extension checklist

- [ ] `src/domain/<name>.rs` defines the domain and value type.
- [ ] `src/domain/contract.rs` is updated (only if the new domain
  introduces a new contract primitive).
- [ ] `src/op/registry.rs::cpu_scalar_builtins()` lists the new
  domain's `id()`.
- [ ] `tests/<name>.rs` exercises the new domain end-to-end.
- [ ] `bash scripts/health_check.sh` passes.
- [ ] `cargo test --offline` passes.

## Architecture primer

A 60-second tour of the code paths your new op will touch:

- **Operator** (`src/op/<family>.rs`) — the op struct and its
  `Operator` impl. The op knows its name, signature, and how to infer
  output shape from input shape.
- **IR** (`src/ir/semantic.rs`) — `SemanticGraph` is a DAG of op
  invocations. `add_input` returns value ids; `add_op` records a node
  and calls `op.infer` to compute the output meta.
- **Planner** (`src/planner/mod.rs`) — `HeuristicPlanner::plan(graph)`
  walks the graph and produces an `ExecutionPlan` with a backend name
  and a per-node decision (lowering rule id, contract checks).
- **Executor** (`src/backend/cpu.rs`) — `CpuScalarBackend::execute_i64`
  walks the plan and dispatches per op name. The match arms are the
  truth of "what runs on cpu_scalar".
- **Registry** (`src/op/registry.rs`) — the source of truth for what
  the planner is allowed to lower. Each rule has a `required_evidence`
  list and an `obligations` list.
- **Support matrix** (`src/verify/support_matrix.rs`) — the public
  surface index. One row per (domain, operator, backend) triple.
  Updated by P355, audited by P359's coverage walk test.

## Questions?

Open an issue. Tag the maintainers. If your question is about a
specific op, link the op's test file (e.g. `tests/arithmetic_ops.rs`)
and the registry entry.