# Contributing to Tokitai Operator
Thank you for your interest in contributing. This document explains the
process for adding a new operator, the architecture you need to understand
before you start, and the gates your PR must pass.
## Quick links
- `ARCHITECTURE.md` — module map, the IR / planner / executor split, and
the role of the support matrix.
- `ARTIFACT.md` — what the paper claim depends on; what counts as a
release-gate violation.
- `CLAIMS.md` — the 8 `*_claim_allowed` flags. Do not edit the flags
without a P-number trace entry.
- `docs/operators.md` — the public operator surface index.
- `src/op/registry.rs` — the registry where every op must be listed.
- `tests/` — one file per surface, one test per contract.
- `todo.json` — the live P-numbered roadmap.
## The 7-step process for adding a new op
The codebase is built around a uniform pipeline. Whether you are adding a
pointwise arithmetic op (like `SubOp`), a shape op (`ReshapeOp`), an NN op
(`GeluOp`), an index op (`GatherOp`), or a reduction (`ReduceOp`), the
sequence is the same.
### Step 1 — Define the op struct in `src/op/<family>.rs`
Pick the existing family that matches your op:
- `arithmetic.rs` for elementwise and reduction-like ops that consume
one or two tensors and return one tensor of the same shape (or scalar).
- `shape.rs` for ops that change the tensor shape (reshape, transpose,
slice, concat, broadcast, flatten, squeeze, unsqueeze, permute).
- `nn.rs` for activation, normalization, and softmax-family ops.
- `index.rs` for gather, scatter, index-select, nonzero, and related
gather/scatter patterns.
- `reductions.rs` for sum/mean/min/max/argmin/argmax.
If none of these fit, open a new file under `src/op/` and re-export it
from `src/op/mod.rs`. Existing template ops to copy:
| arithmetic | `AddOp` | ~30 |
| shape | `ReshapeOp` | ~50 |
| nn | `ReluOp` | ~35 |
| index | `GatherOp` | ~60 |
| reductions | `ReduceOp` | ~30 |
The struct itself is typically a unit struct (`pub struct SubOp;`); some
ops carry config (e.g. `axis: i64`) — see the existing ones for the
convention.
### Step 2 — Implement `Operator` (signature / infer / contracts)
Every op must implement the `Operator` trait from `src/op/mod.rs`. The
required surface is:
```rust
impl Operator for SubOp {
fn name(&self) -> &'static str { "sub" }
fn signature(&self) -> OpSignature { ... }
fn infer(&self, inputs: &[ObjectMeta]) -> Result<Vec<ObjectMeta>> { ... }
fn required_contracts(&self) -> ContractSet { ... }
fn provided_contracts(&self) -> ContractSet { ... }
fn layer_behavior(&self) -> Vec<LayerBehavior> { ... }
}
```
Conventions:
- `name` must match the `match` arm string in `src/backend/cpu.rs`. If
they differ, the op silently never executes. Use the same name
everywhere.
- `signature` declares input and output names. `tensor_input_check` is
the standard way to enforce input arity in `infer`.
- `infer` is called by `SemanticGraph::add_op` to compute the output
shape. It runs at plan time (no tensor data), so it must not read the
data vector — only the meta (domain, shape, representation).
- `required_contracts` / `provided_contracts` are the theorem-side
contract annotations. For a new op, starting with `ContractSet::new()`
on both sides is fine; tighten them as you go.
- `layer_behavior` declares whether the op is `Pointwise`, `Global`,
`Local`, or `Reductive`. The planner uses this to decide whether the
op is a candidate for fusing.
### Step 3 — Implement the CPU helper in `src/backend/cpu.rs`
Add a public function `dense_<op>_i64` (or `shape_<op>_i64` /
`nn_<op>_i64` / `index_<op>_i64`) near the related helpers. The
function takes `&Tensor<i64>` inputs and returns `Result<Tensor<i64>>`.
Required properties:
- **Copy the data.** Never share the input buffer with the output; the
caller will not expect a re-shape to alias the input. See
`shape_reshape_i64` for the canonical pattern.
- **Validate dimensions** (rank, axis range, element count). Use
`Error::backend(...)` for shape/range errors and
`Error::verification(...)` for contract violations.
- **Preserve the input domain.** Copy `inputs[0].domain` into
`output.domain` unless the op explicitly changes the domain.
- **Document the op's semantics** in the doc comment above the
function. State whether the op is associative, commutative, and
whether it can be lowered into a fused kernel.
### Step 4 — Add 2 match arms in `src/backend/cpu.rs`
Add one arm in `execute_i64_node` and one in the imperative plan
executor. Both arms have the same `node.op_name.as_str()` discriminator
and the same `dense_<op>_i64` / `shape_<op>_i64` call.
```rust
"sub" => {
let lhs = store.get(node.inputs[0])?.clone();
let rhs = store.get(node.inputs[1])?.clone();
let output = dense_sub_i64(&lhs, &rhs)?;
store.insert(node.output_ids[0], output);
}
```
If your op needs the meta of the output (for matmul and friends), pass
`node.outputs[0].clone()` as the third argument.
### Step 5 — Register the op in `src/op/registry.rs`
Two entries per op:
1. An `OperatorEntry` (or a per-family helper like
`binary_pointwise_entry` / `scalar_binary_entry`) that exposes the
`OpSignature` and contracts to the registry walker.
2. A `LoweringRule` with `required_evidence` and `obligations` strings
that document what must be true at lowering time (e.g.
"matmul inputs must be rank-2 tensors with matching inner
dimensions").
The pattern is uniform across families; copy the closest existing op's
registry entry and adjust the names and obligations. If your op's
preconditions are exotic, add a new `*_entry` helper and document the
motivation in a comment.
### Step 6 — Add a facade builder in `src/facade.rs`
Add a `pub fn <op>(&self) -> <OpType>` method to `Tokitai`. The builder
is intentionally thin — it returns the op struct so the caller can use
it in `add_op` chains. See the existing `add`, `mul`, `reshape`,
`relu`, `gather` builders for the pattern.
### Step 7 — Add tests in `tests/<family>_ops.rs`
Each new op needs at least one test. The convention is:
- `tests/arithmetic_ops.rs` for arithmetic ops.
- `tests/shape_ops.rs` for shape ops.
- `tests/nn_ops.rs` for NN ops.
- `tests/index_ops.rs` for index ops.
- `tests/reductions.rs` for reductions.
If you add a new family file, register it in
`src/verify/support_matrix.rs::tests_for()` so the support matrix
references it.
Each test should:
1. Build a small `SemanticGraph` through the public facade.
2. Plan it with `HeuristicPlanner`.
3. Execute it with `CpuScalarBackend::execute_i64`.
4. Assert the output data is what you expect.
A worked example (sub):
```rust
#[test]
fn sub_wraps_and_preserves_shape() {
let mut g = SemanticGraph::new();
let lhs = g.add_input(int_meta(vec![3]));
let rhs = g.add_input(int_meta(vec![3]));
let out = g.add_op(SubOp, &[lhs, rhs]).unwrap();
let mut store = TensorStore::new();
store.insert(lhs, Tensor::dense_cpu(DomainId::new("integer"), Shape::from(vec![3]), vec![10, 0, i64::MIN]));
store.insert(rhs, Tensor::dense_cpu(DomainId::new("integer"), Shape::from(vec![3]), vec![3, 1, 1]));
let cpu = CpuScalarBackend;
let plan = HeuristicPlanner::new(cpu.capabilities()).plan(&g).unwrap();
cpu.execute_i64(&g, &plan, &mut store).unwrap();
assert_eq!(store.get(out[0]).unwrap().data, vec![7, -1, i64::MIN.wrapping_sub(1)]);
}
```
## Worked example: adding `FooOp` end to end
Let's trace adding a hypothetical `FooOp` (multiplies a tensor by a
sigmoid-shaped polynomial of the input).
1. **Struct**: in `src/op/arithmetic.rs`, `pub struct FooOp;` plus the
`Operator` impl with name `"foo"`.
2. **CPU helper**: in `src/backend/cpu.rs`, `pub fn dense_foo_i64(input:
&Tensor<i64>) -> Result<Tensor<i64>>`. Copy `dense_*_i64` patterns.
3. **Match arms**: 2 arms in `src/backend/cpu.rs` (`execute_i64_node`
and the imperative executor).
4. **Registry**: one `OperatorEntry` and one `LoweringRule` in
`src/op/registry.rs`. Use `binary_pointwise_entry` (or
`unary_pointwise_entry` for unary) as a starting point.
5. **Facade**: in `src/facade.rs`,
`pub fn foo(&self) -> crate::op::arithmetic::FooOp { ... }`.
6. **Tests**: at least one test in `tests/arithmetic_ops.rs`. Pattern as
in the sub example above.
7. **Support matrix**: add the new test file to
`src/verify/support_matrix.rs::tests_for()` for the new lowering
rule.
## PR checklist
Before opening a PR, confirm:
- [ ] `cargo build --offline` is clean (no new warnings).
- [ ] `cargo fmt --check` is clean.
- [ ] `cargo test --offline` passes (the 4 pre-existing
`tests/paper_artifacts.rs` failures are tracked separately; do not
weaken them).
- [ ] `cargo test --offline --test support_matrix_coverage` passes
(this is the "no orphan evidence" guard for your new op).
- [ ] `cargo test --offline --test <your_new_test_file>` passes.
- [ ] `tests/witness_proof_replay.rs` still passes (the proof-replay
witness section is generated from the registry; your new
`LoweringRule` does not need to be referenced there).
- [ ] `CLAIMS.md` is unchanged. You must not edit the
`*_claim_allowed` flags.
- [ ] `src/verify/{release_gate,semantic_conformance,hip_audit,theorems,claim_status}.rs`
and `docs/paper/*` are unchanged. `src/verify/support_matrix.rs` is
the one exception — adding a new row helper is the standard way to
register evidence.
- [ ] Your new op has a P-number trace entry. If you are doing the work
in scope of a `todo.json` phase, the P-number is already there. If
you are starting a new task, add a phase to `todo.json` first.
- [ ] If your op changes the post-P353 surface (P335-P348), the support
matrix has a new row referencing your test file.
## The 5-step process for adding a new mathematical domain
A "domain" in Tokitai is the mathematical structure an operator is
allowed to operate over (e.g. `integer`, `padic:Q_5`, `finite_field:F_5`).
Adding a new domain is rarer than adding a new op, but follows a
similar pipeline. The 0.7B MoE training project added 2 domains
(`integer`, `finite_field`) following the steps below.
### Step 1 — Define the abstract types in `src/domain/<name>.rs`
The domain module tree (`src/domain/`) holds the abstract contract
primitives:
- `src/domain/contract.rs` — `Contract`, `ContractId`, `ContractSet`,
`Claim`, `Condition`, `Evidence`, `Scope` (read by the op layer).
- `src/domain/structure.rs` — `DomainKind` (`Boolean`, `Integer`,
`Field`, ...). Add a new variant if your domain is a new kind.
- `src/domain/precision.rs` — `PrecisionModel` (`Exact`,
`FixedDigits`, ...). Add a new variant if your domain has its own
precision story.
- `src/domain/valued.rs` — `ValuationBound` (the p-adic valuation
bound). Add a new variant if your domain has a valuation.
For most new domains you only need to touch `structure.rs` (add a
`DomainKind` variant) and one of the others. Do not edit
`src/domain/mod.rs` unless you are adding a new submodule.
### Step 2 — Define the concrete domain in `src/domain/<name>.rs`
Create a new file (e.g. `src/domain/padic_domain.rs`) that defines:
- `pub struct <Name>Domain` (carries the per-domain config: precision
cutoff, valuation bound, etc.).
- `impl Domain for <Name>Domain` (the `kind()`, `id()`, and
`contracts()` methods).
- The value type (`pub struct <Name> { ... }`).
The existing `PadicDomain` (in `src/domain/padic.rs`) and
`FiniteFieldDomain` (in `src/domain/finite_field.rs`) are the two
canonical examples. The 5-step shape is the same; only the per-domain
config and the value type differ.
### Step 3 — Register the domain in the planner
The planner reads the domain's contract set when discharging
obligations. To make the planner aware of the new domain:
- Add the domain's `id()` value to the `domains` set in
`src/op/registry.rs::cpu_scalar_builtins()` (or the equivalent
builder if you are adding a non-cpu domain).
- If the new domain has a custom contract, add a `Contract` to the
`ContractSet` returned by `<Name>Domain::contracts()`.
### Step 4 — Add a `tests/<name>.rs` integration test
The integration test follows the standard pattern: build a small
`SemanticGraph` with a single op on a tensor of the new domain,
plan it, execute it, and assert the output. The `tests/finite_field.rs`
test is a good template.
### Step 5 — Update the support matrix
The support matrix (`src/verify/support_matrix.rs`) auto-extends
when a new domain is added: the registry-walk loop in
`generated_theory_support_matrix()` produces a new row for each
(domain, operator, backend) triple that the new domain participates
in. No manual edit is needed. The
`tests/support_matrix_coverage.rs` test will fail-closed if the
new row is missing a test file or public API path.
### Domain-extension checklist
- [ ] `src/domain/<name>.rs` defines the domain and value type.
- [ ] `src/domain/contract.rs` is updated (only if the new domain
introduces a new contract primitive).
- [ ] `src/op/registry.rs::cpu_scalar_builtins()` lists the new
domain's `id()`.
- [ ] `tests/<name>.rs` exercises the new domain end-to-end.
- [ ] `bash scripts/health_check.sh` passes.
- [ ] `cargo test --offline` passes.
## Architecture primer
A 60-second tour of the code paths your new op will touch:
- **Operator** (`src/op/<family>.rs`) — the op struct and its
`Operator` impl. The op knows its name, signature, and how to infer
output shape from input shape.
- **IR** (`src/ir/semantic.rs`) — `SemanticGraph` is a DAG of op
invocations. `add_input` returns value ids; `add_op` records a node
and calls `op.infer` to compute the output meta.
- **Planner** (`src/planner/mod.rs`) — `HeuristicPlanner::plan(graph)`
walks the graph and produces an `ExecutionPlan` with a backend name
and a per-node decision (lowering rule id, contract checks).
- **Executor** (`src/backend/cpu.rs`) — `CpuScalarBackend::execute_i64`
walks the plan and dispatches per op name. The match arms are the
truth of "what runs on cpu_scalar".
- **Registry** (`src/op/registry.rs`) — the source of truth for what
the planner is allowed to lower. Each rule has a `required_evidence`
list and an `obligations` list.
- **Support matrix** (`src/verify/support_matrix.rs`) — the public
surface index. One row per (domain, operator, backend) triple.
Updated by P355, audited by P359's coverage walk test.
## Questions?
Open an issue. Tag the maintainers. If your question is about a
specific op, link the op's test file (e.g. `tests/arithmetic_ops.rs`)
and the registry entry.