tokitai-operator 0.1.0

Verified DL kernel compiler: formally-checked GEMM, p-adic, sheaf, contract-carrying ops. Paper-artifact grade.
# Operator Surface Guide

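This document is the public surface index for the Tokitai operator
library. Each section tabulates the op structs (or, for surfaces with
no `Op`, the public types and functions), the facade builder or
construction path, the test file, and a one-line description. Lowering
notes appear in the section prose where they apply.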

For the architecture overview and the IR / planner / executor split,
see `ARCHITECTURE.md`. For the P-numbered roadmap and the post-P353
surface, see `CHANGELOG.md`. For the support matrix (the auditable
surface index), see `src/verify/support_matrix.rs` and
`docs/theory_support_matrix.md`.

## Arithmetic (P335)

Elementwise and reduction-like binary/scalar/unary ops on integer or
floating-point tensors. All ops route through the
`CpuScalarBackend::execute_i64` matcher.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `AddOp` | `tokitai.add()` | `tests/arithmetic_ops.rs` | elementwise `lhs + rhs` |
| `MulOp` | `tokitai.mul()` | `tests/arithmetic_ops.rs` | elementwise `lhs * rhs` |
| `SubOp` | `tokitai.sub()` | `tests/arithmetic_ops.rs` | elementwise `lhs - rhs` (wraps on `i64` overflow, e.g. past `i64::MIN`) |
| `DivOp` | `tokitai.div()` | `tests/arithmetic_ops.rs` | elementwise `lhs / rhs`; zero divisor returns `Error::Backend` |
| `ScalarAddOp` | `tokitai.scalar_add()` | `tests/arithmetic_ops.rs` | elementwise `lhs + scalar` |
| `ScalarMulOp` | `tokitai.scalar_mul()` | `tests/arithmetic_ops.rs` | elementwise `lhs * scalar` |
| `PowOp` | `tokitai.pow()` | `tests/arithmetic_ops.rs` | elementwise `lhs ^ exp` (integer exponent) |
| `SqrtOp` | `tokitai.sqrt()` | `tests/arithmetic_ops.rs` | elementwise `floor(sqrt(lhs))` |
| `Exp2Op` | `tokitai.exp2()` | `tests/arithmetic_ops.rs` | elementwise `2 ^ lhs` |
| `Log2Op` | `tokitai.log2()` | `tests/arithmetic_ops.rs` | elementwise `floor(log2(lhs))` |
| `MapOp` | `tokitai.map()` | `tests/arithmetic_ops.rs` | placeholder elementwise identity |
| `ReduceOp` | `tokitai.reduce()` | `tests/arithmetic_ops.rs` | sum-reduce to a scalar |
| `MatmulOp` | `tokitai.matmul()` | `tests/arithmetic_ops.rs` | rank-2 matrix multiplication |
| `FmaOp` | `tokitai.fma()` | `tests/arithmetic_ops.rs` | fused multiply-add |
| `PAdicMatmulFmaOp` | `tokitai.p_pad_fma()` | `tests/padic.rs` | p-adic matmul with FMA accumulation |
| `PAdicDotOp` | `tokitai.p_dot()` | `tests/padic_dot.rs` | p-adic dot product with valuation skip |
| `ClampOp` | `tokitai.clamp()` | `tests/arithmetic_ops.rs` | elementwise `clamp(lhs, lo, hi)` |
| `NegOp` | `tokitai.neg()` | `tests/arithmetic_ops.rs` | elementwise `-lhs` |
| `AbsOp` | `tokitai.abs()` | `tests/arithmetic_ops.rs` | elementwise `\|lhs\|` |
| `SquareOp` | `tokitai.square()` | `tests/arithmetic_ops.rs` | elementwise `lhs * lhs` |
| `MulByTwoOp` | `tokitai.mul_by_two()` | `tests/arithmetic_ops.rs` | elementwise `lhs * 2` |
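
The integer semantics in the table (wrapping subtraction, a zero-divisor error, floored `sqrt` and `log2`) can be illustrated with a small standalone sketch. It does not call the crate: the function names and the `ArithError` type are hypothetical stand-ins (the crate itself reports `Error::Backend`), and the handling of negative inputs and the truncating division are assumptions made for the example.

```rust
/// Hypothetical stand-in for the crate's `Error::Backend`.
#[derive(Debug, PartialEq)]
pub enum ArithError {
    DivideByZero,
    OutOfDomain,
}

/// Elementwise `lhs - rhs` with two's-complement wrap-around on overflow.
pub fn sub_wrapping(lhs: &[i64], rhs: &[i64]) -> Vec<i64> {
    lhs.iter().zip(rhs).map(|(&a, &b)| a.wrapping_sub(b)).collect()
}

/// Elementwise `lhs / rhs`; any zero divisor fails the whole op.
/// Assumption: quotients truncate toward zero, as Rust's `/` does.
pub fn div_checked(lhs: &[i64], rhs: &[i64]) -> Result<Vec<i64>, ArithError> {
    lhs.iter()
        .zip(rhs)
        .map(|(&a, &b)| {
            if b == 0 {
                Err(ArithError::DivideByZero)
            } else {
                Ok(a.wrapping_div(b))
            }
        })
        .collect()
}

/// `floor(sqrt(x))` by binary search, so no floating point is involved.
pub fn floor_sqrt(x: i64) -> Result<i64, ArithError> {
    if x < 0 {
        return Err(ArithError::OutOfDomain);
    }
    // 3_037_000_499 is floor(sqrt(i64::MAX)), so `mid * mid` cannot overflow.
    let (mut lo, mut hi) = (0i64, 3_037_000_499i64);
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2;
        if mid * mid <= x {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    Ok(lo)
}

/// `floor(log2(x))` for positive `x`, read off the position of the top set bit.
pub fn floor_log2(x: i64) -> Result<i64, ArithError> {
    if x <= 0 {
        return Err(ArithError::OutOfDomain);
    }
    Ok(63 - x.leading_zeros() as i64)
}
```

Under these assumptions `floor_sqrt(10)` and `floor_log2(10)` both yield `Ok(3)`, and `sub_wrapping(&[i64::MIN], &[1])` wraps to `i64::MAX`.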

## Shape (P336)

Shape-manipulation ops. All ops are domain-preserving: the input
domain (integer, finite field, etc.) is copied to the output.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `ReshapeOp` | `tokitai.reshape()` | `tests/shape_ops.rs` | reshape to a new shape (element count must match) |
| `TransposeOp` | `tokitai.transpose()` | `tests/shape_ops.rs` | swap exactly two axes (2-element axes input) |
| `SliceOp` | `tokitai.slice()` | `tests/shape_ops.rs` | half-open `[start, end)` along one axis |
| `ConcatOp` | `tokitai.concat()` | `tests/shape_ops.rs` | concatenate multiple tensors along one axis |
| `BroadcastOp` | `tokitai.broadcast()` | `tests/shape_ops.rs` | explicit broadcast to a target shape |
| `FlattenOp` | `tokitai.flatten()` | `tests/shape_ops.rs` | flatten a tensor to 1-D (preserves element order) |
| `SqueezeOp` | `tokitai.squeeze()` | `tests/shape_ops.rs` | remove all size-1 dimensions |
| `UnsqueezeOp` | `tokitai.unsqueeze()` | `tests/shape_ops.rs` | insert a size-1 dimension at `axis` |
| `PermuteOp` | `tokitai.permute()` | `tests/shape_ops.rs` | full permutation of axes (TransposeOp is the 2-axis special case) |
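
The shape rules in the table can be sketched as pure functions on shape vectors. This is an illustration only, not the crate's implementation: every function name is hypothetical, and the permutation convention (output axis `i` takes input axis `perm[i]`) is an assumption.

```rust
/// Reshape is valid only when the element counts match.
pub fn reshape_shape(old: &[usize], new: &[usize]) -> Option<Vec<usize>> {
    let count = |s: &[usize]| s.iter().product::<usize>();
    if count(old) == count(new) {
        Some(new.to_vec())
    } else {
        None
    }
}

/// Remove every size-1 dimension.
pub fn squeeze_shape(shape: &[usize]) -> Vec<usize> {
    shape.iter().copied().filter(|&d| d != 1).collect()
}

/// Insert a size-1 dimension at `axis` (which may equal the current rank).
pub fn unsqueeze_shape(shape: &[usize], axis: usize) -> Option<Vec<usize>> {
    if axis > shape.len() {
        return None;
    }
    let mut out = shape.to_vec();
    out.insert(axis, 1);
    Some(out)
}

/// Full axis permutation; rejects anything that is not a permutation of 0..rank.
pub fn permute_shape(shape: &[usize], perm: &[usize]) -> Option<Vec<usize>> {
    if perm.len() != shape.len() {
        return None;
    }
    let mut seen = vec![false; shape.len()];
    for &p in perm {
        if p >= shape.len() || seen[p] {
            return None;
        }
        seen[p] = true;
    }
    Some(perm.iter().map(|&p| shape[p]).collect())
}

/// Two-axis transpose expressed as the special case of a permutation.
pub fn transpose_shape(shape: &[usize], a: usize, b: usize) -> Option<Vec<usize>> {
    if a >= shape.len() || b >= shape.len() {
        return None;
    }
    let mut perm: Vec<usize> = (0..shape.len()).collect();
    perm.swap(a, b);
    permute_shape(shape, &perm)
}
```

With this convention, `transpose_shape(&[2, 3, 4], 0, 2)` and `permute_shape(&[2, 3, 4], &[2, 1, 0])` produce the same shape, which mirrors the remark that `TransposeOp` is the 2-axis special case of `PermuteOp`.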

## NN (P337)

Neural-network primitives: activations, normalization, softmax.
Non-linearities use fixed-point polynomial approximations for
deterministic i64 execution.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `ReluOp` | `tokitai.relu()` | `tests/nn_ops.rs` | elementwise `max(0, x)` |
| `SigmoidOp` | `tokitai.sigmoid()` | `tests/nn_ops.rs` | elementwise `1 / (1 + exp(-x))`, output scaled by 1_000_000 |
| `TanhOp` | `tokitai.tanh()` | `tests/nn_ops.rs` | elementwise `tanh(x)`, output scaled by 1_000_000 |
| `GeluOp` | `tokitai.gelu()` | `tests/nn_ops.rs` | elementwise GELU approximation |
| `SoftmaxOp` | `tokitai.softmax()` | `tests/nn_ops.rs` | softmax over the last axis, output scaled by 1_000_000 |
| `LayerNormOp` | `tokitai.layer_norm()` | `tests/nn_ops.rs` | per-row layer normalization (integer mean/stddev) |
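
To make the `1_000_000` scale convention concrete, the sketch below computes reference values in `f64` and rounds them onto the fixed-point grid. It is deliberately not the crate's method: the crate uses fixed-point polynomial approximations, whereas this reference goes through floating point. The function names are hypothetical, and treating the input as a plain (unscaled) integer is an assumption.

```rust
/// Fixed-point scale used for the outputs in the table above.
pub const SCALE: i64 = 1_000_000;

/// `max(0, x)` needs no scaling; it is exact on integers.
pub fn relu(x: i64) -> i64 {
    x.max(0)
}

/// Reference sigmoid: evaluate in f64, then round to the nearest 1/SCALE.
pub fn sigmoid_reference(x: i64) -> i64 {
    let y = 1.0 / (1.0 + (-(x as f64)).exp());
    (y * SCALE as f64).round() as i64
}

/// Reference softmax over one row; the outputs sum to roughly SCALE.
pub fn softmax_reference(row: &[i64]) -> Vec<i64> {
    // Subtract the row maximum first so that `exp` cannot overflow.
    let max = row.iter().copied().max().unwrap_or(0);
    let exps: Vec<f64> = row
        .iter()
        .map(|&x| (x as f64 - max as f64).exp())
        .collect();
    let total: f64 = exps.iter().sum();
    exps.iter()
        .map(|e| (e / total * SCALE as f64).round() as i64)
        .collect()
}

/// Convert a scaled fixed-point value back to a real number.
pub fn from_fixed(v: i64) -> f64 {
    v as f64 / SCALE as f64
}
```

For example, `sigmoid_reference(0)` is `500_000`, which `from_fixed` maps back to `0.5`. A floating-point reference like this can differ in the last digits across platforms, which is one reason an integer-only polynomial path is attractive when deterministic output is required.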

## Index (P338)

Gather / scatter / index-select style ops.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `GatherOp` | `tokitai.gather()` | `tests/index_ops.rs` | gather along `axis` using `indices` |
| `ScatterOp` | `tokitai.scatter()` | `tests/index_ops.rs` | scatter into a fresh zero buffer along `axis` |
| `IndexSelectOp` | `tokitai.index_select()` | `tests/index_ops.rs` | index select (alias for gather in the current backend) |
| `IndexAddOp` | `tokitai.index_add()` | `tests/index_ops.rs` | index add: `out[i] = sum over j of (src[indices[j]] if j matches i)` |
| `NonzeroOp` | `tokitai.nonzero()` | `tests/index_ops.rs` | return the indices of non-zero entries |
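
A one-dimensional sketch of the gather, scatter, and nonzero semantics follows. The functions are hypothetical and independent of the crate. Two details are assumptions: out-of-range indices are rejected with `None`, and when `scatter` sees a duplicate index the last write wins.

```rust
/// `out[j] = src[indices[j]]`; `None` if any index is out of range.
pub fn gather(src: &[i64], indices: &[usize]) -> Option<Vec<i64>> {
    indices.iter().map(|&i| src.get(i).copied()).collect()
}

/// Write `values[j]` to position `indices[j]` of a fresh zero buffer of length `len`.
pub fn scatter(len: usize, indices: &[usize], values: &[i64]) -> Option<Vec<i64>> {
    if indices.len() != values.len() {
        return None;
    }
    let mut out = vec![0i64; len];
    for (&i, &v) in indices.iter().zip(values) {
        // `get_mut` returns None for an out-of-range index, which `?` propagates.
        *out.get_mut(i)? = v;
    }
    Some(out)
}

/// Positions of the non-zero entries, in increasing order.
pub fn nonzero(src: &[i64]) -> Vec<usize> {
    src.iter()
        .enumerate()
        .filter_map(|(i, &v)| if v != 0 { Some(i) } else { None })
        .collect()
}
```

In this model `scatter(4, &[1, 3], &[7, 9])` yields `[0, 7, 0, 9]`, and gathering that result at indices `[1, 3]` returns the original values.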

## Reductions (P339)

Reduction ops, each a variant of the `ReduceOp` listed under Arithmetic.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `SumOp` | `tokitai.sum()` | `tests/reductions.rs` | sum-reduce to a scalar |
| `MeanOp` | `tokitai.mean()` | `tests/reductions.rs` | mean-reduce (integer floor division) |
| `MaxOp` | `tokitai.max()` | `tests/reductions.rs` | max-reduce |
| `MinOp` | `tokitai.min()` | `tests/reductions.rs` | min-reduce |
| `ArgMaxOp` | `tokitai.argmax()` | `tests/reductions.rs` | argmax along an axis |
| `ArgMinOp` | `tokitai.argmin()` | `tests/reductions.rs` | argmin along an axis |
| `ProdOp` | `tokitai.prod()` | `tests/reductions.rs` | multiplicative reduce |
| `AnyOp` | `tokitai.any()` | `tests/reductions.rs` | any-nonzero reduce |
| `AllOp` | `tokitai.all()` | `tests/reductions.rs` | all-nonzero reduce |
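
The "integer floor division" in `MeanOp` matters for negative sums, because Rust's `/` on integers truncates toward zero instead of flooring. The sketch below contrasts the two and adds simple models of argmax and the any/all reductions. All names are hypothetical. The first-occurrence tie-break in `argmax` and the results on empty input are assumptions, not documented crate behaviour.

```rust
/// Mean with floor division: rounds toward negative infinity.
pub fn mean_floor(xs: &[i64]) -> Option<i64> {
    if xs.is_empty() {
        return None;
    }
    // Accumulate in i128 so the sum itself cannot overflow.
    let sum: i128 = xs.iter().map(|&x| x as i128).sum();
    // With a positive divisor, `div_euclid` coincides with floor division.
    Some(sum.div_euclid(xs.len() as i128) as i64)
}

/// Mean with Rust's default `/`, which truncates toward zero.
pub fn mean_truncated(xs: &[i64]) -> Option<i64> {
    if xs.is_empty() {
        return None;
    }
    let sum: i128 = xs.iter().map(|&x| x as i128).sum();
    Some((sum / xs.len() as i128) as i64)
}

/// Index of the maximum; on ties the first occurrence is kept.
pub fn argmax(xs: &[i64]) -> Option<usize> {
    let mut best: Option<usize> = None;
    for (i, &x) in xs.iter().enumerate() {
        match best {
            Some(b) if xs[b] >= x => {}
            _ => best = Some(i),
        }
    }
    best
}

/// True if at least one entry is non-zero.
pub fn any_nonzero(xs: &[i64]) -> bool {
    xs.iter().any(|&x| x != 0)
}

/// True if every entry is non-zero (vacuously true on empty input).
pub fn all_nonzero(xs: &[i64]) -> bool {
    xs.iter().all(|&x| x != 0)
}
```

For the input `[-3, 0, 1]` the sum is `-2`, so `mean_floor` gives `-1` while `mean_truncated` gives `0`. For non-negative sums the two agree.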

## p-adic

The p-adic matmul / dot / valuation-skip path is the original paper
artifact. The domain lives in `src/domain/contract.rs::Claim::PadicSpec`.

| Op struct | Facade builder | Test file | Description |
|-----------|----------------|-----------|-------------|
| `PAdicMatmulFmaOp` | `tokitai.p_pad_fma()` | `tests/padic.rs` | p-adic matmul with FMA accumulation; valuation-skip is the planner policy |
| `PAdicDotOp` | `tokitai.p_dot()` | `tests/padic_dot.rs` | p-adic dot product with valuation-skip |

The p-adic valuation witnesses (P347) live in
`src/verify/witnesses.rs` and are listed in the support matrix
under the `valuation_witnesses` operator. Exercised by
`tests/padic_witnesses.rs`.
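
As background for the valuation-skip idea, here is a minimal sketch under one stated assumption: results are tracked modulo `p^precision`, so any product whose combined valuation reaches `precision` is congruent to zero and can be skipped without changing the result. This reading of "valuation skip" is an interpretation, not a description of the crate's planner, and every function name below is hypothetical.

```rust
/// p-adic valuation: the largest `e` with `p^e` dividing `x`.
/// Returns `None` for `x == 0`, whose valuation is conventionally +infinity.
pub fn valuation(x: i64, p: i64) -> Option<u32> {
    assert!(p >= 2, "p must be at least 2");
    if x == 0 {
        return None;
    }
    let (mut x, mut e) = (x, 0u32);
    while x % p == 0 {
        x /= p;
        e += 1;
    }
    Some(e)
}

/// Plain dot product reduced modulo `p^precision`.
pub fn dot_mod_plain(a: &[i64], b: &[i64], p: i64, precision: u32) -> i128 {
    let modulus = (p as i128)
        .checked_pow(precision)
        .expect("p^precision overflows i128");
    let mut acc: i128 = 0;
    for (&x, &y) in a.iter().zip(b) {
        acc = (acc + (x as i128) * (y as i128)).rem_euclid(modulus);
    }
    acc
}

/// Same result, but skipping products that vanish modulo `p^precision`.
pub fn dot_mod_with_skip(a: &[i64], b: &[i64], p: i64, precision: u32) -> i128 {
    let modulus = (p as i128)
        .checked_pow(precision)
        .expect("p^precision overflows i128");
    let mut acc: i128 = 0;
    for (&x, &y) in a.iter().zip(b) {
        let skip = match (valuation(x, p), valuation(y, p)) {
            // v_p(x * y) = v_p(x) + v_p(y); at or above `precision` the term is 0.
            (Some(vx), Some(vy)) => vx + vy >= precision,
            // A zero factor makes the product zero.
            _ => true,
        };
        if skip {
            continue;
        }
        acc = (acc + (x as i128) * (y as i128)).rem_euclid(modulus);
    }
    acc
}
```

With `p = 3` and `precision = 2`, the vectors `[3, 2, 9, 0]` and `[6, 5, 4, 7]` leave only the product `2 * 5` to be accumulated, and both functions return `1` (that is, `64 mod 9`).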

## Finite-field (P346)

The finite-field domain `F_{p^k}` lives in
`src/domain/finite_field.rs`. It supports prime fields `F_p` and
extension fields `F_{p^k}` (polynomial-mod-irreducible
multiplication). The domain is exact by construction; no lowering
rule is registered (no `Op` in the public surface).

| Domain type | Construction | Test file | Description |
|-------------|--------------|-----------|-------------|
| `FiniteFieldDomain` | `FiniteFieldDomain::prime(p)` or `FiniteFieldDomain::extension(p, k, modulus)` | `tests/finite_field.rs` | `F_p` or `F_{p^k}` field |
| `FiniteFieldElement::Prime` | `FiniteFieldElement::Prime { residue, prime }` | `tests/finite_field.rs` | element of a prime field |
| `FiniteFieldElement::Extension` | `FiniteFieldElement::Extension { coeffs, prime }` | `tests/finite_field.rs` | element of an extension field |
| `Domain::mul`, `Domain::add`, etc. | (none) | `tests/finite_field.rs` | domain methods |
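
The "polynomial-mod-irreducible multiplication" used for `F_{p^k}` can be sketched on plain coefficient vectors. This is a standalone illustration, not the code in `src/domain/finite_field.rs`: the function `ext_mul`, the low-degree-first coefficient order, and the requirement that the modulus be monic are all assumptions of the example. It also assumes that `p` is small enough for coefficient products to fit in `u64` and that inputs are already reduced modulo `p`.

```rust
/// Multiply two elements of F_p[x] / (m(x)), where `modulus` holds the k + 1
/// coefficients of a monic degree-k polynomial m, lowest degree first.
/// Operands and result each hold exactly k coefficients in 0..p.
pub fn ext_mul(a: &[u64], b: &[u64], modulus: &[u64], p: u64) -> Vec<u64> {
    assert!(modulus.len() >= 2, "modulus must have degree >= 1");
    let k = modulus.len() - 1;
    assert!(modulus[k] == 1, "modulus must be monic");
    assert!(a.len() == k && b.len() == k, "operands need k coefficients");

    // Schoolbook polynomial product; its degree is at most 2k - 2.
    let mut prod = vec![0u64; 2 * k - 1];
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            prod[i + j] = (prod[i + j] + ai * bj) % p;
        }
    }

    // Reduce from the top degree down using x^k = -(m_0 + ... + m_{k-1} x^{k-1}).
    for d in (k..prod.len()).rev() {
        let c = prod[d];
        if c == 0 {
            continue;
        }
        prod[d] = 0;
        for (j, &mj) in modulus[..k].iter().enumerate() {
            let idx = d - k + j;
            // Subtract c * m_j modulo p without leaving unsigned arithmetic.
            let sub = (p - (c * mj) % p) % p;
            prod[idx] = (prod[idx] + sub) % p;
        }
    }
    prod.truncate(k);
    prod
}
```

In `F_4 = F_2[x] / (x^2 + x + 1)`, written here with modulus `[1, 1, 1]`, the element `x` is `[0, 1]`. Squaring it gives `[1, 1]` (that is, `x + 1`), and `x * (x + 1)` gives `[1, 0]`, the multiplicative identity.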

## Sheaf

The sheaf cover-glue verifier and the cover-glue inference (P348)
live in `src/object/sheaf.rs` and
`src/verify/cover_glue_inference.rs`. There is no `Op` in the
public surface for sheaf; the public API is the data types
(`Cover`, `OpenId`, `SectionTable`) and the verifier
(`cover_glue_inference_report`).

| Type / function | Path | Test file | Description |
|-----------------|------|-----------|-------------|
| `Cover` | `src/object/sheaf.rs` | `tests/sheaf.rs` | a finite cover of opens |
| `OpenId` | `src/object/sheaf.rs` | `tests/sheaf.rs` | identifier for an open |
| `SectionTable<T>` | `src/object/sheaf.rs` | `tests/sheaf.rs` | map from `OpenId` to section |
| `FiniteSite` | `src/object/sheaf.rs` | `tests/sheaf.rs` | site with inclusions and intersections |
| `cover_glue_inference_report` | `src/verify/cover_glue_inference.rs` | `tests/cover_glue_inference.rs` | reports which inference attempts passed and which failed |
| `InferenceAttempt` | `src/verify/cover_glue_inference.rs` | `tests/cover_glue_inference.rs` | outcome of one inference attempt: `InferredMissingSection`, `ResolvedOverlap`, or `FailedRecovery` |
| `plan_cover_glue_check` | `src/planner/mod.rs` | `tests/sheaf.rs` | planner entry point for the verifier |
| `verify_sheaf_glue` | `src/facade.rs` | `tests/sheaf.rs` | public facade entry point |
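
The gluing condition that a cover-glue check enforces can be sketched in a few lines: local sections glue into one global section exactly when they agree wherever their opens overlap. The model below is a simplification and an assumption, not the crate's `Cover` / `SectionTable` API. An open is represented implicitly by the key set of its section, and `Section`, `Conflict`, and `glue` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A local section: a value at each point of its open set (the key set).
pub type Section = BTreeMap<u32, i64>;

/// Two local sections disagreeing at a shared point.
#[derive(Debug, PartialEq)]
pub struct Conflict {
    pub point: u32,
    pub first: i64,
    pub second: i64,
}

/// Glue local sections into a global one, or report the first disagreement
/// found on an overlap.
pub fn glue(sections: &[Section]) -> Result<Section, Conflict> {
    let mut global = Section::new();
    for section in sections {
        for (&point, &value) in section {
            // Copy out any previously seen value before mutating `global`.
            let seen = global.get(&point).copied();
            match seen {
                Some(first) if first != value => {
                    return Err(Conflict { point, first, second: value });
                }
                _ => {
                    global.insert(point, value);
                }
            }
        }
    }
    Ok(global)
}
```

Two sections over `{1, 2}` and `{2, 3}` that both assign `20` to point `2` glue into a section over `{1, 2, 3}`. If the second assigned `99` instead, `glue` would return a `Conflict` at point `2`.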

## See also

- `src/op/registry.rs` — the registry that ties operators, lowering
  rules, and contract sets together. Every op must be listed here.
- `src/verify/support_matrix.rs` — the support matrix that
  cross-references every op with its test file, public API path,
  and contract. P355 added the post-P353 surface rows.
- `docs/theory_support_matrix.md` — the auto-generated support
  matrix, regenerated as part of the release-gate check.
- `tests/support_matrix_coverage.rs` (P359) — fail-closed walk
  that verifies every support-matrix row has a real test file.
- `benches/ops.rs` and `benches/finite_field.rs` (P358) — criterion
  benchmarks for the operator surface.