tokitai-operator 0.1.0

Verified DL kernel compiler: formally-checked GEMM, p-adic, sheaf, contract-carrying ops. Paper-artifact grade.
Documentation
# Theory-Aware Support Matrix

P205 adds a generated support matrix for the current theory-engineered runtime
boundary.

The public generator is `generated_theory_support_matrix`. It builds rows from:

- CPU lowering metadata in `OperatorRegistry::cpu_scalar_builtins`
- backend capability metadata for `cpu_scalar` and `gpu_scaffold`
- theorem descriptors in `TheoremBindingRegistry::tokitai_default`
- facade exposure boundaries for p-adic sum-products and finite-site sheaf glue
- explicit GPU scaffold and feature-gated dense-i64 pilot limitations
- a local ROCm/HIP hardware contract row for device evidence before HIP kernel
  admission
- a feature-gated ROCm/HIP dense i32 add pilot row with kernel/source/compiler
  evidence and CPU-oracle equality

Every row records:

- domain and operator
- backend and backend capability summary
- lowering rule when one exists
- support status: `supported`, `fallback_only`, `unsupported`,
  `feature_gated_pilot`, `local_hardware_evidence`, or `unavailable`
- theory-law requirements and theorem package bindings
- fallback mode
- public API exposure
- tests that guard the row
- non-claims

`cargo run --quiet --offline --example paper_benchmarks` writes:

- `target/paper-results/theory-support-matrix.json`
- `target/paper-results/theory-support-matrix.md`

The matrix intentionally marks unsupported and fallback-only combinations
instead of relying on prose. GPU scaffold rows are fallback evidence and must not
be read as p-adic, finite-site sheaf, or general GPU acceleration. The
feature-gated pilot row is limited to dense integer add with CPU-oracle
guardrails.

The ROCm/HIP hardware row records local device/toolchain evidence only. It is
not a HIP kernel execution claim, portable AMD GPU support, nonstandard-domain
acceleration, or production performance evidence.

The ROCm/HIP dense i32 pilot row is narrower: it records one real HIP add
kernel behind `--features rocm-hip`, guarded by CPU-oracle output comparison.
It is still not broad GPU support or nonstandard-domain acceleration.
The matching unavailable row records the fail-closed path for missing ROCm,
wrong gfx target, stale/missing kernel or compiler fingerprints, and missing
transfer obligations.

The ROCm/HIP p-adic valuation row is a fixed-precision helper only. It binds to
the valuation-skip theorem boundary for the cutoff fact and remains guarded by
`PadicDomain::valuation_of` as the CPU oracle.

The ROCm/HIP finite-site sheaf locality row covers only overlap equality for
integer section values. `SectionTable::compatibility_report` remains the oracle
for obstruction provenance, and boundary cases fall back without executing HIP.

## P335-P355 Surface Extension

P355 extends the matrix with the P335-P348 operator surface. Every
new op appears in `target/paper-results/theory-support-matrix.md`
with its lowering rule, support status, and the dedicated test file
that guards the row:

- **P335 arithmetic** (8 ops: SubOp, DivOp, ScalarAddOp, ScalarMulOp,
  PowOp, SqrtOp, Exp2Op, Log2Op) — guarded by `tests/arithmetic_ops.rs`
- **P336 shape** (9 ops: ReshapeOp, TransposeOp, PermuteOp, SliceOp,
  ConcatOp, BroadcastOp, FlattenOp, SqueezeOp, UnsqueezeOp) — guarded
  by `tests/shape_ops.rs`
- **P337 NN/activation** (6 ops: ReluOp, SigmoidOp, TanhOp, GELUOp,
  SoftmaxOp, LayerNormOp) — guarded by `tests/nn_ops.rs`
- **P338 index** (5 ops: GatherOp, ScatterOp, IndexSelectOp,
  IndexAddOp, NonzeroOp) — guarded by `tests/index_ops.rs`
- **P339 reduction** (9 ops: SumOp, MeanOp, MaxOp, MinOp, ArgMaxOp,
  ArgMinOp, ProdOp, AnyOp, AllOp) — guarded by `tests/reductions.rs`
- **P346 finite-field domain** (FiniteFieldDomain / F_{p^k}) —
  guarded by `tests/finite_field.rs` (12 tests + 2 doctests)
- **P347 valuation witnesses** (ValuationAdditivityWitness,
  UltrametricWitness, PrecisionPreservationWitness) — guarded by
  `tests/padic_witnesses.rs` (9 tests)
- **P348 cover-glue inference** (CoverGlueInferenceReport,
  InferenceAttempt) — guarded by `tests/cover_glue_inference.rs`
  (7 tests)

These rows are CPU-reference evidence only. They are not production
speedup claims, portable GPU support, full formal proofs of the
ultrametric inequality, or general sheaf cohomology. Each row
records a non-claim that bounds the claim language.