TensorAxis declaration + parametric square-matmul impl + shape.
Per Wiki ADR-031 the tensor sub-crate exposes
TensorAxis as the canonical Layer-3 surface for tensor compute.
The reference impl CpuI8MatmulSquare is generic over the square
dimension DIM, with i8 inputs and saturating-i16 outputs —
the integer-arithmetic determinism contract ADR-030 names as the
axis substitution-determinism baseline.
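The i8-in, saturating-i16-out contract can be illustrated with a minimal sketch. The free function `matmul_i8_sat_i16` below is hypothetical and is not the crate's `CpuI8MatmulSquare` API. It also assumes the dot product is accumulated in `i32` and clamped once at the end; a kernel that saturates after every 16-bit add could produce different values when a running sum overshoots and then returns into range.

```rust
/// Hypothetical illustration only: square DIM x DIM matmul with i8 inputs
/// and i16 outputs, saturating on overflow.
/// Assumption: accumulate in i32, then clamp once into the i16 range.
fn matmul_i8_sat_i16<const DIM: usize>(
    a: &[[i8; DIM]; DIM],
    b: &[[i8; DIM]; DIM],
) -> [[i16; DIM]; DIM] {
    let mut out = [[0i16; DIM]; DIM];
    for r in 0..DIM {
        for c in 0..DIM {
            // Widen before multiplying so no intermediate value can overflow.
            let mut acc: i32 = 0;
            for k in 0..DIM {
                acc += i32::from(a[r][k]) * i32::from(b[k][c]);
            }
            // Saturate into the i16 range; the cast is lossless after the clamp.
            out[r][c] = acc.clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16;
        }
    }
    out
}
```

Under these assumptions, multiplying two 4×4 matrices filled with 127 gives a raw dot product of 4 × 16129 = 64516 per element, which saturates to `i16::MAX`. Because only integer arithmetic is involved, the result does not depend on evaluation order.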
Variable-rank tensor compute composes through verbs over
partition_product!-declared shapes per ADR-033/044; the axis’s
role is the fixed-shape atomic primitive.
§ADR-055 substrate-Term verb body discipline
Per Wiki ADR-055
every AxisExtension impl satisfies the substrate-Term verb body
discipline; the hand-written kernel below uses the default empty
body_arena() emitted by foundation-sdk 0.4.11’s axis!
companion macro (the primitive-fast-path-equivalent realization).
An explicit substrate-Term decomposition of
CpuI8MatmulSquare<DIM>::matmul is syntactically expressible in
foundation-sdk 0.4.11’s verb-body grammar. The decomposition is
fold_n(DIM, ...) over rows × fold_n(DIM, ...) over columns ×
fold_n(DIM, ...) over reductions, with a sign_extend sub-verb
(matching Ge(operand, Literal(0x80, W8)) to select between
Concat(0x00, operand) and Concat(0xff, operand)), W16 Mul + W16 Add
accumulation, and saturation via Match over
Ge(acc, Literal(0x7fff, W16)) / Lt(acc, Literal(0x8000, W16)) per
ADR-054 § Substrate-Term realization examples. ADR-056 admits
le/lt/ge/gt and concat in verb/axis bodies (only the
route body’s syntactic surface retains the ψ-residuals rejection);
foundation-sdk 0.4.11’s depth-2 const-generic-leaf partition-product
projection covers the fold-n composition over matrix shapes. The
remaining work is operational composition: the architectural
witness verbs in crate::verbs (saturating-xor + concat-bytes)
demonstrate the per-element primitives; the unfolded
fold-over-rows-and-columns matmul body is a published-roster
follow-on.
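The sign_extend sub-verb described above can be sketched at the byte level in plain Rust. This is an illustration, not the substrate-Term encoding itself: the function name is hypothetical, and it assumes Concat(hi, lo) places its first argument in the high byte of the W16 result.

```rust
/// Hypothetical byte-level model of the sign_extend sub-verb.
/// Assumption: Concat(hi, lo) puts `hi` in the most significant byte.
fn sign_extend_w8_to_w16(operand: u8) -> u16 {
    // Ge(operand, Literal(0x80, W8)) is true exactly when the sign bit is set.
    let high: u8 = if operand >= 0x80 { 0xff } else { 0x00 };
    // Concat(high, operand): high byte first, operand as the low byte.
    u16::from_be_bytes([high, operand])
}
```

Under that assumption the result matches Rust's own widening cast, `operand as i8 as i16 as u16`, for every input byte.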
The hand-written for-loop kernel below is the operational form;
byte-output equivalence with BLAS reference outputs at integer
precision is checked in tests/conformance.rs.
Structs§
- CpuI8MatmulSquare - Parametric square DIM × DIM i8×i8→i16 matmul.
- MatrixShape - Parametric ConstrainedTypeShape for a row-major ROWS × COLS matrix of ELEM_BYTES-byte elements. Per ADR-031’s Tensor<Element, Shape> shape commitment, restricted to matrix rank-2 here; higher ranks compose through partition_product! per ADR-033/044.
- VectorShape - Parametric ConstrainedTypeShape for a length-N vector of ELEM_BYTES-byte elements. Per ADR-031’s Tensor<Element, Shape> for rank-1.
Constants§
- KERNEL_MATMUL
- MAX_TENSOR_DIM - Maximum square dimension any CpuI8MatmulSquare instantiation supports. Cap at 16: output buffer = 2 * 16 * 16 = 512 bytes, inputs = 256 bytes each, total kernel byte budget bounded.
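The byte-budget arithmetic behind the cap can be checked at compile time. The sketch below is illustrative: `matrix_bytes`, `INPUT_BYTES`, `OUTPUT_BYTES` and `TOTAL_BYTES` are hypothetical names, and the constant is redeclared locally with the value 16 taken from the description above rather than imported from the crate.

```rust
/// Assumed value, copied from the MAX_TENSOR_DIM description (cap at 16).
const MAX_TENSOR_DIM: usize = 16;

/// Hypothetical helper: byte length of a row-major ROWS x COLS matrix
/// whose elements are `elem_bytes` bytes wide.
const fn matrix_bytes(rows: usize, cols: usize, elem_bytes: usize) -> usize {
    rows * cols * elem_bytes
}

// One i8 input matrix at the cap: 16 * 16 * 1 bytes.
const INPUT_BYTES: usize = matrix_bytes(MAX_TENSOR_DIM, MAX_TENSOR_DIM, 1);
// The i16 output matrix at the cap: 16 * 16 * 2 bytes.
const OUTPUT_BYTES: usize = matrix_bytes(MAX_TENSOR_DIM, MAX_TENSOR_DIM, 2);
// Two inputs plus one output.
const TOTAL_BYTES: usize = 2 * INPUT_BYTES + OUTPUT_BYTES;

// Compile-time checks of the figures quoted in the description.
const _: () = assert!(INPUT_BYTES == 256);
const _: () = assert!(OUTPUT_BYTES == 512);
const _: () = assert!(TOTAL_BYTES == 1024);
```

With two 256-byte inputs and one 512-byte output, the largest instantiation touches 1024 bytes of matrix data in total.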
Traits§
- TensorAxis - Wiki ADR-031 tensor-compute axis.
Type Aliases§
- CpuI8Tensor4x4Matmul - 4×4 i8 matmul, the canonical small-tensor reference.
- CpuI8Tensor8x8Matmul - 8×8 i8 matmul.
- CpuI8Tensor16x16Matmul - 16×16 i8 matmul (the MAX_TENSOR_DIM ceiling).