tokitai-operator 0.1.0

Verified DL kernel compiler: formally-checked GEMM, p-adic, sheaf, contract-carrying ops. Paper-artifact grade.
# Feature-Gated Accelerated Pilot

P187 adds the first non-CPU execution pilot behind the `accelerated-pilot` feature.

## Scope

The pilot backend is `GpuDenseI64PilotBackend`. It advertises a GPU-like hardware target and supports exactly one narrow operation, bounded by the following constraints:

- dense integer domain
- dense CPU representation as the host-side tensor boundary
- single-step pointwise `add`
- `i64` execution through `execute_i64_add_with_cpu_oracle`

The implementation intentionally avoids CUDA, wgpu, HIP, streams, device allocation, and external runtime dependencies. It is a feature-gated pilot kernel boundary, not a performance claim.
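As a rough illustration of what feature gating can look like in Rust, the sketch below compiles a pilot module only when `accelerated-pilot` is enabled. The module `pilot`, the constant `BACKEND_NAME`, and the helper `available_backends` are hypothetical names introduced here, and the `[features]` entry in the comment is an assumption about the manifest, not a copy of it.

```rust
// Assumed manifest entry (not taken from the crate's Cargo.toml):
//
//   [features]
//   accelerated-pilot = []

/// Hypothetical module that exists only when the feature is enabled.
#[cfg(feature = "accelerated-pilot")]
pub mod pilot {
    pub const BACKEND_NAME: &str = "GpuDenseI64PilotBackend";
}

/// Hypothetical helper listing the backends compiled into this build.
/// The CPU backend is always present; the pilot is added only under the feature.
pub fn available_backends() -> Vec<&'static str> {
    #[allow(unused_mut)] // `names` is mutated only when the feature is on
    let mut names = vec!["CpuScalarBackend"];
    #[cfg(feature = "accelerated-pilot")]
    names.push(pilot::BACKEND_NAME);
    names
}
```

With this shape, a default build never references the pilot module at all, which is one way to keep the default build free of pilot code.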

## Semantic Guardrail

Every pilot execution runs a CPU oracle comparison:

1. Clone the input `TensorStore`.
2. Execute the same graph with `CpuScalarBackend`.
3. Execute the pilot dense i64 add path.
4. Compare every checked output exactly against the CPU oracle.
5. Return `GpuDenseI64PilotReport` with checked outputs, match status, and evidence.

The report's `preliminary_runtime_ns` field is optional and is currently left unset. Benchmark and performance evidence are deferred to later phases.
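The five steps above can be sketched over plain slices. This is a minimal illustration, not the crate's implementation: `PilotReport`, `cpu_oracle_add`, `pilot_add`, and `add_with_cpu_oracle` are hypothetical stand-ins for `GpuDenseI64PilotReport`, `CpuScalarBackend`, and `execute_i64_add_with_cpu_oracle`, and wrapping overflow is an assumption because the source does not state the overflow semantics.

```rust
/// Hypothetical stand-in for the report type. Only `preliminary_runtime_ns`
/// is named in the document; the other field names are assumptions.
#[derive(Debug, PartialEq)]
pub struct PilotReport {
    pub checked_outputs: usize,
    pub matches_oracle: bool,
    pub preliminary_runtime_ns: Option<u64>,
}

/// Reference path standing in for the CPU oracle (wrapping add is assumed).
fn cpu_oracle_add(a: &[i64], b: &[i64]) -> Vec<i64> {
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Pilot path: written as a separate loop so the comparison is not trivial.
fn pilot_add(a: &[i64], b: &[i64]) -> Vec<i64> {
    let mut out = vec![0i64; a.len()];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = a[i].wrapping_add(b[i]);
    }
    out
}

/// Runs both paths and compares every output element exactly.
pub fn add_with_cpu_oracle(a: &[i64], b: &[i64]) -> Result<(Vec<i64>, PilotReport), String> {
    if a.len() != b.len() {
        return Err(format!("shape mismatch: {} vs {}", a.len(), b.len()));
    }
    // Step 1: clone the inputs so the oracle sees untouched data.
    let (oracle_a, oracle_b) = (a.to_vec(), b.to_vec());
    // Step 2: execute the reference path.
    let expected = cpu_oracle_add(&oracle_a, &oracle_b);
    // Step 3: execute the pilot path.
    let actual = pilot_add(a, b);
    // Step 4: exact element-wise comparison.
    let matches_oracle = expected == actual;
    // Step 5: build the report; runtime stays unset, as in the document.
    let report = PilotReport {
        checked_outputs: expected.len(),
        matches_oracle,
        preliminary_runtime_ns: None,
    };
    Ok((actual, report))
}
```

Calling `add_with_cpu_oracle(&[1, 2, 3], &[10, 20, 30])` in this sketch yields the summed vector together with a report whose `matches_oracle` is `true` and whose `preliminary_runtime_ns` is `None`.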

## Explicit Non-Support

The pilot rejects:

- p-adic domains
- finite-site sheaf / cover-local paths
- reductions
- matmul
- fused steps
- any non-integer domain

Unsupported paths return explicit backend errors instead of silent fallback or accidental execution.
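One way to express "explicit error instead of silent fallback" is a support check that runs before any execution. The sketch below assumes hypothetical `Domain`, `Op`, and `PilotError` enums and a `check_supported` function; none of these names come from the crate, and `Float` is only an example of a non-integer domain.

```rust
use std::fmt;

/// Hypothetical domain tags; `Float` is an illustrative non-integer domain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Domain { Integer, PAdic, Sheaf, Float }

/// Hypothetical operation kinds mirroring the rejection list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op { Add, Reduce, Matmul, Fused }

/// Hypothetical backend error carrying the reason for rejection.
#[derive(Debug, PartialEq, Eq)]
pub enum PilotError {
    UnsupportedDomain(Domain),
    UnsupportedOp(Op),
}

impl fmt::Display for PilotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PilotError::UnsupportedDomain(d) => write!(f, "pilot does not support domain {:?}", d),
            PilotError::UnsupportedOp(op) => write!(f, "pilot does not support op {:?}", op),
        }
    }
}

/// Accepts only (Integer, Add); everything else is an explicit error.
pub fn check_supported(domain: Domain, op: Op) -> Result<(), PilotError> {
    if domain != Domain::Integer {
        return Err(PilotError::UnsupportedDomain(domain));
    }
    match op {
        Op::Add => Ok(()),
        other => Err(PilotError::UnsupportedOp(other)),
    }
}
```

Because the check returns a `Result`, a caller cannot reach the execution path for an unsupported graph without first handling or propagating the error.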

## Validation

The default build remains dependency-light and offline-testable. The pilot is validated separately with:

```bash
cargo test --offline --features accelerated-pilot --test accelerated_pilot
```

Default CI-style validation still uses:

```bash
cargo test --offline
```