# The Tensor Engine: Autonomic Cognitive Routing and Token Thermodynamics
**Version:** 2.0 (SOTA 2026 Standard)
**Target:** `coreason-runtime` Tensor Matrix (`src/coreason_runtime/tensor/`)
## Abstract: The Computational Economics of Inference
In distributed multi-agent systems, the cognitive complexity of a given task ($K_{req}$) is highly variable. Running a regex-equivalent data extraction on a 1-trillion-parameter model wastes compute and money. Conversely, running complex ontological deductions on an 8-billion-parameter local model tends to produce incorrect or invalid output, which can corrupt downstream state.
The `coreason-runtime` solves this optimization problem via the `TensorRouter`. The router acts as an asynchronous multiplexer that dynamically shifts workloads across a heterogeneous compute matrix, minimizing the cost function ($C_{exec}$) while guaranteeing that the deployed model's capability ($K_{model}$) satisfies $K_{model} \ge K_{req}$.
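The constrained minimization above can be sketched as follows. This is an illustrative sketch only: `ModelTier`, `select_tier`, and the numeric capability and cost values are hypothetical and are not taken from the `coreason-runtime` source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: float  # K_model on an arbitrary illustrative scale
    cost: float        # C_exec per request, arbitrary units


def select_tier(tiers, k_req):
    """Return the cheapest tier that satisfies K_model >= K_req."""
    feasible = [t for t in tiers if t.capability >= k_req]
    if not feasible:
        raise ValueError(f"no tier satisfies K_req={k_req}")
    return min(feasible, key=lambda t: t.cost)


# Hypothetical numbers, chosen only to make the example concrete.
TIERS = [
    ModelTier("tier0-local", capability=3.0, cost=0.0),
    ModelTier("tier2-cloud", capability=9.0, cost=1.0),
]

easy = select_tier(TIERS, k_req=2.0)  # a simple extraction task
hard = select_tier(TIERS, k_req=5.0)  # a deduction task beyond the local model
```

In this sketch the easy task stays on the free local tier, and the hard task is routed to the cloud tier because it is the only one that meets the capability constraint.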
---
## 1. The Discrete Cognitive Spectrum (Hardware Tiers)
The `TensorRouter` stratifies network sockets into discrete tiers based on physical hardware access, inference latency, and monetary cost.
### 1.1 Tier 0: Kinetic Bare-Metal (FSM Constrained Decoding)
* **Target:** Self-hosted open-weights models (e.g., Llama-3-8B) via `SGLangKineticClient`.
* **Physics:** Zero marginal monetary cost per token. Sub-50ms inference latency utilizing RadixAttention (prefix caching).
* **Absolute Determinism:** Tier 0 guarantees structural output via **Finite State Machine (FSM) Logit Masking**.
    * The target Pydantic schema is compiled into a deterministic regular expression.
    * During the LLM's forward pass, the inference engine computes the logits (unnormalized scores) for the next token.
    * The FSM operates directly at the CUDA/Triton level: if a proposed token $x$ would violate the regular expression, the engine masks its logit so that $P(x) = 0$ prior to sampling.
    * **Result:** Every sampled token keeps the output a valid prefix of the schema's grammar, so a generation that runs to completion always parses.
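The masking step can be illustrated with a toy example in pure Python. The sketch below assumes a hypothetical two-state FSM and a four-token vocabulary. A real engine derives the FSM from the compiled regular expression and applies the mask on the GPU, so this shows only the arithmetic.

```python
import math

# Hypothetical toy FSM for a schema whose only valid outputs are "true" or "false".
# Maps state -> {allowed token: next state}.
TRANSITIONS = {
    "start": {"true": "done", "false": "done"},
    "done": {"<eos>": "end"},
}


def mask_logits(logits, state):
    """Set the logit of every token the FSM disallows in `state` to -inf."""
    allowed = TRANSITIONS[state]
    return {tok: (val if tok in allowed else -math.inf) for tok, val in logits.items()}


def softmax(logits):
    """Convert logits to probabilities; -inf logits map to exactly 0.0."""
    peak = max(logits.values())
    if peak == -math.inf:
        raise ValueError("FSM state allows no token in this vocabulary")
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# "maybe" has the highest raw score but is not a legal token in the start state.
raw_logits = {"true": 1.0, "false": 0.5, "maybe": 5.0, "<eos>": 0.0}
probs = softmax(mask_logits(raw_logits, "start"))
```

After masking, `probs["maybe"]` is exactly `0.0` even though its raw logit was the largest, and the remaining probability mass is renormalized over `"true"` and `"false"`.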
### 1.2 Tier 2: Frontier Cloud Oracles
* **Target:** Hosted, high-parameter endpoints (e.g., DeepSeek-R1, Google Gemini 1.5 Pro) via `CloudOracleClient`.
* **Physics:** High network latency (HTTP overhead), high monetary cost per token ($C \gg 0$), but theoretically unbounded chain-of-thought depth.
* **Soft Determinism:** Because the runtime cannot access the physical GPU logits of a Cloud API, it relies on "Structured Outputs" (JSON Schema), which are statistically highly accurate but susceptible to latent context drift or network truncation.
---
## 2. Schema Homogenization (The Universal Compiler)
To maintain interface parity across Tier 0 local hardware and Tier 2 remote APIs, the Tensor layer utilizes the `UniversalCompiler` to intercept all requests and enforce structural determinism.
### 2.1 Pre-Flight Translation
The compiler translates Pydantic model definitions into the structural dialect required by the targeted endpoint (e.g., raw regex strings for `SGLang`, strict JSON Schema for OpenAI/DeepSeek).
### 2.2 Active Error Injection (The Self-Correction Loop)
When a Tier 2 Cloud API returns an invalid JSON string, a naive pipeline fails at the parsing step. The `UniversalCompiler` instead runs a bounded retry loop (built on `tenacity`):
1. The compiler executes a strict `model_validate` on the HTTP response.
2. Upon raising a `ValidationError`, it appends the exact Python traceback to the current prompt context: $Prompt_{k+1} = Prompt_k + Traceback_k$.
3. It re-invokes the remote LLM, forcing the model to evaluate its own parsing failure.
4. **Boundary Limit:** This control loop is strictly bounded to $k_{max} = 3$ iterations. If the model fails on the final attempt, the compiler raises a fatal error rather than entering an infinite loop.
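The four steps above can be sketched as a plain bounded loop. This sketch makes several assumptions: it uses only the standard library, so a simple key check stands in for Pydantic's `model_validate`, a `for` loop stands in for `tenacity`, and `call_model`, `SchemaViolation`, and `generate_with_correction` are hypothetical names.

```python
import json

K_MAX = 3  # upper bound on attempts, matching k_max in the text


class SchemaViolation(Exception):
    """Hypothetical stand-in for pydantic.ValidationError."""


def validate(raw, required_keys):
    """Parse `raw` as a JSON object and check that the required keys exist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaViolation("top-level value is not an object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise SchemaViolation(f"missing keys: {missing}")
    return data


def generate_with_correction(call_model, prompt, required_keys, k_max=K_MAX):
    """Call the model up to k_max times, feeding each failure back into the prompt."""
    last_error = None
    for _ in range(k_max):
        raw = call_model(prompt)
        try:
            return validate(raw, required_keys)
        except SchemaViolation as exc:
            last_error = exc
            # Prompt_{k+1} = Prompt_k + error report from attempt k
            prompt = f"{prompt}\n\nPrevious output failed validation: {exc}"
    raise RuntimeError(f"validation failed after {k_max} attempts") from last_error


# Usage with a fake model that fails twice, then returns a valid object.
replies = iter(["not json", '{"name": "x"}', '{"name": "x", "age": 3}'])
seen_prompts = []


def fake_model(prompt):
    seen_prompts.append(prompt)
    return next(replies)


result = generate_with_correction(fake_model, "Extract name and age.", ["name", "age"])
```

Here the third attempt succeeds, and the second and third prompts carry the error text from the attempts before them. A model that never produces valid output would raise `RuntimeError` after the third call rather than looping.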
---
## 3. Fault Tolerance: The Autonomic Escalation Cascade
Relying on a single node or a single cloud provider in a distributed system creates a single point of failure for every workflow. The `TensorRouter` implements a cascading fallback matrix to protect the Temporal orchestrator from execution traps.
**The State Transition Sequence:**
1. **Initial Dispatch:** The orchestrator dispatches the execution intent to the Tier 0 (Bare-Metal) client to minimize thermodynamic cost.
2. **Kinetic Yield:** If the local 8B model cannot resolve the constraints (e.g., it times out, or the FSM physically traps on an impossible parsing constraint), the router intercepts the local exception.
3. **The Escalation:** The router autonomously repackages the HTTP request and transmits it to the Tier 2 Cloud Oracle, trading marginal monetary cost for a massive increase in reasoning depth.
4. **The Epistemic Yield:** Only if the 1-Trillion parameter Tier 2 model *also* exhausts its $k_{max} = 3$ retry loop does the router raise an `EpistemicYieldError`.
5. **Orchestrator Suspension:** This exception is caught by the Temporal worker, safely pausing the thread and signaling the human operator (The Oracle Circuit) for manual intervention.
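The transition sequence can be sketched as nested exception handling. Only `EpistemicYieldError` appears in the description above. `KineticYieldError`, `OracleExhaustedError`, `route`, and the callable tier clients are hypothetical names used for illustration, and the Temporal suspension step is out of scope here.

```python
class KineticYieldError(Exception):
    """Hypothetical: Tier 0 could not satisfy the constraints."""


class OracleExhaustedError(Exception):
    """Hypothetical: Tier 2 used up its k_max retry loop."""


class EpistemicYieldError(Exception):
    """Raised when both tiers have failed; the orchestrator should suspend."""


def route(intent, tier0, tier2):
    """Try Tier 0 first; escalate to Tier 2 only when the local tier yields."""
    try:
        return tier0(intent)
    except (KineticYieldError, TimeoutError):
        pass  # Kinetic Yield: fall through to escalation
    try:
        return tier2(intent)
    except OracleExhaustedError as exc:
        raise EpistemicYieldError(f"all tiers failed for intent: {intent!r}") from exc


# Usage with stub clients standing in for the real network calls.
def local_times_out(intent):
    raise TimeoutError("local model timed out")


def cloud_ok(intent):
    return {"answer": intent.upper()}


def cloud_exhausted(intent):
    raise OracleExhaustedError("3 attempts failed")


escalated = route("classify", local_times_out, cloud_ok)
```

In this sketch the handlers catch only the named failure types, so an unrelated bug in a client surfaces as itself instead of being mislabeled as an epistemic yield.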
---
## 4. Compute Budget Caging (Token Economics)
Because the Temporal orchestrator executes autonomous retry loops, faulty agent logic could query a Tier 2 API thousands of times, producing a self-inflicted denial-of-wallet condition.
To prevent infinite-loop bankruptcy, the `TensorRouter` maintains an atomic state counter for every active `workflow_id`.
* **The Constraint:** The system architect defines $\Omega_{workflow}$ (the absolute maximum token ceiling for a given workflow).
* **Cost Calculation:** The router tracks $T_{in}$ (Prompt Tokens) and $T_{out}$ (Completion Tokens) per network request. Because autoregressive generation requires a full forward pass per token, completion tokens are weighted thermodynamically heavier:
$$C_i = T_{in} + (T_{out} \times 3)$$
* **The Cage:** Before returning the generated result to the Temporal worker, the router calculates the new cumulative total. If the execution breaches the ceiling ($\sum C_i > \Omega_{workflow}$), the router instantly severs the TLS socket and raises a `BudgetExceededError`. This bypasses standard retry logic and forces an immediate, safe suspension of the workflow.
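A minimal sketch of the counter and ceiling check follows. It assumes an in-process dictionary guarded by a lock. `BudgetCage` and `charge` are hypothetical names, and the socket teardown is omitted; only the formula, the threshold test, and `BudgetExceededError` come from the description above.

```python
import threading
from collections import defaultdict


class BudgetExceededError(Exception):
    """Raised when a workflow's cumulative cost exceeds its ceiling."""


class BudgetCage:
    def __init__(self, ceiling, completion_weight=3):
        self._ceiling = ceiling            # Omega_workflow
        self._weight = completion_weight   # multiplier applied to T_out
        self._spent = defaultdict(int)     # workflow_id -> cumulative cost
        self._lock = threading.Lock()

    def charge(self, workflow_id, t_in, t_out):
        """Record one request and return the new cumulative total."""
        cost = t_in + t_out * self._weight  # C_i = T_in + (T_out * 3)
        with self._lock:  # the read-modify-write must be atomic
            self._spent[workflow_id] += cost
            total = self._spent[workflow_id]
        if total > self._ceiling:
            raise BudgetExceededError(
                f"{workflow_id}: {total} exceeds ceiling {self._ceiling}"
            )
        return total


cage = BudgetCage(ceiling=1000)
first = cage.charge("wf-1", t_in=100, t_out=100)   # 100 + 100 * 3 = 400
second = cage.charge("wf-1", t_in=300, t_out=100)  # 400 + 600 = 1000, still allowed
```

With a ceiling of 1000, the two calls bring `wf-1` to exactly 1000, which passes because the check is a strict inequality. Any further charge against `wf-1` raises `BudgetExceededError`, while other workflow IDs keep independent totals.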