# Origin and the Total Function Problem
Most AI inference pipelines are built on partial functions pretending to be total.
A model returns a confidence score. The downstream code doesn't check it. A sensor returns out-of-range data. The planner keeps planning. A retrieval system returns a passage with no grounding. The generator cites it as fact.
These are all the same failure: a computation crossed outside the domain where its outputs have meaning, and nothing in the type system noticed.
---
## The problem
A **total function** returns a valid output for every possible input. No crashes. No undefined behavior. No silent propagation of garbage.
A **partial function** only works for some inputs. `divide(a, b)` is partial — it's undefined at `b = 0`. `model.predict(x)` is partial — it produces symbols with no grounding when `x` is outside the training distribution.
Most code treats partial functions as total. The undefined cases are handled by convention — a `None` here, a `try/except` there, a `NaN` that propagates silently until something breaks three layers later. Each mechanism is different. None of them carry the computation's last known state through the failure.
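The ad-hoc conventions above can be made concrete in a few lines. This is an illustration of the problem, not Origin code: one division returns `None` and drops all context, the other leans on IEEE 754 and lets garbage propagate silently.

```rust
// Two conventional ways of handling the same partial function.
// Neither carries the computation's last known state through the failure.

fn divide_option(a: f64, b: f64) -> Option<f64> {
    // Convention 1: `None` signals failure but carries no context at all.
    if b == 0.0 { None } else { Some(a / b) }
}

fn divide_raw(a: f64, b: f64) -> f64 {
    // Convention 2: IEEE 754. 0.0 / 0.0 is a quiet NaN that propagates
    // silently through later arithmetic.
    a / b
}

fn main() {
    assert_eq!(divide_option(1.0, 0.0), None); // failure, no last state
    assert!(divide_raw(0.0, 0.0).is_nan());    // garbage, no failure signal
    // Three layers later, still quietly propagating:
    assert!((divide_raw(0.0, 0.0) * 2.0 + 1.0).is_nan());
}
```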
For systems that must operate reliably in open environments — medical diagnosis, autonomous vehicles, power grid control, legal compliance — this is not a style problem. It is a safety problem. The dangerous failures happen exactly at the boundary between "the model knows" and "the model is guessing."
---
## What Origin does
Origin makes every partial function total by giving the boundary a type.
```rust
fn infer(input: &str) -> Value<Diagnosis, InferBoundary>
```
This function returns a valid, typed output for every possible input:
- **`Origin`** — pure absorption. The computation produced no value at all. Nothing to carry forward.
- **`Contents(diagnosis)`** — the model is confident. The value is in safe territory.
- **`Boundary { reason: LowConfidence { confidence: 0.61, threshold: 0.85 }, last: diagnosis }`** — the model crossed the edge of its domain. The reason says which boundary. The `last` field carries what the model computed before it got there.
The function is total with respect to the extended domain that includes boundaries and the origin. There is no input for which the output is undefined. The boundary *is* the defined response. The origin *is* the defined absence.
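The shape such a type could take can be sketched in plain Rust. This is a minimal stand-in with the three variants the text describes, not Origin's actual definition; the `infer` function and its threshold are invented for illustration.

```rust
// A minimal three-sorted value type: origin, contents, boundary.
// Illustrative stand-in only, not Origin's real API.

#[allow(dead_code)]
#[derive(Debug)]
enum Value<T, B> {
    /// Pure absorption: the computation produced no value at all.
    Origin,
    /// A value safely inside the domain.
    Contents(T),
    /// The domain edge was crossed: which boundary, plus the last
    /// value the computation held before crossing it.
    Boundary { reason: B, last: T },
}

#[derive(Debug)]
enum InferBoundary {
    LowConfidence { confidence: f64, threshold: f64 },
}

fn infer(confidence: f64) -> Value<String, InferBoundary> {
    let diagnosis = "pneumonia".to_string();
    if confidence >= 0.85 {
        Value::Contents(diagnosis)
    } else {
        Value::Boundary {
            reason: InferBoundary::LowConfidence { confidence, threshold: 0.85 },
            last: diagnosis,
        }
    }
}

fn main() {
    match infer(0.61) {
        Value::Boundary { reason: InferBoundary::LowConfidence { confidence, .. }, last } => {
            // The partial diagnosis survives the boundary crossing.
            assert_eq!(last, "pneumonia");
            assert!(confidence < 0.85);
        }
        other => panic!("expected a boundary, got {:?}", other),
    }
}
```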
The compiler enforces two rules:
1. **No `.unwrap()`** — you cannot bypass the boundary check.
2. **No wildcard `Boundary { .. }`** — you cannot acknowledge the boundary exists and then ignore what kind it is.
Every boundary kind must be handled individually. The program does not compile until the developer has decided what each boundary means for their domain.
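What that exhaustive handling looks like in practice can be sketched with plain Rust pattern matching. The types and routing strings here are stand-ins; the point is that every variant, and every boundary kind, must be named before the code compiles, which is exactly the discipline a wildcard `Boundary { .. }` arm would defeat.

```rust
// Sketch of the exhaustive handling the compile-time rules force.
// `Value` and the boundary kinds are illustrative stand-ins.

#[allow(dead_code)]
enum InferBoundary {
    LowConfidence { confidence: f64, threshold: f64 },
    OutOfDistribution { distance: f64 },
}

#[allow(dead_code)]
enum Value<T> {
    Origin,
    Contents(T),
    Boundary { reason: InferBoundary, last: T },
}

fn route(result: Value<&'static str>) -> &'static str {
    // No wildcard arm: removing any case below is a compile error.
    match result {
        Value::Origin => "no signal: fall back to default policy",
        Value::Contents(_) => "confident: proceed automatically",
        Value::Boundary { reason: InferBoundary::LowConfidence { .. }, last: _ } => {
            "uncertain: refer to a specialist, attach the partial diagnosis"
        }
        Value::Boundary { reason: InferBoundary::OutOfDistribution { .. }, last: _ } => {
            "out of domain: reject and log the last known state"
        }
    }
}

fn main() {
    let verdict = route(Value::Boundary {
        reason: InferBoundary::LowConfidence { confidence: 0.61, threshold: 0.85 },
        last: "pneumonia",
    });
    assert!(verdict.starts_with("uncertain"));
}
```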
---
## What this preserves
| | Exceptions | `Result` | Origin |
|---|---|---|---|
| That something went wrong | yes | yes | yes |
| Which boundary was crossed | sometimes | yes | yes |
| What the computation last knew | opt-in | **impossible** | **guaranteed** |
| Full reasoning chain | no | no | **yes** |
The `last` field is the critical difference. When a model returns a low-confidence diagnosis, `Result` drops the diagnosis. Origin keeps it. The downstream system knows not just that the model was uncertain, but what it was uncertain *about*.
A reasoning trace built from `.trace()` calls shows every step of the computation up to the boundary — not where the code was (a stack trace), but what the computation knew (a reasoning trace).
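A minimal sketch of how such a trace could accumulate, assuming a hypothetical `trace` method and step format; Origin's actual trace API may differ. Each step records what the computation knew at that point, so the full chain is available when a boundary is reached.

```rust
// Hypothetical reasoning-trace accumulator, not Origin's real API.
// Each `trace` call appends a note about what the computation knew.

struct Traced<T> {
    value: T,
    steps: Vec<String>,
}

impl<T> Traced<T> {
    fn new(value: T) -> Self {
        Traced { value, steps: Vec::new() }
    }

    fn trace(mut self, note: &str) -> Self {
        self.steps.push(note.to_string());
        self
    }
}

fn main() {
    let t = Traced::new(0.61_f64)
        .trace("retrieved 3 passages, top score 0.92")
        .trace("model confidence 0.61, threshold 0.85")
        .trace("crossed LowConfidence boundary");

    // At the boundary, the whole chain of what was known is available,
    // alongside the last value itself.
    assert!(t.value < 0.85);
    assert_eq!(t.steps.len(), 3);
    assert!(t.steps.last().unwrap().contains("boundary"));
}
```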
---
## The formal foundation
The distinction between Origin, Contents, and Boundary is not a design choice. It is forced by the interaction axioms of two-sorted arithmetic, verified in [508 Lean 4 theorems](https://github.com/knoxvilledatabase/two-sorted-arithmetic/blob/main/PROOFS.md) with zero `sorry` escapes.
The core result: any system that handles "undefined" must distinguish between three sorts — values inside the domain (contents), values at the categorical boundary, and the origin (pure absorption, no value at all). Collapsing this distinction — treating boundary values as contents, or silently propagating them — reproduces the failures that 97 independent workarounds across mathematics, physics, logic, and computation were built to prevent.
`NULL`, `None`, `NaN`, renormalization, three-valued logic, proper classes, IEEE 754 quiet propagation — these are all patches for the same collapsed distinction. Origin is the uncollapsed version, enforced at the type level.
---
## What this does not solve
Origin enforces **acknowledgment**, not **correctness**. A developer can handle a `Hallucinated` boundary with a shrug. The compiler ensures they saw it. It does not ensure they made the right decision.
Origin does not solve:
- **The halting problem.** Open-world domains are not cleanly enumerable. A system can always encounter inputs outside every defined boundary kind.
- **End-to-end formal verification.** Origin handles the uncertainty plumbing between components. Full system verification requires dependent types, refinement types, or proof assistants applied to the entire architecture.
- **Model quality.** Origin cannot make a bad model good. It can make a bad model's uncertainty visible and impossible to ignore.
---
## Where it applies
Any system where AI inference feeds into consequential decisions across multiple domains:
- **Medical diagnosis** — model confidence below threshold triggers specialist referral, with the model's partial diagnosis preserved for context.
- **Autonomous vehicles** — sensor degradation propagates through perception, planning, and control with the last known good state at every layer.
- **Financial trading** — model disagreement across pricing engines forces human review, with every engine's price visible.
- **Energy grid control** — demand forecast uncertainty propagates to dispatch with the forecast's best estimate preserved.
- **Legal compliance** — document analysis confidence below threshold triggers manual review, with extracted claims preserved.
These are not hypothetical. Each of these pipelines is [implemented and tested](https://github.com/knoxvilledatabase/origin/tree/main/demo/src) in the Origin repository — five industries, fifteen scenarios, every boundary handled.
---
## The cost
Zero. `Value<T, B>` and `Option<T>` run at the same speed on real inference workloads. Same enum representation. Same branch prediction. The boundary fields exist only when you're at the boundary. When you're in the contents, the cost is identical to not having Origin at all.
Origin matches the best-case `Result` in branch count while preserving what `Result` cannot carry — and beats real-world `Result` where domain error types differ.
You are not paying for epistemic honesty. You are getting it free.
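The layout facts this claim rests on are standard Rust behavior and can be checked directly. This snippet demonstrates the general mechanism (niche optimization and tag-based dispatch), not a benchmark of Origin itself; the `Value` enum here is a stand-in.

```rust
// Standard Rust enum-layout facts behind the zero-cost claim.
// `Value` is an illustrative stand-in, not Origin's definition.

use std::mem::size_of;

#[allow(dead_code)]
enum Value<T, B> {
    Origin,
    Contents(T),
    Boundary { reason: B, last: T },
}

fn main() {
    // Niche optimization: Option<&T> is guaranteed pointer-sized,
    // the same as a bare &T. The wrapper is free.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // A multi-variant enum dispatches on a single discriminant, the
    // same branch shape as Option; the boundary payload is only read
    // when the Boundary arm is taken.
    println!("size of Value<u64, u32>: {} bytes", size_of::<Value<u64, u32>>());
}
```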
---
## Try it
```bash
pip install origin-lang
```
```toml
[dependencies]
origin = { git = "https://github.com/knoxvilledatabase/origin" }
```
The [formal theory](https://github.com/knoxvilledatabase/two-sorted-arithmetic). The [Lean 4 proofs](https://github.com/knoxvilledatabase/two-sorted-arithmetic/blob/main/PROOFS.md). The [code](https://github.com/knoxvilledatabase/origin).
A soft philosophical problem — zero is multiple things, the boundary between knowledge and uncertainty has no name — turned into a hard compiler error.