origin-lang 0.2.0

# Origin

Origin makes your compiler catch a problem that every AI system has right now.

When your model isn't sure about something — low confidence, out of its training domain, potentially hallucinating — that uncertainty usually lives in a float field that someone might forget to check. It reaches production silently. You find out three weeks later from user complaints.

Origin changes that. When a function returns an uncertain value, you can't use it without handling the uncertainty first. You can't forget. You can't `.unwrap()` your way past it.

But here's what makes it different from just using `Result` or `try/except`: when something goes wrong, Origin doesn't throw away what the model computed. It keeps it. So instead of "the model failed, here's an error code," you get "the model leaned toward pneumonia but wasn't confident enough to say so — here's what it knew when it stopped."

That's the difference between a system that says "I don't know" and a system that says "here's my full reasoning up to the point where I ran out of certainty."

---

## Try It

**Python** — runtime enforcement:

```bash
pip install origin-lang
```

```python
from origin_lang import Value, boundary

@boundary
class LowConfidence:
    confidence: float
    threshold: float

# Three sorts. The compiler knows which.
match infer(text):
    case Value.Contents(diagnosis):
        treat(diagnosis)
    case Value.Boundary(LowConfidence(confidence=c), last=diagnosis):
        refer_to_specialist(diagnosis, c)
        #                   ^^^^^^^^^
        #                   still here
    case Value.Origin(reason):
        escalate(reason)
```

**Rust** — compile-time enforcement:

```toml
[dependencies]
origin-lang = "0.1"
```

```rust
#[boundary_check]
fn diagnose(input: &str) -> String {
    match infer(input) {
        Value::Contents(diagnosis) => use_it(diagnosis),
        Value::Boundary { reason: LowConfidence { .. }, last } => refer(last),
        //                                               ^^^^
        //                                               always here
        Value::Origin(reason) => escalate(reason),
    }
}
```

Python gets you the concept today. Rust gets you the compiler guarantee.

---

## Three Sorts

Every value is one of three things:

- **Origin** — the system hit its absolute boundary. No last value. The reason is all that exists.
- **Boundary** — the value crossed an edge. Carries the reason AND the last known value.
- **Contents** — the value is in safe territory. Arithmetic lives here.

```rust
enum Value<T, B: BoundaryKind> {
    Origin(B),
    Boundary { reason: B, last: T },
    Contents(T),
}
```

This structure is formally verified. The [two-sorted arithmetic](https://github.com/knoxvilledatabase/two-sorted-arithmetic) repo proves in Lean 4 — 508 theorems, zero errors — that these three sorts are necessary and sufficient. Every `≠ 0` hypothesis in mathematics traces back to confusing origin with contents. In Origin, the type system makes that confusion a compile error.

---

## How It Works

```
✓ [1] pharmacokinetics: Distribution { concentration: 7.14, half_life_hours: 4.0 }
✓ [2] dosage calculation: Dosage { amount_mg: 571.4, frequency_hours: 4.0 }
✓ [3] AI diagnosis: Diagnosis { condition: "ACS", confidence: 0.92 }
✗ [4] contraindication check: SafeDrug { name: "Aspirin", ... }
```

Three layers completed. The fourth hit a boundary. This is not a stack trace. A stack trace tells you where the code was. This is a reasoning trace. It tells you what the computation knew.

| What survives a boundary | Python | Rust / `Result<T,E>` | Origin / `Value<T,B>` |
|---|---|---|---|
| That something went wrong | yes | yes | yes |
| Which boundary was crossed | sometimes | yes | yes |
| What the computation last knew | **opt-in** | **impossible** | **guaranteed** |
| Absolute boundary vs edge | **no** | **no** | **yes** |
| Full reasoning chain | no | no | **yes** |

Python lets a good engineer add `last_known` to an exception class. The language has no opinion — both versions compile, both ship. `Result<T,E>` is `Ok(T) | Err(E)` — when you return `Err`, the `T` is dropped. That's not a style choice. `Value<T,B>` carries the `T` in the `Boundary` variant. The compiler guarantees it. And `Origin` carries the reason when there is genuinely nothing to preserve.

This is not a performance claim. It is a type system fact. Verified across [five industries, fifteen scenarios](comparison/).

---

## The Problem

Every model you've ever shipped had a confidence score that lived in a float field someone forgot to check.

Every language you've ever used had its own mechanism for this:

- C gave you `NULL`
- Java gave you `NullPointerException`
- JavaScript gave you `null`, `undefined`, *and* `NaN`
- IEEE 754 gave you quiet NaN propagation
- Rust gave you `Option<T>`
- Python gave you `None`
- SQL gave you three-valued logic
- Physics gave you renormalization
- Mathematics gave you "undefined"

Different mechanisms. Different languages. Different centuries. Same problem: the value crossed a boundary, and nobody told you.

---

## The Insight

These are all the same thing.

A value is either in safe territory — the **contents** — or it has crossed the edge of its domain — a **boundary** where the last known value is preserved — or it has hit the absolute limit of the system — the **origin**, where there is nothing left to preserve.

Not presence vs absence. Not `Some` vs `None`. Three sorts. Where are you in the domain?

Division by zero isn't a missing value. The denominator exists — it's at the boundary. A model's low-confidence output isn't absent — it's at the boundary between knowledge and uncertainty. And when the model fails entirely — no output, no trace — that's origin. The boundary of the system itself.

Three sorts. Every domain.

---

## Enforcement

Origin gives you the mechanism. You define what "boundary" means for your domain.

```rust
#[derive(Debug, Clone, BoundaryKind)]
pub enum InferBoundary {
    LowConfidence { confidence: f64, threshold: f64 },
    Hallucinated  { claim: String, evidence_score: f64 },
    OutOfDomain   { distance: f64 },
}
```

This is yours. A trading desk defines `ExposureLimitExceeded`. A power grid defines `FrequencyDeviation`. An AV stack defines `SensorDegraded`. Origin doesn't decide where the boundary is — your domain does. Origin makes sure the compiler enforces it.

The program that won't compile:

```rust
#[boundary_check]
fn diagnose(input: &str) -> Diagnosis {
    infer(input).unwrap()
}
```

```
error: boundary not handled: `.unwrap()` bypasses boundary checking

  A Value<T> may be at a boundary. Handle it explicitly:

    match value {
        Value::Contents(v) => /* safe to use */,
        Value::Boundary { reason, last } => /* crossed edge, has context */,
        Value::Origin(reason) => /* absolute boundary, no value */,
    }

  or use `.or(fallback)` to provide a default value
```

And if you try to dodge it with a wildcard:

```rust
#[boundary_check]
fn diagnose(input: &str) -> String {
    match infer(input) {
        Value::Contents(d) => use_it(d),
        Value::Boundary { .. } => "uncertain".to_string(), // COMPILE ERROR
        Value::Origin(_) => "failed".to_string(),
    }
}
```

```
error: boundary kind ignored: wildcard `Boundary { .. }` catches all boundary kinds

  Each boundary kind exists for a reason. Handle them individually.

  A wildcard boundary arm is the new `.unwrap()` — it acknowledges
  the boundary exists and then ignores what it means.
```

---

## Performance

The cost of carrying the reason and the residual is zero.

`Value<T, B>` and `Option<T>` run at the same speed on real inference (DistilBERT-SST-2, Apple Silicon, 512 inferences). Same enum representation. Same branch prediction. Three variants instead of two — one extra discriminant byte, invisible against actual computation.

You are not paying for the information. You are getting it free.

For `?` propagation, `propagate()` converts to `Result<T, B>` — but drops `last`. If you need the full context through `?`, use `propagate_with_last()` which returns `Result<T, (B, Option<T>)>`. And `or_else` lets you handle Origin and Boundary distinctly without a full `match`.

---

## Where This Came From

Zero has always been three things: the number zero (contents), the empty container (boundary), and the origin of the domain (origin). That conflation — one symbol, three roles — is the same conflation in every system that handles undefined.

The [two-sorted arithmetic](https://github.com/knoxvilledatabase/two-sorted-arithmetic) repo is the proof. 508 Lean 4 theorems show that when you separate the three sorts, 46 hypothesis instances across 10 benchmarks dissolve to zero. Every seed proof is `rfl` — true by construction. The field axiom itself — "every nonzero element has an inverse" — drops the `≠ 0` qualifier when the type carries the sort.

This repo is the enforcement. The Lean theorems prove you cannot have a total extension of bounded arithmetic without the three-sort split. The Rust compiler proves you cannot ship code that ignores the split without a compile error. The math forces the design. The design forces the code.

---

## Repository Map

| Directory | What it does |
|---|---|
| `src/` | The core: `Value<T, B>` (Origin, Boundary, Contents), `Chain`, `propagate()`, `trace()` |
| `macros/` | Proc-macros: `#[derive(BoundaryKind)]` and `#[boundary_check]` |
| `python/` | Pure Python package (`pip install origin-lang`), same three sorts, runtime enforcement |
| `demo/` | Five industry cascades: healthcare, finance, energy, vehicles, legal |
| `comparison/` | Python vs Result vs Origin, structural argument table, branch count comparison |
| `transpiler/` | `originc`: Origin syntax to Rust source (experimental, early stage) |
| `bench/` | Benchmark: DistilBERT-SST-2 via candle, Value vs Option — identical performance |
| `spec/` | Type system specification derived from the Lean 4 proofs |
| `tests/` | Cross-crate proof: boundary kind in crate A, value in B, enforcement in C |
| `RESEARCH.md` | For AI safety researchers: Origin and the total function problem |

---

*The ocean absorbed the fish. The fish didn't need to contain the ocean. It needed the ocean to have a name.*