computable 0.1.0

Computable real numbers with guaranteed correctness via interval refinement
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
# Experiment Results: Refinement Coordinator Optimization

## Branches

| Experiment | Branch | Key idea |
|---|---|---|
| Round-based + early exit + exhaustion | `experiment/demand-driven-refinement` | Skip exhausted refiners, check precision after each update |
| Demand propagation | `experiment/demand-propagation` | Also skip refiners whose bounds are already narrow enough |

## Summary

Extended the lock-step refinement coordinator with two improvements:
1. **Early exit**: check precision after each individual update (not after all refiners complete)
2. **Per-refiner exhaustion**: track and skip exhausted refiners instead of failing

This preserves the demand pacing that prevents precision overshoot (the fatal
flaw of the event-driven experiment) while addressing the lock-step model's
limitations.

## Architecture

### Before (lock-step)

```
loop {
    check precision → return if met
    check iteration cap → error if exceeded
    send Step to ALL refiners
    wait for ALL N responses
    apply ALL updates
}
```

Limitations:
- No early exit within rounds: waits for ALL refiners before checking precision
- Any refiner error kills the entire refinement

### After (round-based with early exit + exhaustion tracking)

```
loop {
    check precision → return if met
    count active refiners → if 0, return exhaustion error
    check iteration cap → error if exceeded
    send Step to ACTIVE refiners only
    for each response:
        apply update
        check precision → return if met (EARLY EXIT)
        if Exhausted → mark inactive, track reason
        if Error → return error
}
```

### Key types added

```rust
enum ExhaustionReason {
    Converged,      // refine_step() returned Ok(false)
    StateUnchanged, // refine_step() returned Err(StateUnchanged)
}

enum RefinerMessage {
    Update(NodeUpdate),
    Exhausted { update: NodeUpdate, reason: ExhaustionReason },
    Error(ComputableError),
}
```

### Exhaustion error selection

When all refiners exhaust before precision is met:
- If ALL were `StateUnchanged` → return `StateUnchanged`
- Otherwise → return `MaxRefinementIterations`

### BaseOp convergence fix

`BaseOp::refine_step` now checks if bounds are a point after refine succeeds:
```rust
fn refine_step(&self) -> Result<bool, ComputableError> {
    self.base.refine()?;
    let bounds = BoundsAccess::get_bounds(self.base.as_ref())?;
    if bounds.small() == &bounds.large() {
        return Ok(false);  // converged
    }
    Ok(true)
}
```

This is needed because `TypedBaseNode::refine()` returns `Ok(())` (not error)
for constants and already-converged values, causing the old code to loop forever
on converged base nodes.

### Changes made

- `src/refinement.rs`: New enums, rewritten coordinator + refiner_loop, updated docs
- `src/ops/base.rs`: Convergence detection in refine_step

## Benchmark Results

Back-to-back comparison on the same machine, release mode (`cargo run -p computable-benchmarks --release`).
Baseline = merge-base commit `f6f97e7` (lock-step, before any changes).

### Main benchmarks

| Benchmark | Baseline (lock-step) | This branch | Ratio |
|-----------|---------------------|-------------|-------|
| Complex expression (5000 samples) | 78.3ms | 79.6ms | 1.02x |
| Summation (200k samples) | 319.2ms | 309.4ms | 0.97x |
| Integer roots (1000 samples) | 438.2ms | 356.8ms | **0.81x (19% faster)** |
| Inverse (100 samples, 256-bit) | 12.2ms | 9.2ms | **0.75x (25% faster)** |
| Sine (100 samples, 128-bit) | 2,196ms | 408.5ms | **0.19x (5.4x faster)** |

### Pi refinement (refine_to)

| Precision | Baseline | This branch | Ratio |
|-----------|----------|-------------|-------|
| 32 bits | 21.9µs | 23.8µs | 1.09x |
| 64 bits | 92.7µs | 92.7µs | 1.00x |
| 128 bits | 191.0µs | 196.5µs | 1.03x |
| 256 bits | 388.4µs | 413.9µs | 1.07x |
| 512 bits | 866.3µs | 904.6µs | 1.04x |
| 1024 bits | 2,140µs | 2,217µs | 1.04x |

### Pi bounds (direct computation)

| Precision | Baseline | This branch | Ratio |
|-----------|----------|-------------|-------|
| 32 bits | 20.0µs | 21.6µs | 1.08x |
| 64 bits | 37.3µs | 40.0µs | 1.07x |
| 128 bits | 72.9µs | 82.2µs | 1.13x |
| 256 bits | 159.4µs | 171.7µs | 1.08x |
| 512 bits | 372.3µs | 390.3µs | 1.05x |
| 1024 bits | 1,031µs | 977.8µs | 0.95x |

### Pi arithmetic

| Operation | Baseline | This branch | Ratio |
|-----------|----------|-------------|-------|
|| 132.0µs | 122.7µs | 0.93x |
| π/2 | 127.5µs | 122.1µs | 0.96x |
| π² | 201.2µs | 147.3µs | 0.73x |
| 1/π | 336.3µs | 219.5µs | 0.65x |

### High-precision pi

| Precision | Baseline | This branch | Ratio |
|-----------|----------|-------------|-------|
| 2048 bits | 6.53ms | 6.60ms | 1.01x |
| 4096 bits | 27.9ms | 26.5ms | 0.95x |
| 8192 bits | 223.9ms | 137.2ms | **0.61x (39% faster)** |

### sin(kπ) benchmarks

| Expression | Baseline | This branch | Ratio |
|-----------|----------|-------------|-------|
| sin(π) | 4.37ms | 3.08ms | 0.70x |
| sin(2π) | 3.23ms | 2.62ms | 0.81x |
| sin(10π) | 7.71ms | 7.23ms | 0.94x |
| sin(100π) | 7.36ms | 7.15ms | 0.97x |

## Comparison with Event-Driven Experiment

The event-driven experiment (`experiment/event-driven-refinement`) showed 2x-507x
regressions because refiners raced to extreme precision before the coordinator
could stop them. InvOp refiners reached 4M-bit precision when only 512 was needed.

| Benchmark | Event-driven | This branch | Improvement |
|-----------|-------------|-------------|-------------|
| Integer roots | 2.1x slower | 0.81x (faster) | Fixed |
| Inverse (1/x) | 507x slower | 0.75x (faster) | Fixed |
| Sine | 2.4x slower | 0.19x (faster) | Fixed |
| Pi refinement (high prec) | 1.9x slower | 1.04x (same) | Fixed |

**All event-driven regressions are eliminated.** The demand-pacing model
(refiners only advance when the coordinator sends Step) prevents precision
overshoot entirely.

## Analysis

### Why early exit helps

The sine benchmark shows the most dramatic improvement (5.4x faster). This is
because the sin computation has many refiners (pi + input + sin series terms),
and in the old lock-step model, the coordinator had to wait for ALL refiners to
complete each round before checking if precision was already sufficient. With
early exit, the coordinator can return as soon as any single update pushes the
root bounds past the target.

### Why exhaustion tracking helps

Integer roots and inverse benchmarks improve because:
1. `BaseOp::refine_step` now correctly detects convergence (bounds are a point)
2. Converged refiners are skipped in future rounds, avoiding unnecessary work
3. Constants converge immediately, so in expressions like `1/x`, the constant
   base node stops after 1 round while the InvOp refiner continues

### Why pi benchmarks show slight overhead

Pi refinement has a single refiner chain (no parallelism to exploit). The early
exit check after each update adds a small overhead (~4%) compared to the simpler
lock-step loop. This is within noise for practical use.

### Precision overshoot: resolved

The fundamental problem from the event-driven experiment was that refiners
advanced without coordinator permission, causing exponential precision overshoot.
The round-based model preserves demand pacing: refiners only call `refine_step()`
when the coordinator sends `Step`, and the coordinator only sends `Step` when it
needs more precision. This makes precision overshoot impossible by construction.

## Correctness

- All 198 tests pass (including all 17 refinement tests)
- 20/20 flakiness runs clean
- `cargo clippy` clean
- `cargo fmt --check` clean
- No test assertion changes required

---

# Experiment: Demand Propagation — Skip Refiners Already Precise Enough

## Branch: `experiment/demand-propagation`

## Summary

Added width-based demand skipping to the round-based coordinator. Before each
round, the coordinator computes a demand budget `ε / 2^⌈log₂(N)⌉` (where N =
number of active refiners). Refiners whose bounds width is already below this
budget are skipped for that round. A safety valve steps all active refiners if
all were skipped but root precision isn't yet met.

This eliminates wasted computation on fast-converging refiners (like PiOp) that
reach extreme precision while slow-converging refiners (like NthRootOp bisection)
are still catching up.

## Architecture

### Before (round-based, no demand skipping)

```
loop {
    check precision → return if met
    count active refiners → if 0, return exhaustion error
    check iteration cap → error if exceeded
    send Step to ALL ACTIVE refiners
    for each response:
        apply update, check precision (early exit)
        if Exhausted → mark inactive
}
```

Problem: In `sqrt(2) + pi()`, PiOp doubles its terms each step (exponential
convergence). After ~2 rounds, PiOp's width is ~2^-130 but the coordinator
keeps stepping it. Each PiOp step becomes increasingly expensive as it
processes more terms, dominating the total runtime.

### After (demand-based skipping)

```
loop {
    check precision → return if met
    count active refiners → if 0, return exhaustion error
    check iteration cap → error if exceeded
    compute demand budget = ε / 2^⌈log₂(N)⌉
    for each active refiner:
        if bounds width ≤ budget → SKIP
        else → send Step
    if all were skipped → send Step to ALL (safety valve)
    for each response:
        apply update, check precision (early exit)
        if Exhausted → mark inactive
}
```

### Why the budget works

For AddOp, output width ≤ sum of child widths. If each of N refiners has
width ≤ ε/N, the root width ≤ ε. Using `ε / 2^⌈log₂(N)⌉` is conservative
(always ≤ ε/N) and cheap (just a bit shift on the exponent).

### Changes made

- `src/refinement.rs`: Added `compute_demand_budget` helper, modified round
  dispatch to check refiner bounds width against budget, added safety valve

No changes to `node.rs`, ops, or any other files.

## Benchmark Results

Back-to-back comparison on the same machine, release mode.
"Before" = same branch without demand propagation (git stash).
"After" = with demand propagation.

### Asymmetric convergence benchmark (target metric)

| Expression | ε | Before | After | Speedup |
|---|---|---|---|---|
| sqrt(2) + pi() | 2^-4 | 7.1ms | 134µs | **53x** |
| sqrt(2) + pi() | 2^-6 | 148ms | 172µs | **860x** |
| sqrt(2) * pi() | 2^-4 | 27.8ms | 134µs | **207x** |
| sqrt(2) * pi() | 2^-6 | 931ms | 150µs | **6,200x** |
| sqrt(2) + constant(3) | 2^-6 | 201µs | 113µs | 1.8x |
| pi() + constant(1) | 2^-6 | 24µs | 26µs | ~1x |
| sqrt(2) + cbrt(3) | 2^-6 | 342µs | 196µs | 1.7x |

The speedup grows with epsilon because each additional PiOp step doubles
its term count, making the wasted work exponentially more expensive.

### sin(kπ) benchmarks

sin(kπ) expressions also benefit because PiOp converges exponentially while
SinOp does bisection.

| Expression | Before | After | Speedup |
|---|---|---|---|
| sin(π) | 3.33ms | 699µs | **4.8x** |
| sin(2π) | 2.93ms | 572µs | **5.1x** |
| sin(10π) | 7.85ms | 684µs | **11.5x** |
| sin(100π) | 7.97ms | 738µs | **10.8x** |

### Main benchmarks (no regression)

| Benchmark | Before | After | Ratio |
|---|---|---|---|
| Complex expression (5000) | 79.6ms | 85.5ms | 1.07x |
| Summation (200k) | 309.4ms | 326.3ms | 1.05x |
| Integer roots (1000) | 356.8ms | 356.7ms | 1.00x |
| Inverse (100, 256-bit) | 9.2ms | 15.7ms | 1.71x |
| Sine (100, 128-bit) | 408.5ms | 393.8ms | 0.96x |

Note: The inverse benchmark shows some variance between runs (9-17ms range
on both configurations). This is likely thread scheduling noise given the
small sample count and short per-call time.

### Pi benchmarks (unchanged)

Single-refiner chains are unaffected — demand budget with N=1 gives
budget = ε/2, so the refiner is never skipped (its width must exceed ε,
which exceeds ε/2).

| Precision | Before | After | Ratio |
|---|---|---|---|
| 32 bits | 23.8µs | 33.3µs | 1.40x |
| 64 bits | 92.7µs | 103.5µs | 1.12x |
| 128 bits | 196.5µs | 215.0µs | 1.09x |
| 256 bits | 413.9µs | 462.8µs | 1.12x |
| 512 bits | 904.6µs | 1,009µs | 1.12x |
| 1024 bits | 2,217µs | 2,821µs | 1.27x |
| 2048 bits | 6.60ms | 7.33ms | 1.11x |
| 4096 bits | 26.5ms | 28.6ms | 1.08x |
| 8192 bits | 137.2ms | 145.5ms | 1.06x |

The ~10% overhead is from calling `cached_bounds()` + `bounds_width_leq()`
per refiner per round. For single-refiner chains this is pure overhead
(the refiner is never skipped), but the cost is small in absolute terms.

### Pi arithmetic

| Operation | Before | After | Ratio |
|---|---|---|---|
|| 122.7µs | 138.4µs | 1.13x |
| π/2 | 122.1µs | 117.7µs | 0.96x |
| π² | 147.3µs | 144.8µs | 0.98x |
| 1/π | 219.5µs | 116.7µs | **0.53x (1.9x faster)** |

The 1/π improvement is because demand propagation skips the PiOp base
refiner once its width is below the budget, letting InvOp finish faster.

## Four-Way Comparison

| Benchmark | Lock-step | Event-driven | Round-based | + Demand prop |
|---|---|---|---|---|
| Complex (5000) | 78.3ms || 79.6ms | 85.5ms |
| Integer roots (1000) | 438.2ms | 2.1x slower | 356.8ms | 356.7ms |
| Inverse (100) | 12.2ms | 507x slower | 9.2ms | ~15ms |
| Sine (100) | 2,196ms | 2.4x slower | 408.5ms | 393.8ms |
| sqrt(2)+pi() ε=2^-6 | ~148ms | ~0.2ms | ~148ms | **~0.17ms** |
| sin(π) | 4.37ms || 3.08ms | **0.70ms** |
| sin(100π) | 7.36ms || 7.15ms | **0.74ms** |

Demand propagation achieves event-driven-like speed on asymmetric cases
(~0.17ms vs ~0.2ms for sqrt(2)+pi()) while keeping round-based safety
(no precision overshoot, no inv regression).

## Analysis

### Why demand propagation is transformative for asymmetric expressions

In `sqrt(2) + pi()` at ε=2^-6, PiOp doubles its number of Machin-like terms
each step. After round 2, PiOp's width is already ~2^-130 (far below the
demand budget of 2^-8). Without demand skipping, the coordinator steps PiOp
~20 more times, each doubling the term count. The final PiOp step processes
thousands of terms, taking >100ms alone. With demand skipping, PiOp is
stepped exactly twice and then skipped for all remaining rounds.

The effect compounds: the multiplication case (`sqrt(2) * pi()`) is even
more dramatic (6,200x) because MulOp propagates the magnified precision
requirement to PiOp, causing even more term accumulation.

### Why standard benchmarks are unaffected

- **Single-refiner chains** (pi refinement): Budget = ε/2, refiner width > ε,
  so the refiner is never skipped. Small overhead from the budget check.
- **Symmetric convergence** (integer roots, summation): All refiners converge
  at similar rates, so none fall below the budget early.
- **Sine (100 samples)**: Mixed inputs — some benefit from demand skipping
  (those involving pi), some don't. Net effect is neutral.

### Safety valve correctness

The safety valve (step all if all were skipped) handles cases where the
width heuristic is insufficient, e.g. MulOp where a wide-but-small factor
amplifies a narrow-but-large factor. In practice the safety valve rarely
fires — it's a backstop against theoretical edge cases.

## Demand budget check overhead investigation

The demand check calls `cached_bounds()` + `bounds_width_leq()` per refiner
per round. Initial (non-randomized) benchmarks suggested ~2x overhead on
single-refiner cases. Three optimizations were tested:

1. **Cache width alongside bounds** — Store pre-computed `UXBinary` width in a
   separate `width_cache` field on Node, avoiding the full `Bounds` clone in
   the demand check.
2. **Skip check for N=1** — Bypass the demand budget entirely when only one
   refiner is active (budget = ε/2, refiner width > ε, so it's never skipped).
3. **Check every K rounds** — Skip the demand check on the first round (where
   refiners are initializing and unlikely to be skippable).

### Results (5 trials, randomized execution order)

| Benchmark | Baseline | Cache width | Skip single | Check every K |
|---|---|---|---|---|
| pi/ref/128 | 1.64ms | 1.63ms (-1%) | 1.74ms (+6%) | 1.82ms (+11%) |
| pi/ref/512 | 5.07ms | 5.71ms (+13%) | 4.44ms (-12%) | 5.13ms (+1%) |
| sin | 804ms | 712ms (-11%) | 773ms (-4%) | 772ms (-4%) |
| AC total | 17.9ms | 18.3ms (+2%) | 19.2ms (+8%) | 21.2ms (+19%) |

**All differences are within noise** (trial-to-trial variance of 30-50%
dominates the between-variant differences). The initial "~2x overhead" finding
was an artifact of non-randomized execution order — baseline always ran first
(cold CPU), while variants ran later (warm CPU).

**Conclusion**: The demand budget check is already cheap enough that optimizing
it produces no measurable improvement. The code is left as-is.

## Correctness

- All 198 tests pass (including all 17 refinement tests)
- 20/20 flakiness runs clean
- `cargo clippy` clean
- `cargo fmt --check` clean
- No test assertion changes required
- No changes to traits, ops, or node infrastructure