# Benchmark Results
Baseline numbers for the two main solver paths, captured from
`cargo bench --bench solver_benchmarks -- --quick` on the dev host
after the iter-1–5 fixes on branch `deepreview/sota-pass-2026-05-18`.
The harness exercises each solver on a deterministic diagonally-
dominant N×N test matrix (5 on the diagonal, ±1 on the four nearest
off-diagonals with wrap, so the matrix is well-conditioned without
being trivially diagonal). The reported numbers are criterion's
median over its quick-mode samples; full release-mode runs go in CI's
`bench-smoke` job.
## Solver runtime (single solve — `Linear` class)
| Neumann series | 3.6 µs | 12.6 µs | 51.5 µs | ~5.0 Melem/s |
| Optimized CG | 198 ns | 316 ns | 816 ns | ~314 Melem/s |
**Observation.** Optimized CG is **40–60× faster than Neumann** across
all three sizes on these symmetric test matrices, which matches the
classical ranking (CG amortises matrix-vector products with very tight
inner loops, Neumann pays per-term scaling and a residual recompute).
Use Neumann when the system is asymmetric and CG isn't applicable;
otherwise CG is the default.
## Change-driven solve paths — `Linear` vs `SubLinear`
Captured from `cargo bench -- --quick delta_solve` (PR #31, ADR-001
roadmap items #2 + #6 phase-2). Three change-driven paths on a
single-row RHS perturbation against a strict-DD ring-stencil matrix:
| 64 | — | 11.9 µs | 906 µs |
| 256 | 66.7 µs | 45.5 µs | 2.28 ms |
| 1024 | 258 µs | 179 µs | 2.32 ms |
**Architectural property:** `cold_full` and `warm_full` grow ~linearly
with `n` (~4× cost for 4× size). `sparse_closure` is essentially
constant: 256→1024 leaves the cost unchanged (2.28 ms → 2.32 ms).
The curves diverge with `n` — the SubLinear path's per-event cost is
bounded by closure size, not by the global state-space size. Crossover
on this conservative bench (`closure_depth=4, max_terms=32`) is past
`n ≈ 10⁴`; tuning those constants to a tighter spectral-radius bound
pulls the crossover much lower.
### Event-driven anomaly example
`cargo run --release --example event_driven_anomaly` (PRs #33, #35)
runs the full ADR-001 inner-loop discipline:
```
baseline: 88 µs full Neumann solve, one-shot (Linear)
```
The "no event, no work" coherence gate fires in ≤1 µs when the
delta's induced change in `x` is below the caller's tolerance, so
noisy / drifting / sub-resolution events cost nothing. Meaningful
events flow through the full SubLinear pipeline; nothing outside
their bounded-depth closure is ever touched.
### Complexity-class catalog
Every public primitive declares its worst-case class via the
`Complexity` trait (`src/complexity.rs`) and the matching
`x-complexity` annotation in MCP. The phase-2 SubLinear additions
shipped in this release cycle:
| `closure_indices` | **SubLinear** | bounded-depth BFS — input to every change-driven path |
| `solve_single_entry_neumann` | **SubLinear** | computes `x[i]` without materialising `x` |
| `solve_on_change_sublinear` | **SubLinear** | returns `Vec<(idx, val)>` over the closure only |
| `contrastive_solve_on_change_sublinear` | **SubLinear** | end-to-end change-driven top-k |
| `delta_below_solve_threshold` | `O(\|δ\|)` | coherence-gated skip — never touches `n` or `nnz` |
The MCP `solve`, `estimateEntry`, and `solveTrueSublinear` handlers
enforce the caller's `max_complexity_class` budget before any solver
work runs (ADR-001 item #4 phase-2). `estimateComplexityClass` now
surfaces every primitive in the catalog for budget-aware clients.
## Fixes that landed this cycle
(numbers above reflect post-fix performance; pre-fix numbers in parens)
- **Neumann residual against original RHS**
(`src/solver/neumann.rs:update_residual`). Previously compared
`A·x` against the *scaled* RHS `D⁻¹b`, which is the residual of a
different equation. Result: the solver kept iterating past the true
convergence point and outright diverged at n ≥ 64 on the bench
matrices. After the fix, n=16 is 47% faster (correct early exit),
n=64 and n=256 converge cleanly where they couldn't before.
- **k=0 Neumann term double-counted** (iter-2). The solution was
initialised to `D⁻¹b` *and* `compute_next_term` immediately added
another copy, so a 2×2 system that should converge to `[1,1]`
ended at `[2,2]`.
- **Conjugate-gradient instrumentation** (iter-2).
`OptimizedConjugateGradientSolver` was inlining every dot product
and AXPY directly, so `performance_stats.dot_product_count` /
`axpy_count` stayed at 0 the entire run. Routed the three CG dot
products and two AXPY updates through the existing instrumented
helpers; SIMD/scalar dispatch unchanged.
- **Adaptive sampler symmetry** (iter-2). `adapt_parameters` scaled
UP by 1.5× on high error but DOWN by only 0.8× on low error, so
the sampling probability drifted monotonically upward over time.
Made the factors inverse so a high-then-low pair round-trips.
- **JL embedding `target_dim`** (iter-1). `compute_target_dimension`
could return `k > n` (a dimensional *expansion* dressed up as a
reduction). Capped at `n−1` with the standard Achlioptas constant.
- **Sublinear-Neumann base case** (iter-3). `solve_base_case` ran a
hard-coded 10 Jacobi iterations, well short of the ~30 needed to
hit the test's `<1e-10` residual on a 2×2 system. Now driven by
`max_recursion_depth` with a 64-iter floor and step-size early
termination at 1e-14.
## How to run
```bash
# Quick (single sample per benchmark, useful in CI)
cargo bench --bench solver_benchmarks -- --quick
# Full criterion run (warmup + ~30 samples per benchmark)
cargo bench --bench solver_benchmarks
# CI bench-smoke job runs the --quick variant on every PR
```
Criterion writes HTML reports to `target/criterion/` if you want to
poke at the per-sample histograms.