sb-core — the SharpeBench scoring kernel
A pure, deterministic library that turns a set of agent trajectories (per-seed × per-window return series + decision traces) into a luck-robust, risk-adjusted score and leaderboard ranking.
Design invariants — these are what make a SharpeBench score reproducible forever:
- Pure. No I/O, no system clock, no ambient randomness. Any randomness (the significance bootstrap) takes an explicit seed argument.
- Deterministic. Plain f64 math, fixed reduction order, no parallel float sums. The same input yields byte-identical output on any platform.
- No unsafe.
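
To make the first two invariants concrete, here is a minimal sketch of a seeded bootstrap built on a fixed-order sum. It is not sb-core's actual code: the names SplitMix64, ordered_sum, and bootstrap_means are hypothetical, and the choice of SplitMix64 as the generator is an assumption made only so the example has no dependencies.

```rust
/// SplitMix64, a small fully specified PRNG. Its output depends only on the
/// seed, never on the clock or OS entropy. (Assumed generator for this sketch.)
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Sequential left-to-right sum. Float addition is not associative, so the
/// reduction order is fixed instead of being left to a parallel reducer.
pub fn ordered_sum(xs: &[f64]) -> f64 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Bootstrap resampled means. The seed is an explicit argument, so the same
/// (returns, resamples, seed) triple always produces the same output.
pub fn bootstrap_means(returns: &[f64], resamples: usize, seed: u64) -> Vec<f64> {
    if returns.is_empty() {
        return Vec::new();
    }
    let n = returns.len();
    let mut rng = SplitMix64::new(seed);
    let mut draw = vec![0.0; n];
    let mut means = Vec::with_capacity(resamples);
    for _ in 0..resamples {
        for slot in draw.iter_mut() {
            // Modulo reduction has a tiny bias; acceptable for a sketch.
            let idx = (rng.next_u64() % n as u64) as usize;
            *slot = returns[idx];
        }
        means.push(ordered_sum(&draw) / n as f64);
    }
    means
}
```

Calling bootstrap_means twice with the same seed should give results whose to_bits() values match element for element, which is the property the "byte-identical" claim relies on.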
The headline idea: an agent does not rank on raw return. It ranks only if
its edge survives (a) deflation for the number of agents tested
([deflated_sharpe]), (b) reliability across every seed×window
([pass_k]), and (c) decision-process discipline ([process]). Raw return is
reported but never the rank key — see [composite].
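
The gating idea can be sketched as a filter followed by a sort. Everything below is illustrative and assumed: the AgentScore and Gates types, the threshold fields, and the use of the deflated Sharpe value as a stand-in rank key are hypothetical, and the real key is whatever [composite] defines.

```rust
#[derive(Debug, Clone)]
pub struct AgentScore {
    pub name: String,
    /// Reported alongside the ranking, but never read by `rank`.
    pub raw_return: f64,
    pub deflated_sharpe: f64,
    pub pass_k: f64,
    pub process: f64,
}

/// Hypothetical minimum values an agent must reach to be ranked at all.
pub struct Gates {
    pub min_deflated_sharpe: f64,
    pub min_pass_k: f64,
    pub min_process: f64,
}

/// Small constructor to keep examples short.
pub fn agent(name: &str, raw_return: f64, deflated_sharpe: f64, pass_k: f64, process: f64) -> AgentScore {
    AgentScore {
        name: name.to_string(),
        raw_return,
        deflated_sharpe,
        pass_k,
        process,
    }
}

/// Keep only agents that clear all three gates, then order them by the
/// placeholder key (deflated Sharpe, descending). Ties break on name so the
/// output order is deterministic.
pub fn rank(agents: &[AgentScore], gates: &Gates) -> Vec<AgentScore> {
    let mut ranked: Vec<AgentScore> = agents
        .iter()
        .filter(|a| {
            a.deflated_sharpe >= gates.min_deflated_sharpe
                && a.pass_k >= gates.min_pass_k
                && a.process >= gates.min_process
        })
        .cloned()
        .collect();
    ranked.sort_by(|a, b| {
        b.deflated_sharpe
            .total_cmp(&a.deflated_sharpe)
            .then_with(|| a.name.cmp(&b.name))
    });
    ranked
}
```

Under these assumptions, an agent with the highest raw return but a poor pass_k value is dropped before sorting, so a single lucky run cannot reach the leaderboard.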