ferrum-bench-core — canonical schema, metric aggregation, and
variance reporting for ferrum’s bench and bench-serve commands.
Locked by docs/bench/PLAYBOOK.md § 7. Do not invent variants;
producers and consumers (bench, bench-serve, compare-commits,
visualizer, dashboards) all build against the types here.
§Quick map
- BenchReport — top-level: one bench cell, aggregated across n_repeats
- Scenario — closed-loop / open-loop / shared-prefix / cli
- MetricSet — p50/p75/p95/p99 of one latency metric
- ScalarStats — {mean, stddev, ci95_hw} (stats module)
- Env + EnvHash — apples-to-apples cell identity (env module)
- ProfileEvent — locked structured profile JSONL envelope (profile module)
- compute_metrics — the one aggregator both bench CLIs call
- arrivals module — Poisson inter-arrival times for open-loop
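To make the "Poisson inter-arrival times for open-loop" entry concrete, here is a minimal sketch of the standard technique: draw exponential gaps by inverse-CDF sampling. It is not the arrivals module's actual API, which this page does not show. The `SplitMix64` generator and the `poisson_inter_arrivals` name, signature, and seed parameter are all assumptions made for illustration.

```rust
/// Tiny SplitMix64 generator so the sketch needs no external crates.
/// Assumption: the real arrivals module may use a different RNG entirely.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: `n` exponential inter-arrival gaps (in seconds) for a
/// Poisson process with `rate_per_sec` requests per second.
fn poisson_inter_arrivals(rate_per_sec: f64, n: usize, seed: u64) -> Vec<f64> {
    assert!(rate_per_sec > 0.0, "rate must be positive");
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let u = rng.next_f64();
            // Inverse-CDF sampling; 1 - u lies in (0, 1], so ln is finite.
            -(1.0 - u).ln() / rate_per_sec
        })
        .collect()
}
```

Calling `poisson_inter_arrivals(10.0, 1000, 42)` yields 1000 gaps whose mean is close to 0.1 s. A fixed seed makes the schedule reproducible across runs, which fits the determinism goals of this crate.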
§Determinism notes
- JSON keys are emitted in struct field-declaration order; field order is part of the locked schema and should not change.
- BTreeMap (not HashMap) for any dynamic key-value bag.
- CI95 fields are suppressed when n_repeats < 3 (degenerate).
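The two rules about key-value bags and CI95 suppression can be illustrated with the standard library alone. This is a sketch, not the crate's serialization code: `render_bag` and `ci95_or_none` are hypothetical helpers, and the hand-rolled JSON rendering assumes keys that need no escaping.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: render a string-to-number bag as a JSON object.
/// BTreeMap iterates in sorted key order, so the output does not depend on
/// insertion order (a HashMap's iteration order is unspecified).
/// Assumption: keys contain no characters that require JSON escaping.
fn render_bag(bag: &BTreeMap<String, f64>) -> String {
    let body: Vec<String> = bag
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", k, v))
        .collect();
    format!("{{{}}}", body.join(","))
}

/// Hypothetical helper mirroring the CI95 rule: with fewer than three
/// repeats the half-width is dropped instead of being reported.
fn ci95_or_none(n_repeats: usize, half_width: f64) -> Option<f64> {
    if n_repeats < 3 {
        None
    } else {
        Some(half_width)
    }
}
```

Two maps filled in different orders render to the same string, so byte-level diffs of reports stay meaningful. Returning `Option<f64>` lets a serializer omit the field rather than emit a misleading number.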
Re-exports§
pub use env::Env;
pub use env::EnvHash;
pub use profile::configure_global_profile;
pub use profile::flush_global_profile;
pub use profile::global_profile;
pub use profile::parse_profile_event_value;
pub use profile::parse_profile_jsonl_str;
pub use profile::profile_fields_from_json;
pub use profile::ProfileEvent;
pub use profile::ProfileJsonlWriter;
pub use profile::ProfileMetadata;
pub use profile::ProfileSinkConfig;
pub use stats::ci95_half_width;
pub use stats::percentile;
pub use stats::student_t_975;
pub use stats::PercentileStats;
pub use stats::ScalarStats;
Modules§
- arrivals
  Poisson arrival-time generation for open-loop benchmarking.
- env
  Bench environment snapshot — hardware + software + config — and the SHA-256 env_hash used by compare-commits.sh and similar to filter “apples-to-apples” cells.
- profile
  Structured profile event schema shared by benchmark runners and consumers.
- report
  Markdown report generation for bench cells (PLAYBOOK § 2.4 + § 7).
- stats
  Statistical aggregates used in BenchReport: percentile (linear interpolation), ScalarStats (mean / stddev / CI95), and Student-t critical values for small-sample CI.
- trace
  Chrome Trace Event JSON emission — PLAYBOOK § Phase 1.5.
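The stats entry names three building blocks: a linearly interpolated percentile, a Student-t critical value, and a CI95 half-width. The sketch below shows one common way to compute each. The `_sketch` names and signatures are hypothetical and are not the crate's `percentile`, `student_t_975`, or `ci95_half_width`, whose signatures this page does not show. The interpolation convention (rank = p/100 × (n − 1)) and the fallback to 1.96 above 10 degrees of freedom are assumptions.

```rust
/// Percentile by linear interpolation between the two closest ranks.
/// Assumption: rank = p/100 * (n - 1); the crate may use another convention.
fn percentile_sketch(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Two-sided 95% Student-t critical value for `df` degrees of freedom.
/// The table covers df 1..=10; larger df fall back to the normal value 1.96,
/// which slightly understates the interval for moderate df.
fn t_975_sketch(df: usize) -> f64 {
    const TABLE: [f64; 10] = [
        12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    ];
    match df {
        0 => f64::NAN,
        1..=10 => TABLE[df - 1],
        _ => 1.96,
    }
}

/// CI95 half-width of the mean: t * s / sqrt(n), with sample stddev s.
/// Returns None below three samples, mirroring the n_repeats < 3 rule.
fn ci95_half_width_sketch(samples: &[f64]) -> Option<f64> {
    let n = samples.len();
    if n < 3 {
        return None;
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    Some(t_975_sketch(n - 1) * var.sqrt() / (n as f64).sqrt())
}
```

With three repeats the critical value is 4.303 rather than 1.96, so small-sample intervals are much wider than a normal approximation would suggest. This is the reason to use Student-t for bench cells with few runs.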
Structs§
- BenchReport
  One bench cell — n_repeats independent runs aggregated.
- MetricSet
  Four percentile points for a single latency metric. Each point is a ScalarStats aggregate across n_repeats runs.
- QualityIssueCounts
- RequestRecord
  One request’s measurements (input to compute_metrics).
- RunRecord
  One independent run of the bench workload.
- Slo
  SLO thresholds applied when computing goodput. All in milliseconds.
- TokenLengthStats
Enums§
- OutputTokenCountSource
- Scenario
  Locked enum of bench scenarios — see docs/bench/PLAYBOOK.md § 2.
Functions§
- compute_metrics
  Aggregate n_repeats independent runs into one BenchReport.