Crate ferrum_bench_core

ferrum-bench-core — canonical schema, metric aggregation, and variance reporting for ferrum’s bench and bench-serve commands.

Locked by docs/bench/PLAYBOOK.md § 7. Do not invent variants; producers and consumers (bench, bench-serve, compare-commits, visualizer, dashboards) all build against the types here.

§Quick map

BenchReport — top-level: one bench cell, aggregated across n_repeats
Scenario — closed-loop / open-loop / shared-prefix / cli
MetricSet — p50/p75/p95/p99 of one latency metric
ScalarStats — {mean, stddev, ci95_hw} (stats module)
Env + EnvHash — apples-to-apples cell identity (env module)
ProfileEvent — locked structured profile JSONL envelope (profile module)
compute_metrics — the one aggregator both bench CLIs call
arrivals module — Poisson inter-arrival times for open-loop
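As an illustration of the open-loop arrival model named in the last entry, the sketch below draws exponential inter-arrival gaps by inverse-transform sampling, which is the standard way to simulate a Poisson process. It is not the arrivals module's actual API: `poisson_inter_arrivals`, its parameters, and the `SplitMix64` generator are assumptions introduced here so the example runs with the standard library alone.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only so the sketch
/// needs no external crates. The real module may use a different source.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: `n` inter-arrival gaps in seconds for a Poisson
/// process with the given mean rate. The same seed yields the same gaps.
fn poisson_inter_arrivals(rate_per_sec: f64, n: usize, seed: u64) -> Vec<f64> {
    assert!(rate_per_sec > 0.0, "rate must be positive");
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let u = rng.next_f64(); // u is in [0, 1)
            // 1 - u is in (0, 1], so the logarithm is always finite.
            -(1.0 - u).ln() / rate_per_sec
        })
        .collect()
}

fn main() {
    let gaps = poisson_inter_arrivals(10.0, 10_000, 42);
    let mean = gaps.iter().sum::<f64>() / gaps.len() as f64;
    println!("mean gap over {} samples: {mean:.4} s", gaps.len());
}
```

With a rate of 10 requests per second the mean gap should come out close to 0.1 s; seeding the generator keeps the schedule reproducible across runs.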

§Determinism notes

JSON keys are emitted in struct field-declaration order; field order is part of the locked schema and should not change.
Any dynamic key-value bag uses BTreeMap (not HashMap), so its keys are always iterated in sorted order.
CI95 fields are suppressed when n_repeats < 3 (degenerate).
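The ordering guarantee behind the BTreeMap rule can be seen with the standard library alone. The sketch below is not the crate's serializer: `render_bag` and the key names are hypothetical, and the rendering skips string escaping for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: render a key-value bag as a JSON-like object.
/// BTreeMap iterates in sorted key order, so the output does not depend
/// on insertion order. Keys are assumed to need no escaping.
fn render_bag(bag: &BTreeMap<String, f64>) -> String {
    let body: Vec<String> = bag
        .iter()
        .map(|(k, v)| format!("\"{k}\":{v}"))
        .collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    // Same entries, inserted in opposite orders.
    let mut a = BTreeMap::new();
    a.insert("ttft_ms".to_string(), 12.5);
    a.insert("itl_ms".to_string(), 3.25);

    let mut b = BTreeMap::new();
    b.insert("itl_ms".to_string(), 3.25);
    b.insert("ttft_ms".to_string(), 12.5);

    // Both render byte-for-byte identically.
    assert_eq!(render_bag(&a), render_bag(&b));
    println!("{}", render_bag(&a));
}
```

A HashMap gives no such guarantee: its iteration order can differ between runs, which would make otherwise identical reports diff noisily.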

Re-exports§

pub use env::Env;
pub use env::EnvHash;
pub use profile::configure_global_profile;
pub use profile::flush_global_profile;
pub use profile::global_profile;
pub use profile::parse_profile_event_value;
pub use profile::parse_profile_jsonl_str;
pub use profile::profile_fields_from_json;
pub use profile::ProfileEvent;
pub use profile::ProfileJsonlWriter;
pub use profile::ProfileMetadata;
pub use profile::ProfileSinkConfig;
pub use stats::ci95_half_width;
pub use stats::percentile;
pub use stats::student_t_975;
pub use stats::PercentileStats;
pub use stats::ScalarStats;

Modules§

arrivals: Poisson arrival-time generation for open-loop benchmarking.
env: Bench environment snapshot — hardware + software + config — and the SHA-256 env_hash used by compare-commits.sh and similar to filter “apples-to-apples” cells.
profile: Structured profile event schema shared by benchmark runners and consumers.
report: Markdown report generation for bench cells (PLAYBOOK § 2.4 + § 7).
stats: Statistical aggregates used in BenchReport: percentile (linear interpolation), ScalarStats (mean / stddev / CI95), and Student-t critical values for small-sample CI.
trace: Chrome Trace Event JSON emission — PLAYBOOK § Phase 1.5.
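The statistics named in the stats entry can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the function names and signatures are hypothetical, the interpolation convention (rank = p/100 × (n − 1)) is one common choice that the crate may or may not share, and the t table is truncated to 10 degrees of freedom. The `n < 3` cutoff mirrors the determinism note about suppressing CI95.

```rust
/// Percentile by linear interpolation between the two closest ranks.
/// Assumes `sorted` is ascending; returns None for empty input or p outside 0..=100.
fn percentile_linear(sorted: &[f64], p: f64) -> Option<f64> {
    if sorted.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Student-t critical value t(0.975, df) for df in 1..=10 (standard table values).
fn t_975(df: usize) -> Option<f64> {
    const TABLE: [f64; 10] = [
        12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    ];
    if df == 0 { None } else { TABLE.get(df - 1).copied() }
}

/// Mean and sample standard deviation (n - 1 denominator); None if fewer than 2 values.
fn mean_stddev(xs: &[f64]) -> Option<(f64, f64)> {
    let n = xs.len();
    if n < 2 {
        return None;
    }
    let mean = xs.iter().sum::<f64>() / n as f64;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    Some((mean, var.sqrt()))
}

/// 95% confidence half-width: t(0.975, n-1) * s / sqrt(n).
/// Returns None when n < 3, matching the suppression rule in the notes above.
fn ci95_hw(xs: &[f64]) -> Option<f64> {
    let n = xs.len();
    if n < 3 {
        return None;
    }
    let (_, sd) = mean_stddev(xs)?;
    Some(t_975(n - 1)? * sd / (n as f64).sqrt())
}

fn main() {
    let per_run_p50 = [10.0, 12.0, 14.0]; // one p50 value per repeat
    println!("median of medians: {:?}", percentile_linear(&per_run_p50, 50.0));
    println!("mean/stddev: {:?}", mean_stddev(&per_run_p50));
    println!("ci95 half-width: {:?}", ci95_hw(&per_run_p50));
}
```

Using the Student-t value rather than the normal 1.96 matters at small n: with three repeats the multiplier is 4.303, more than double the large-sample value.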

Structs§

BenchReport: One bench cell — n_repeats independent runs aggregated.
MetricSet: Four percentile points for a single latency metric. Each point is a ScalarStats aggregate across n_repeats runs.
QualityIssueCounts
RequestRecord: One request’s measurements (input to compute_metrics).
RunRecord: One independent run of the bench workload.
Slo: SLO thresholds applied when computing goodput. All in milliseconds.
TokenLengthStats
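To make the Slo entry concrete, here is a minimal sketch of a goodput calculation, assuming goodput means the rate of requests that satisfy every enforced threshold. The struct and field names (`SloSketch`, `RequestSketch`, `ttft_ms`, `e2e_ms`) are hypothetical and are not the fields of the crate's Slo or RequestRecord types; only the millisecond unit comes from the description above.

```rust
/// Hypothetical SLO thresholds, all in milliseconds. `None` means "not enforced".
struct SloSketch {
    ttft_ms: Option<f64>,
    e2e_ms: Option<f64>,
}

/// Hypothetical per-request measurements, in milliseconds.
struct RequestSketch {
    ttft_ms: f64,
    e2e_ms: f64,
}

/// A request counts as "good" only if it meets every enforced threshold.
fn meets_slo(req: &RequestSketch, slo: &SloSketch) -> bool {
    slo.ttft_ms.map_or(true, |limit| req.ttft_ms <= limit)
        && slo.e2e_ms.map_or(true, |limit| req.e2e_ms <= limit)
}

/// Goodput as good requests per second of wall-clock time.
fn goodput_rps(reqs: &[RequestSketch], slo: &SloSketch, wall_secs: f64) -> f64 {
    assert!(wall_secs > 0.0, "wall time must be positive");
    let good = reqs.iter().filter(|r| meets_slo(r, slo)).count();
    good as f64 / wall_secs
}

fn main() {
    let slo = SloSketch { ttft_ms: Some(200.0), e2e_ms: Some(2_000.0) };
    let reqs = [
        RequestSketch { ttft_ms: 150.0, e2e_ms: 1_800.0 }, // meets both
        RequestSketch { ttft_ms: 250.0, e2e_ms: 1_500.0 }, // misses TTFT
        RequestSketch { ttft_ms: 100.0, e2e_ms: 1_000.0 }, // meets both
    ];
    println!("goodput: {} req/s", goodput_rps(&reqs, &slo, 2.0));
}
```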

Enums§

OutputTokenCountSource
Scenario: Locked enum of bench scenarios — see docs/bench/PLAYBOOK.md § 2.

Functions§

compute_metrics: Aggregate n_repeats independent runs into one BenchReport.
