Performance baselines for regression detection.
This module defines expected performance characteristics based on measurements from the benchmark suite. Tests can compare against these baselines to detect regressions.
§Baseline Measurements
Current measurements (as of 2026-02-13):
- Execution history growth: ~53 bytes per iteration (bounded at 1000 entries)
- Checkpoint size: ~363 KB for 1000 entries (well under 2048 KB hard limit)
- Memory usage: Bounded growth verified by integration tests
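The relationship between these measurements can be sketched as a few constants; the names below are hypothetical, chosen to mirror the figures above rather than the module's actual items:

```rust
// Hypothetical constants mirroring the measured baselines above.
const HISTORY_BYTES_PER_ITERATION: usize = 53; // ~53 bytes per iteration
const HISTORY_MAX_ENTRIES: usize = 1000; // hard bound on history length
const CHECKPOINT_HARD_LIMIT_KB: usize = 2048; // checkpoint size hard limit

fn main() {
    // Worst-case history heap usage at the 1000-entry bound: ~53 KB.
    let worst_case_history_bytes = HISTORY_BYTES_PER_ITERATION * HISTORY_MAX_ENTRIES;
    assert_eq!(worst_case_history_bytes, 53_000);

    // The measured ~363 KB checkpoint sits well under the 2048 KB hard limit.
    let measured_checkpoint_kb = 363;
    assert!(measured_checkpoint_kb < CHECKPOINT_HARD_LIMIT_KB);
}
```

Because the history is bounded at 1000 entries, the worst-case history footprint is fixed (~53 KB) rather than growing with workflow runtime.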
§Regression Detection Strategy
1. CI runs `cargo xtask verify` on every commit
   - Fails if execution history exceeds 1000 entries (hard limit)
   - Fails if checkpoint size exceeds 2048 KB (hard limit)
   - Fails if thread cleanup doesn't complete (timeout detection)
2. Benchmark tests capture current values for trending
   - Run with: `cargo test -p ralph-workflow --lib benchmarks -- --nocapture`
   - Values are informational; baselines have generous tolerance
3. Integration tests enforce behavioral invariants
   - Bounded growth: `tests/integration_tests/memory_safety/bounded_growth.rs`
   - Thread cleanup: `tests/integration_tests/memory_safety/thread_lifecycle.rs`
   - Arc patterns: `tests/integration_tests/memory_safety/arc_patterns.rs`
4. Tolerance rationale
   - Memory baselines: 20% tolerance (accounts for platform variance)
   - Time baselines: 2x tolerance (serialization performance varies widely)
   - Hard limits: 0% tolerance (prevents unbounded growth)
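The three tolerance levels can all be expressed as a single fractional-headroom check. This is a minimal sketch, not the module's actual API; `within_tolerance` is a hypothetical helper where 0.20 means 20%, 1.0 means 2x, and 0.0 means a hard limit:

```rust
/// Hypothetical helper: accept a measured value if it is within a
/// fractional tolerance of the baseline (0.20 = 20%, 1.0 = 2x, 0.0 = hard limit).
fn within_tolerance(baseline: f64, measured: f64, tolerance: f64) -> bool {
    measured <= baseline * (1.0 + tolerance)
}

fn main() {
    // Memory baseline: 20% headroom over a ~363 KB checkpoint.
    assert!(within_tolerance(363.0, 420.0, 0.20));
    assert!(!within_tolerance(363.0, 500.0, 0.20));

    // Time baseline: 2x headroom, so a doubled duration still passes.
    assert!(within_tolerance(10.0, 20.0, 1.0));

    // Hard limit: 0% tolerance, any overshoot fails.
    assert!(!within_tolerance(2048.0, 2049.0, 0.0));
}
```

Collapsing all three policies into one predicate keeps the CI check uniform: only the tolerance constant differs per baseline.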
Structs§
- `CheckpointSerializationBaseline` - Checkpoint serialization performance baseline.
- `ExecutionHistoryBaseline` - Performance baseline for execution history growth.
Functions§
- `estimate_execution_history_heap_bytes_core_fields` - Estimate heap bytes for an execution history slice.
- `estimate_execution_step_heap_bytes_core_fields` - Estimate heap bytes for a single execution step using the same methodology as the benchmark suite and baselines.
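The per-step/per-slice split above can be illustrated with a sketch. The `ExecutionStep` struct and field names here are hypothetical stand-ins (the real type is not shown on this page); the point is the methodology of summing the heap capacity of each owned field, then folding that over a slice:

```rust
/// Hypothetical stand-in for the real execution-step type, with the kind
/// of heap-owning "core fields" the estimators sum over.
struct ExecutionStep {
    command: String,
    output: String,
}

/// Per-step estimate: sum the heap capacity of each String-backed field.
fn estimate_step_heap_bytes(step: &ExecutionStep) -> usize {
    step.command.capacity() + step.output.capacity()
}

/// Slice-level estimate: fold the per-step estimate over the history.
fn estimate_history_heap_bytes(history: &[ExecutionStep]) -> usize {
    history.iter().map(estimate_step_heap_bytes).sum()
}

fn main() {
    let step = ExecutionStep {
        command: String::from("cargo xtask verify"),
        output: String::from("ok"),
    };
    // capacity() >= len(), so the estimate is at least the total byte length.
    assert!(estimate_step_heap_bytes(&step) >= "cargo xtask verify".len() + "ok".len());
    assert_eq!(estimate_history_heap_bytes(&[]), 0);
}
```

Using `String::capacity` rather than `len` counts allocated rather than occupied bytes, which matches what actually sits on the heap.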