Skip to main content

Module baselines

Module baselines 

Source
Expand description

Performance baselines for regression detection.

This module defines expected performance characteristics based on measurements from the benchmark suite. Tests can compare against these baselines to detect regressions.

§Baseline Measurements

Current measurements (as of 2026-02-13):

  • Execution history growth: ~53 bytes per iteration (bounded at 1000 entries)
  • Checkpoint size: ~363 KB for 1000 entries (well under 2048 KB hard limit)
  • Memory usage: Bounded growth verified by integration tests

§Regression Detection Strategy

  1. CI runs cargo xtask verify on every commit

    • Fails if execution history exceeds 1000 entries (hard limit)
    • Fails if checkpoint size exceeds 2048 KB (hard limit)
    • Fails if thread cleanup doesn’t complete (timeout detection)
  2. Benchmark tests capture current values for trending

    • Run with: cargo test -p ralph-workflow --lib benchmarks -- --nocapture
    • Values are informational; baselines have generous tolerance
  3. Integration tests enforce behavioral invariants

    • Bounded growth: tests/integration_tests/memory_safety/bounded_growth.rs
    • Thread cleanup: tests/integration_tests/memory_safety/thread_lifecycle.rs
    • Arc patterns: tests/integration_tests/memory_safety/arc_patterns.rs
  4. Tolerance rationale

    • Memory baselines: 20% tolerance (accounts for platform variance)
    • Time baselines: 2x tolerance (serialization performance varies widely)
    • Hard limits: 0% tolerance (prevent unbounded growth)

Structs§

CheckpointSerializationBaseline
Checkpoint serialization performance baseline.
ExecutionHistoryBaseline
Performance baseline for execution history growth.

Functions§

estimate_execution_history_heap_bytes_core_fields
Estimate heap bytes for an execution history slice.
estimate_execution_step_heap_bytes_core_fields
Estimate heap bytes for a single execution step using the same methodology as the benchmark suite and baselines.