Replay-oracle evaluation harness (ClawVM §3).
§Role
ClawVM proposes a replay oracle to separate policy quality from
budget insufficiency: given a recorded trace and the same budget,
an oracle with bounded future lookahead h picks representations
that minimise faults. The online-minus-oracle fault gap then
measures the headroom a better policy could recover, as opposed to
workload pressure that even the oracle cannot avoid.
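The gap arithmetic can be illustrated with a minimal sketch. The `FaultCounts` type and its methods are hypothetical names introduced here for illustration; they are not part of this module's API, and clamping the gap at zero is an assumption.

```rust
/// Hypothetical fault tally for one replayed trace (not a type from this crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultCounts {
    /// Faults incurred by the online policy under the budget.
    pub online: u32,
    /// Faults incurred by the bounded-lookahead oracle under the same budget.
    pub oracle: u32,
}

impl FaultCounts {
    /// Online-minus-oracle gap: faults a better policy could still remove.
    /// Assumption: the gap is clamped at zero if the oracle ever does worse.
    pub fn gap(&self) -> u32 {
        self.online.saturating_sub(self.oracle)
    }

    /// Fraction of online faults attributable to the policy rather than
    /// to the budget. Returns `None` when the online policy had no faults.
    pub fn headroom_ratio(&self) -> Option<f64> {
        if self.online == 0 {
            None
        } else {
            Some(f64::from(self.gap()) / f64::from(self.online))
        }
    }
}
```

Under these assumptions, a trace with 8 online faults and 6 oracle faults has a gap of 2: a quarter of the faults are policy headroom, and the remaining 6 would occur under the same budget even with lookahead.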
§Scope (Phase B step 22)
This module delivers the evaluation primitive without wiring it
into the live prompt loop (it is intentionally offline). It takes
a recorded Vec<Message> plus a horizon h and produces a
per-turn demand trace — which prior messages get referenced within
the next h turns. That trace is the “ground truth” the Pareto
eval harness can grade any [DerivePolicy] against.
Demand signal today is a conservative lexical heuristic: a turn at
index t is said to “need” turn u (with u < t) when:
- The body of t mentions a file path that first appeared in u.
- A ToolResult at t matches a ToolCall id introduced at u.
Both signals are overly conservative (a true oracle could see model-internal attention), but they are faithful enough to seed the oracle-gap metric and to motivate future refinement.
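The two lexical signals and the horizon window can be sketched in a self-contained form. This is not the module's implementation: `Turn`, `demand_trace`, the whitespace-token path detector, and the output shape (for each turn u, the later turns within h that need it) are all simplifying assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a recorded turn (hypothetical; not the crate's `Message`).
pub struct Turn {
    /// Concatenated text content of the turn.
    pub text: String,
    /// Ids of tool calls introduced in this turn.
    pub tool_call_ids: Vec<String>,
    /// Tool-call ids that this turn's tool results answer.
    pub tool_result_ids: Vec<String>,
}

/// For each turn `u`, list the later turns `t` (with `t - u <= h`) that need it.
pub fn demand_trace(turns: &[Turn], h: usize) -> Vec<Vec<usize>> {
    // Turn index where each path or tool-call id first appeared.
    let mut first_path: HashMap<&str, usize> = HashMap::new();
    let mut call_origin: HashMap<&str, usize> = HashMap::new();
    let mut demand: Vec<Vec<usize>> = vec![Vec::new(); turns.len()];

    for (t, turn) in turns.iter().enumerate() {
        let mut needed: Vec<usize> = Vec::new();

        // Signal 1: a file path that first appeared in an earlier turn.
        // Assumption: any whitespace-separated token containing '/' is a path.
        for p in turn.text.split_whitespace().filter(|tok| tok.contains('/')) {
            let u = *first_path.entry(p).or_insert(t);
            if u < t {
                needed.push(u);
            }
        }

        // Signal 2: a tool result answering a call introduced earlier.
        // Calls from this turn are registered afterwards, so any hit is older.
        for id in &turn.tool_result_ids {
            if let Some(&u) = call_origin.get(id.as_str()) {
                needed.push(u);
            }
        }
        for id in &turn.tool_call_ids {
            call_origin.entry(id.as_str()).or_insert(t);
        }

        // Record each needed turn once, and only if it lies within the horizon.
        needed.sort_unstable();
        needed.dedup();
        for u in needed {
            if t - u <= h {
                demand[u].push(t);
            }
        }
    }
    demand
}
```

The sketch performs no IO and iterates in index order, so equal inputs give equal outputs, in line with the invariants this module states. A reference that falls outside the horizon is dropped rather than recorded, which is what makes the result depend on h.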
§Invariants
- Pure and offline: no provider, RLM, or filesystem IO.
- Deterministic: same input → same output.
- The full message vector is borrowed read-only; no mutation.
§Examples
use codetether_agent::provider::{ContentPart, Message, Role};
use codetether_agent::session::oracle::{replay_oracle, OracleReport};

let msgs = vec![
    Message {
        role: Role::User,
        content: vec![ContentPart::Text { text: "edit src/lib.rs".into() }],
    },
    Message {
        role: Role::Assistant,
        content: vec![ContentPart::ToolCall {
            id: "call-1".into(),
            name: "Shell".into(),
            arguments: "{}".into(),
            thought_signature: None,
        }],
    },
    Message {
        role: Role::Tool,
        content: vec![ContentPart::ToolResult {
            tool_call_id: "call-1".into(),
            content: "ok".into(),
        }],
    },
];
let report: OracleReport = replay_oracle(&msgs, 2);
assert_eq!(report.demand.len(), msgs.len());

Structs§
- OracleReport - Summary of an oracle replay over a recorded trace.
Functions§
- replay_oracle - Compute an OracleReport over messages with horizon h.