Module oracle

Replay-oracle evaluation harness (ClawVM §3).

§Role

ClawVM proposes a replay oracle to separate policy quality from budget insufficiency: given a recorded trace and the same budget, an oracle with bounded future lookahead h picks representations that minimise faults. The gap between online and oracle fault counts then measures the headroom a better policy could recover, while the faults the oracle itself incurs reflect unavoidable workload pressure under that budget.
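As a rough illustration of the gap arithmetic, the sketch below compares two fault counts for the same trace and budget. The `FaultCounts` type and its methods are hypothetical names introduced here for illustration; they are not part of this module's API, and how a fault is counted is left to the caller.

```rust
/// Hypothetical container for fault counts measured on the same recorded
/// trace under the same budget (not a type exported by this module).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultCounts {
    /// Faults incurred by the online policy under test.
    pub online: u32,
    /// Faults incurred by the replay oracle with lookahead h.
    pub oracle: u32,
}

impl FaultCounts {
    /// Online-minus-oracle gap: faults a better policy might avoid.
    /// Signed, because a bounded-lookahead oracle is not guaranteed to
    /// beat every online policy on every trace.
    pub fn gap(self) -> i64 {
        i64::from(self.online) - i64::from(self.oracle)
    }

    /// Faults attributed to workload pressure: even the oracle incurs them.
    pub fn unavoidable(self) -> u32 {
        self.oracle
    }
}
```

Under this reading, a large gap points at the policy, whereas a small gap with many oracle faults points at the budget.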

§Scope (Phase B step 22)

This module delivers the evaluation primitive without wiring it into the live prompt loop (it is intentionally offline). It takes a recorded Vec<Message> plus a horizon h and produces a per-turn demand trace — which prior messages get referenced within the next h turns. That trace is the “ground truth” the Pareto eval harness can grade any [DerivePolicy] against.

The demand signal today is a conservative lexical heuristic: a turn at index t is said to “need” turn u (with u < t) when either of the following holds:

The body of t mentions a file path that first appeared in u.
A ToolResult at t matches a ToolCall id introduced at u.

Both signals are conservative approximations, since a true oracle would see model-internal attention. They are nonetheless faithful enough to seed the oracle-gap metric and to motivate future refinement.
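The two signals above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `Turn`, `paths`, `direct_needs` and `demand_trace` are hypothetical names, a "file path" is crudely approximated as any whitespace-separated token containing `/`, and the demand at turn t is read as the set of turns u ≤ t needed by some turn in t+1 ..= t+h.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, simplified turn (not the crate's `Message` type).
#[derive(Debug, Clone, Default)]
pub struct Turn {
    /// Free text of the turn, scanned for file paths.
    pub text: String,
    /// Ids of tool calls issued in this turn.
    pub tool_calls: Vec<String>,
    /// Ids of tool calls answered in this turn.
    pub tool_results: Vec<String>,
}

/// Crude path detector (assumption): any token containing '/'.
fn paths<'a>(text: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    text.split_whitespace().filter(|tok| tok.contains('/'))
}

/// For each turn t, the earlier turns u < t that t "needs".
pub fn direct_needs(turns: &[Turn]) -> Vec<BTreeSet<usize>> {
    let mut first_path: HashMap<&str, usize> = HashMap::new();
    let mut call_origin: HashMap<&str, usize> = HashMap::new();
    let mut needs = vec![BTreeSet::new(); turns.len()];

    for (t, turn) in turns.iter().enumerate() {
        // Signal 1: t mentions a path that first appeared in an earlier turn.
        for p in paths(&turn.text) {
            let u = *first_path.entry(p).or_insert(t);
            if u < t {
                needs[t].insert(u);
            }
        }
        // Signal 2: a tool result at t matches a call id introduced earlier.
        for id in &turn.tool_results {
            if let Some(&u) = call_origin.get(id.as_str()) {
                if u < t {
                    needs[t].insert(u);
                }
            }
        }
        // Register this turn's calls after the lookup, keeping u < t strict.
        for id in &turn.tool_calls {
            call_origin.entry(id.as_str()).or_insert(t);
        }
    }
    needs
}

/// Per-turn demand: turns u <= t needed by some turn in t+1 ..= t+h.
pub fn demand_trace(turns: &[Turn], h: usize) -> Vec<Vec<usize>> {
    let needs = direct_needs(turns);
    (0..turns.len())
        .map(|t| {
            let end = t.saturating_add(h).min(turns.len() - 1);
            let mut wanted = BTreeSet::new();
            for later in (t + 1)..=end {
                wanted.extend(needs[later].iter().copied().filter(|&u| u <= t));
            }
            wanted.into_iter().collect()
        })
        .collect()
}
```

The sketch borrows the turns read-only and uses ordered sets for its output, so the same input always yields the same trace, in keeping with the purity and determinism this module aims for.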

§Invariants

Pure and offline: no provider, RLM, or filesystem IO.
Deterministic: same input → same output.
The full message vector is borrowed read-only; no mutation.

§Examples

use codetether_agent::provider::{ContentPart, Message, Role};
use codetether_agent::session::oracle::{replay_oracle, OracleReport};

let msgs = vec![
    Message {
        role: Role::User,
        content: vec![ContentPart::Text { text: "edit src/lib.rs".into() }],
    },
    Message {
        role: Role::Assistant,
        content: vec![ContentPart::ToolCall {
            id: "call-1".into(),
            name: "Shell".into(),
            arguments: "{}".into(),
            thought_signature: None,
        }],
    },
    Message {
        role: Role::Tool,
        content: vec![ContentPart::ToolResult {
            tool_call_id: "call-1".into(),
            content: "ok".into(),
        }],
    },
];
let report: OracleReport = replay_oracle(&msgs, 2);
assert_eq!(report.demand.len(), msgs.len());

Structs§

OracleReport: Summary of an oracle replay over a recorded trace.

Functions§

replay_oracle: Compute an OracleReport over messages with horizon h.
