Module conformance

Expand description

Axis 9: schema / format conformance rate.

Intent-gated on the baseline side. A pair is counted (and scored) if the baseline response has EITHER:

JSON-text intent: its text starts with { or [ (after fence- strip). Both sides are scored on whether their text parses.
Tool-use intent: it emits at least one tool_use block with a dict-shaped input. Both sides are scored on whether their (first) tool_use’s input keys match the baseline’s. This covers the common agent pattern where the “structured final answer” is a tool call (e.g. submit_answer(...)) rather than JSON text — the same conformance question in a different syntax.

Pairs where baseline has neither JSON intent nor tool_use intent are excluded (we don’t penalise the candidate for not structuring when nothing asked for structure).

Functions§

compute: Compute the schema-conformance axis.

Module conformance

Module conformance Copy item path

Functions§

Module conformance