Module eval

Eval framework for agent UX friction testing.

Defines task scenarios that evaluate how well agents can use maw without knowledge of git internals. Each scenario has preconditions, a plain-English task prompt, expected outcomes, and a scoring rubric.
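
As a rough illustration of the four parts named above, a scenario could be modelled along these lines. This is only a sketch: `Scenario`, `RubricItem`, `max_score`, `score`, `example_scenario`, and every field name and string value are hypothetical and are not taken from the maw source.

```rust
/// One rubric line: a criterion and the most points it can earn.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone)]
pub struct RubricItem {
    pub criterion: &'static str,
    pub max_points: u32,
}

/// Hypothetical shape of a single eval scenario.
#[derive(Debug, Clone)]
pub struct Scenario {
    /// State that must hold before the agent starts.
    pub preconditions: Vec<&'static str>,
    /// Plain-English task given to the agent, with no git terminology.
    pub task_prompt: &'static str,
    /// Observable results that indicate success.
    pub expected_outcomes: Vec<&'static str>,
    /// Criteria used to score a run.
    pub rubric: Vec<RubricItem>,
}

impl Scenario {
    /// Highest total a run can achieve.
    pub fn max_score(&self) -> u32 {
        self.rubric.iter().map(|item| item.max_points).sum()
    }

    /// Total for a run, given the points awarded per rubric item in order.
    /// Each award is clamped to that item's maximum; missing awards count as 0.
    pub fn score(&self, awarded: &[u32]) -> u32 {
        self.rubric
            .iter()
            .zip(awarded)
            .map(|(item, &points)| points.min(item.max_points))
            .sum()
    }
}

/// A made-up scenario used only to show how the pieces fit together.
pub fn example_scenario() -> Scenario {
    Scenario {
        preconditions: vec!["a repository with one tracked file exists"],
        task_prompt: "Save your change to README.md so teammates can see it.",
        expected_outcomes: vec!["the change to README.md is recorded"],
        rubric: vec![
            RubricItem { criterion: "task completed", max_points: 3 },
            RubricItem { criterion: "no git internals needed", max_points: 2 },
        ],
    }
}
```

With this sketch, `example_scenario().max_score()` is the sum of the rubric maxima, and `score` caps each award so that an over-generous grader cannot push a run past the maximum.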

Modules

scenarios
Agent task scenarios and scoring rubric for UX friction evaluation.