Expand description
§Reasoning Quality Metrics
Core metrics for evaluating AI THINKING improvement with ReasonKit.
This is the HEART of ReasonKit evaluation - measuring whether ThinkTool protocols actually improve AI reasoning.
Structs§
- BedRock
Metrics - BedRock-specific metrics
- Benchmark
Result - Result from running a benchmark
- Brutal
Honesty Metrics - BrutalHonesty-specific metrics
- Calibration
Metrics - Calibration metrics (confidence vs accuracy)
- Consistency
Metrics - Self-consistency metrics
- Giga
Think Metrics - GigaThink-specific metrics
- Laser
Logic Metrics - LaserLogic-specific metrics
- Proof
Guard Metrics - ProofGuard-specific metrics
- Question
Result - Result for a single question
- Reasoning
Metrics - Aggregated reasoning metrics
- Think
Tool Metrics - Generic ThinkTool effectiveness metrics
Enums§
Functions§
- calculate_
thinktool_ delta - Calculate improvement delta for a ThinkTool
- is_
significant - Statistical significance test (simplified)