Module prm


§Process Reward Model (PRM) for Step-by-Step Verification

Implements step-level verification based on the Math-Shepherd research, which reports a 6.2-point GSM8K accuracy gain from rewarding individual reasoning steps rather than only the final answer.

§Scientific Foundation

Based on:

Math-Shepherd (Wang et al., 2024): Process reward models for math reasoning
Let’s Verify Step by Step (Lightman et al., 2023): Process supervision from step-level human labels

§Key Concepts

Outcome Reward Model (ORM): Scores only final answer correctness
Process Reward Model (PRM): Scores each reasoning step independently

PRM advantages:

Better credit assignment - identifies WHERE reasoning went wrong
More training signal - learns from partial success
Improved calibration - confidence per step
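The credit-assignment advantage can be illustrated with a minimal sketch. The scores are made-up numbers and `first_failing_step` is a hypothetical helper, not part of this module: an ORM yields one number for the whole solution, while per-step PRM scores let a simple threshold locate the first suspect step.

```rust
// Minimal sketch contrasting what an ORM and a PRM expose.
// All scores are made-up illustrative numbers, not model outputs.

/// Hypothetical helper: 0-based index of the first step scored below `threshold`.
fn first_failing_step(step_scores: &[f64], threshold: f64) -> Option<usize> {
    step_scores.iter().position(|&s| s < threshold)
}

fn main() {
    // ORM: a single score for the whole solution. It suggests the answer
    // is wrong but cannot say which step caused the error.
    let orm_score = 0.2;
    println!("ORM solution score: {orm_score}");

    // PRM: one score per step, so the error can be localized.
    let prm_scores = [0.95, 0.30, 0.90];
    match first_failing_step(&prm_scores, 0.5) {
        Some(i) => println!("PRM first suspect step: {}", i + 1),
        None => println!("PRM: all steps pass"),
    }
}
```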

§Usage

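```rust
use reasonkit::thinktool::prm::ProcessRewardModel;

// Sketch: `score_steps` is async and fallible; the boxed error type is assumed.
async fn score_example() -> Result<(), Box<dyn std::error::Error>> {
    let prm = ProcessRewardModel::new();
    let steps = vec!["Step 1: Given x + 2 = 5", "Step 2: x = 5 - 2 = 3"];
    let scores = prm.score_steps(&steps).await?;
    Ok(())
}
```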
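The usage example passes steps that are already split, whereas raw LLM output usually arrives as a single string; that extraction is the job the `StepParser` struct listed below is described as doing. As a rough sketch only, assuming each step begins on a line starting with `Step `, a hypothetical `split_steps` helper (not the crate's `StepParser` API) could look like this:

```rust
/// Hypothetical helper: split raw LLM output into reasoning steps,
/// assuming each step starts on a line beginning with "Step ".
/// Continuation lines are appended to the current step.
fn split_steps(output: &str) -> Vec<String> {
    let mut steps: Vec<String> = Vec::new();
    for line in output.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue; // skip blank lines
        }
        if line.starts_with("Step ") || steps.is_empty() {
            // A step marker (or any text before the first marker) opens a new step.
            steps.push(line.to_string());
        } else if let Some(last) = steps.last_mut() {
            // Otherwise the line continues the previous step.
            last.push(' ');
            last.push_str(line);
        }
    }
    steps
}

fn main() {
    let raw = "Step 1: Given x + 2 = 5\nSubtract 2 from both sides.\nStep 2: x = 5 - 2 = 3";
    for step in split_steps(raw) {
        println!("{step}");
    }
}
```

Real model output is less regular (numbered lists, "First,"/"Then," phrasing), so a production parser would need more patterns than this single prefix check.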

Structs§

PrmConfig: Process Reward Model configuration
PrmMetrics
PrmReranker: Best-of-N with PRM reranking
PrmResult: Result of PRM evaluation
StepIssue: Issue identified in a reasoning step
StepParser: Step parser to extract reasoning steps from LLM output
StepScore: Individual step score from PRM
VerificationPrompts: Verification prompt templates for different step types
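`PrmReranker` is described as Best-of-N with PRM reranking: sample N candidate solutions, score their steps, and keep the candidate with the best aggregated score. A minimal sketch, assuming step scores are already computed and using the minimum step score as the solution score (one common choice); `Candidate`, `min_score`, and `best_of_n` are hypothetical names, not this module's API:

```rust
/// Hypothetical candidate: a full solution plus per-step PRM scores in [0, 1].
struct Candidate {
    answer: String,
    step_scores: Vec<f64>,
}

/// Solution-level score: the weakest step. A candidate with no steps scores 0.
fn min_score(step_scores: &[f64]) -> f64 {
    if step_scores.is_empty() {
        return 0.0;
    }
    step_scores.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Best-of-N: return the candidate whose aggregated PRM score is highest.
fn best_of_n(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates
        .iter()
        .max_by(|a, b| min_score(&a.step_scores).total_cmp(&min_score(&b.step_scores)))
}

fn main() {
    let candidates = vec![
        Candidate { answer: "x = 3".into(), step_scores: vec![0.9, 0.8] },
        Candidate { answer: "x = 7".into(), step_scores: vec![0.95, 0.2] },
    ];
    if let Some(best) = best_of_n(&candidates) {
        println!("selected: {}", best.answer);
    }
}
```

Taking the minimum means one weak step sinks a candidate, which matches the PRM intuition that a solution is only as sound as its worst step.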

Enums§

IssueType
ScoreAggregation
Severity
VerificationStrategy
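The enums are listed without descriptions. Judging by its name, `ScoreAggregation` presumably selects how per-step scores are combined into one solution score. The sketch below shows several common strategies under that assumption; the `Aggregation` enum and its variants are hypothetical and may not match the crate's actual variants.

```rust
/// Hypothetical aggregation strategies; the real `ScoreAggregation`
/// variants may differ.
#[derive(Clone, Copy, Debug)]
enum Aggregation {
    Min,     // weakest step determines the score
    Product, // probability every step is correct, assuming independence
    Mean,    // average step quality
    Last,    // score of the final step only
}

/// Collapse per-step scores in [0, 1] into one solution score.
/// Returns None when there are no steps.
fn aggregate(scores: &[f64], how: Aggregation) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    let value = match how {
        Aggregation::Min => scores.iter().copied().fold(f64::INFINITY, f64::min),
        Aggregation::Product => scores.iter().product::<f64>(),
        Aggregation::Mean => scores.iter().sum::<f64>() / scores.len() as f64,
        Aggregation::Last => *scores.last()?,
    };
    Some(value)
}

fn main() {
    let scores = [0.9, 0.5, 0.8];
    for how in [Aggregation::Min, Aggregation::Product, Aggregation::Mean, Aggregation::Last] {
        println!("{how:?}: {:?}", aggregate(&scores, how));
    }
}
```

Min and Product both penalize a single bad step heavily; Mean can hide one faulty step among many good ones, and Last ignores intermediate steps entirely, which brings it closer to outcome-style scoring.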
