Module prm

§Process Reward Model (PRM) for Step-by-Step Verification

Implements step-level verification of reasoning chains, based on the Math-Shepherd research, which is cited here for a +6.2% improvement on GSM8K from validating each reasoning step rather than only the final answer.

§Scientific Foundation

Based on:

  • Math-Shepherd (Wang et al., 2024): Process reward models for math reasoning
  • Let’s Verify Step by Step (Lightman et al., 2023): Step-level human verification

§Key Concepts

  • Outcome Reward Model (ORM): Scores only final answer correctness
  • Process Reward Model (PRM): Scores each reasoning step independently

PRM advantages:

  1. Better credit assignment - identifies WHERE reasoning went wrong
  2. More training signal - learns from partial success
  3. Improved calibration - confidence per step
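The ORM/PRM distinction above can be sketched as follows. This is a minimal illustration, not this module's implementation: `ScoredStep`, `outcome_score`, and `first_weak_step` are hypothetical names, and the scores are supplied by hand rather than produced by a model.

```rust
/// Hypothetical per-step verdict for illustration; not the crate's `StepScore`.
#[derive(Debug, Clone, PartialEq)]
struct ScoredStep {
    text: String,
    /// Estimated probability that the step is correct, in [0.0, 1.0].
    score: f64,
}

/// ORM-style scoring: one number for the whole chain, derived only from
/// the final answer. It says nothing about which step failed.
fn outcome_score(final_answer: &str, expected: &str) -> f64 {
    if final_answer.trim() == expected.trim() {
        1.0
    } else {
        0.0
    }
}

/// PRM-style credit assignment: index of the first step whose score falls
/// below `threshold`, or `None` if every step passes.
fn first_weak_step(steps: &[ScoredStep], threshold: f64) -> Option<usize> {
    steps.iter().position(|s| s.score < threshold)
}
```

For a chain whose second step is wrong, `outcome_score` only reports that the final answer is wrong, while `first_weak_step` points at index 1 and leaves the correct first step as usable signal.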

§Usage

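use reasonkit::thinktool::prm::{ProcessRewardModel, StepScore};

// `score_steps` is awaited and uses `?`, so this snippet must run inside
// an async function that returns a `Result`.
let prm = ProcessRewardModel::new();

// Each element is one reasoning step, in order.
let steps = vec!["Step 1: Given x + 2 = 5", "Step 2: x = 5 - 2 = 3"];

// Score the steps.
let scores = prm.score_steps(&steps).await?;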

Structs§

PrmConfig
Process Reward Model configuration
PrmMetrics
PrmReranker
Best-of-N with PRM reranking
PrmResult
Result of PRM evaluation
StepIssue
Issue identified in a reasoning step
StepParser
Step parser to extract reasoning steps from LLM output
StepScore
Individual step score from PRM
VerificationPrompts
Verification prompt templates for different step types
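Step parsing and Best-of-N reranking, the roles listed for `StepParser` and `PrmReranker`, can be sketched as below. This is a standalone illustration under stated assumptions, not the API of those structs: `split_steps` and `rerank_best_of_n` are hypothetical helpers, steps are assumed to be one per non-empty line, the step scorer is passed in as a closure, and a candidate is ranked by its weakest step.

```rust
/// Split raw model output into steps, one per non-empty line.
/// Assumption: the crate's `StepParser` may use different rules.
fn split_steps(output: &str) -> Vec<&str> {
    output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Best-of-N reranking: return the candidate whose weakest step scores
/// highest. Ties keep the earlier candidate; candidates with no steps
/// are skipped.
fn rerank_best_of_n<'a, F>(candidates: &[&'a str], score_step: F) -> Option<&'a str>
where
    F: Fn(&str) -> f64,
{
    let mut best: Option<(&'a str, f64)> = None;
    for &candidate in candidates {
        let steps = split_steps(candidate);
        if steps.is_empty() {
            continue;
        }
        // Minimum step score for this candidate.
        let weakest = steps
            .iter()
            .map(|&step| score_step(step))
            .fold(f64::INFINITY, f64::min);
        let is_better = match best {
            Some((_, top)) => weakest > top,
            None => true,
        };
        if is_better {
            best = Some((candidate, weakest));
        }
    }
    best.map(|(candidate, _)| candidate)
}
```

Ranking by the weakest step is one possible choice; a single low-scoring step then disqualifies a candidate even if its other steps score well.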

Enums§

IssueType
ScoreAggregation
Severity
VerificationStrategy
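Collapsing per-step scores into a single chain score, which is presumably what `ScoreAggregation` selects, might look like the sketch below. The variants of the real enum are not shown on this page, so `Aggregation` and its three variants (minimum, product, mean) are assumptions chosen for illustration.

```rust
/// Hypothetical aggregation modes; not the crate's `ScoreAggregation`.
#[derive(Debug, Clone, Copy)]
enum Aggregation {
    /// The chain is only as good as its weakest step.
    Min,
    /// Treats step scores as independent probabilities of correctness.
    Product,
    /// Average step quality; tolerant of a single weak step.
    Mean,
}

/// Combine step scores into one chain score. Returns `None` for an empty chain.
fn aggregate(scores: &[f64], how: Aggregation) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    let combined = match how {
        Aggregation::Min => scores.iter().copied().fold(f64::INFINITY, f64::min),
        Aggregation::Product => scores.iter().product::<f64>(),
        Aggregation::Mean => scores.iter().sum::<f64>() / scores.len() as f64,
    };
    Some(combined)
}
```

For step scores `[0.5, 1.0, 0.25, 0.25]`, the minimum is 0.25, the product is 0.03125, and the mean is 0.5, so the choice of mode changes how strongly long or partly weak chains are penalized.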