§Process Reward Model (PRM) for Step-by-Step Verification
Implements step-level verification based on the Math-Shepherd research, which reports a +6.2% improvement on GSM8K from validating each reasoning step rather than only the final answer.
§Scientific Foundation
Based on:
- Math-Shepherd (Wang et al., 2024): Process reward models for math reasoning
- Let’s Verify Step by Step (Lightman et al., 2023): Step-level human verification
§Key Concepts
- Outcome Reward Model (ORM): Scores only final answer correctness
- Process Reward Model (PRM): Scores each reasoning step independently
PRM advantages:
- Better credit assignment - identifies WHERE reasoning went wrong
- More training signal - learns from partial success
- Improved calibration - confidence per step
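The contrast between the two models can be sketched in a few lines. This is a minimal illustration, not this crate's API: `ScoredStep`, `outcome_score`, and `first_failing_step` are hypothetical names, and the scores are assumed to lie in the range 0.0 to 1.0.

```rust
/// Hypothetical per-step score; an assumption for illustration,
/// not the crate's `StepScore` type.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredStep {
    pub text: String,
    pub score: f64,
}

/// ORM-style view: one verdict for the whole solution,
/// based only on whether the final answer matches.
pub fn outcome_score(final_answer: &str, expected: &str) -> f64 {
    if final_answer.trim() == expected.trim() {
        1.0
    } else {
        0.0
    }
}

/// PRM-style view: index of the first step whose score
/// falls below `threshold`, or `None` if every step passes.
pub fn first_failing_step(steps: &[ScoredStep], threshold: f64) -> Option<usize> {
    steps.iter().position(|s| s.score < threshold)
}
```

Given a solution whose second step contains an arithmetic slip, `outcome_score` can only say the answer is wrong, while `first_failing_step` points at the step where the score drops.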
§Usage
use reasonkit::thinktool::prm::{ProcessRewardModel, StepScore};
let prm = ProcessRewardModel::new();
let steps = vec!["Step 1: Given x + 2 = 5", "Step 2: x = 5 - 2 = 3"];
let scores = prm.score_steps(&steps).await?;
Structs§
- PrmConfig: Process Reward Model configuration
- PrmMetrics
- PrmReranker: Best-of-N with PRM reranking
- PrmResult: Result of PRM evaluation
- StepIssue: Issue identified in a reasoning step
- StepParser: Step parser to extract reasoning steps from LLM output
- StepScore: Individual step score from PRM
- VerificationPrompts: Verification prompt templates for different step types
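Best-of-N reranking with a PRM can be sketched as follows. This is a rough illustration under stated assumptions, not the behavior of `PrmReranker`: the `Candidate` type and both functions are hypothetical, and aggregating by the minimum step score is only one possible choice (a product of step scores is another).

```rust
/// Hypothetical candidate: one sampled solution with a PRM score per step.
/// An assumption for illustration, not a type from this crate.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub answer: String,
    pub step_scores: Vec<f64>,
}

/// Collapse step scores into one solution score using the weakest step.
/// An empty list scores 0.0 so that step-less candidates are not favored.
pub fn aggregate_min(step_scores: &[f64]) -> f64 {
    if step_scores.is_empty() {
        return 0.0;
    }
    step_scores.iter().copied().fold(f64::INFINITY, f64::min)
}

/// Return the candidate with the highest aggregated score,
/// or `None` when there are no candidates.
pub fn rerank_best_of_n(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| {
        aggregate_min(&a.step_scores).total_cmp(&aggregate_min(&b.step_scores))
    })
}
```

Using the minimum means a single weak step sinks the whole candidate, which matches the idea that one invalid step invalidates the chain of reasoning.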