reasoning-parser 1.1.0

Parser for AI model reasoning/thinking outputs (chain-of-thought, etc.)

reasoning-parser

A Rust library for detecting and extracting reasoning content (chain-of-thought) from large language model (LLM) output. It handles models that emit explicit thinking blocks delimited by tokens such as <think> and </think>.

Features

  • Unified Interface - Single API for multiple model formats
  • Streaming Support - Incremental parsing with state preservation across chunks
  • Parser Pooling - Efficient reuse of parser instances for high concurrency
  • Partial Token Handling - Correctly handles tokens split across chunk boundaries
  • Model Auto-Detection - Pattern-based automatic parser selection
  • Extensible - Easy to add support for new model formats

Installation

Add to your Cargo.toml:

[dependencies]
reasoning-parser = "1.0"

Quick Start

use reasoning_parser::{ParserFactory, ReasoningParser};

#[tokio::main]
async fn main() {
    let factory = ParserFactory::new();
    let parser = factory.get_pooled("deepseek-r1");

    let mut p = parser.lock().await;
    let result = p
        .detect_and_parse_reasoning("<think>Let me analyze this...</think>The answer is 42.")
        .unwrap();

    println!("Reasoning: {}", result.reasoning_text);  // "Let me analyze this..."
    println!("Answer: {}", result.normal_text);        // "The answer is 42."
}

Supported Models

Model             Token Format                          Notes
DeepSeek-R1       <think>/</think>                      Starts in reasoning mode
Qwen3             <think>/</think>                      Explicit reasoning blocks
Qwen3-Thinking    <think>/</think>                      Starts in reasoning mode
GLM-4.5/4.6/4.7   <think>/</think>                      Explicit reasoning blocks
Kimi              ◁think▷/◁/think▷                      Unicode delimiters
Step3             <think>/</think>                      Starts in reasoning mode
MiniMax M2        <think>/</think>                      Auto-prepends start token
Cohere Command    <|START_THINKING|>/<|END_THINKING|>   CMD3/CMD4 format
Nemotron-Nano     <think>/</think>                      Qwen3-compatible

Unknown models fall back to a passthrough parser that returns all text as normal output.
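
The Notes column separates models that start in reasoning mode from models that use explicit reasoning blocks. The following is a minimal standard-library sketch of that difference, not the crate's implementation; split_reasoning and its start_in_reasoning flag are hypothetical names chosen for illustration.

```rust
/// Splits `text` into (reasoning, normal) around a single reasoning block.
/// Hypothetical helper, not part of the reasoning-parser API.
/// `start_in_reasoning` mimics models whose output begins inside the block,
/// so the start token may be missing.
fn split_reasoning(
    text: &str,
    start: &str,
    end: &str,
    start_in_reasoning: bool,
) -> (String, String) {
    // Find where the reasoning body begins, if it begins at all.
    let body = match text.find(start) {
        Some(i) => Some((&text[..i], &text[i + start.len()..])),
        None if start_in_reasoning => Some(("", text)),
        None => None,
    };
    match body {
        // No block and not in reasoning mode: pass everything through.
        None => (String::new(), text.to_string()),
        Some((before, rest)) => match rest.find(end) {
            // Closed block: body is reasoning, everything else is normal text.
            Some(j) => (
                rest[..j].to_string(),
                format!("{}{}", before, &rest[j + end.len()..]),
            ),
            // Unterminated block: treat the remainder as reasoning.
            None => (rest.to_string(), before.to_string()),
        },
    }
}
```

With start_in_reasoning set to true, the input "a</think>b" yields reasoning "a" and normal text "b"; with it set to false, the same input is returned unchanged as normal text, which is the passthrough behaviour described above.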

Core Types

ParserResult

The result of parsing, separating reasoning from normal text:

pub struct ParserResult {
    pub normal_text: String,    // Text outside reasoning blocks
    pub reasoning_text: String, // Text inside reasoning blocks
}

ReasoningParser Trait

The core interface all parsers implement:

pub trait ReasoningParser: Send + Sync {
    /// One-shot parsing of complete text
    fn detect_and_parse_reasoning(&mut self, text: &str) -> Result<ParserResult, ParseError>;

    /// Streaming incremental parsing
    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> Result<ParserResult, ParseError>;

    /// Reset parser state for reuse
    fn reset(&mut self);

    /// Get parser variant identifier
    fn model_type(&self) -> &str;

    /// Check if currently parsing reasoning content
    fn is_in_reasoning(&self) -> bool;
}

Usage Patterns

One-Shot Parsing

For complete text that doesn't need streaming:

let factory = ParserFactory::new();
let mut parser = factory.create("qwen3").unwrap();

let input = "<think>Step 1: Consider the problem...</think>The solution is X.";
let result = parser.detect_and_parse_reasoning(input).unwrap();

assert_eq!(result.reasoning_text, "Step 1: Consider the problem...");
assert_eq!(result.normal_text, "The solution is X.");

Streaming Parsing

For processing chunks as they arrive from an LLM:

// Run inside an async context, e.g. the #[tokio::main] function from Quick Start.
let factory = ParserFactory::new();
let parser = factory.get_pooled("deepseek-r1");

let chunks = vec![
    "<think>Let me ",
    "think about this",
    "</think>Here's ",
    "the answer.",
];

let mut p = parser.lock().await;
for chunk in chunks {
    let result = p.parse_reasoning_streaming_incremental(chunk).unwrap();

    if !result.reasoning_text.is_empty() {
        print!("[reasoning] {}", result.reasoning_text);
    }
    if !result.normal_text.is_empty() {
        print!("{}", result.normal_text);
    }
}
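
To illustrate why streaming needs state across chunks, here is a rough sketch of one way to handle a token split over a chunk boundary: hold back any trailing text that could still grow into the delimiter. StreamSplitter and held_back are hypothetical names, and the crate may well implement this differently.

```rust
/// Hypothetical streaming splitter; assumes both tokens are non-empty.
struct StreamSplitter {
    start: String,
    end: String,
    in_reasoning: bool,
    pending: String, // text not yet emitted (may hold a partial token)
}

impl StreamSplitter {
    fn new(start: &str, end: &str) -> Self {
        Self {
            start: start.to_string(),
            end: end.to_string(),
            in_reasoning: false,
            pending: String::new(),
        }
    }

    /// Feeds one chunk and returns the (reasoning, normal) text it released.
    fn push(&mut self, chunk: &str) -> (String, String) {
        let (mut reasoning, mut normal) = (String::new(), String::new());
        self.pending.push_str(chunk);
        loop {
            // The token that would flip the current mode.
            let token = if self.in_reasoning { self.end.clone() } else { self.start.clone() };
            if let Some(i) = self.pending.find(&token) {
                // Full token found: emit the text before it, drop the token, flip mode.
                let text: String = self.pending.drain(..i + token.len()).collect();
                let out = if self.in_reasoning { &mut reasoning } else { &mut normal };
                out.push_str(&text[..i]);
                self.in_reasoning = !self.in_reasoning;
                continue;
            }
            // No full token: emit everything except a possible partial token.
            let cut = self.pending.len() - held_back(&self.pending, &token);
            let text: String = self.pending.drain(..cut).collect();
            let out = if self.in_reasoning { &mut reasoning } else { &mut normal };
            out.push_str(&text);
            return (reasoning, normal);
        }
    }
}

/// Length of the longest suffix of `buf` that is a proper prefix of `token`.
fn held_back(buf: &str, token: &str) -> usize {
    let max = token.len().saturating_sub(1).min(buf.len());
    (1..=max)
        .rev()
        .find(|&k| token.is_char_boundary(k) && buf.ends_with(&token[..k]))
        .unwrap_or(0)
}
```

Feeding the chunks "<thi", "nk>a</th", "ink>b" to this sketch releases nothing for the first chunk, reasoning "a" for the second, and normal text "b" for the third, because the partial delimiters are buffered until they can be resolved.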

Parser Reuse

Reset a parser to process a new request:

let factory = ParserFactory::new();
let parser = factory.get_pooled("qwen3");
let mut p = parser.lock().await;

// First request
let result1 = p.detect_and_parse_reasoning("<think>A</think>B").unwrap();

// Reset for next request
p.reset();

// Second request
let result2 = p.detect_and_parse_reasoning("<think>C</think>D").unwrap();

Pooled vs Fresh Parsers

// Pooled: shared instance, requires lock, efficient for high concurrency
let pooled = factory.get_pooled("deepseek-r1");  // Arc<Mutex<Box<dyn ReasoningParser>>>

// Fresh: new instance each time, no lock needed
let fresh = factory.create("deepseek-r1").unwrap();  // Box<dyn ReasoningParser>
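
The practical difference is that every caller asking for the same pooled parser shares one stateful instance, while a fresh parser is private to its caller. The toy model below shows this with std::sync types standing in for the crate's tokio-based ones; ToyParser and ToyFactory are hypothetical, and keying the pool by name is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for a stateful parser; it only counts the inputs it has seen.
#[derive(Default)]
struct ToyParser {
    seen: usize,
}

impl ToyParser {
    fn feed(&mut self, _text: &str) -> usize {
        self.seen += 1;
        self.seen
    }
}

/// Hypothetical factory: hands out shared or private parser instances.
#[derive(Default)]
struct ToyFactory {
    pool: Mutex<HashMap<String, Arc<Mutex<ToyParser>>>>,
}

impl ToyFactory {
    /// Fresh: a new instance owned by the caller, no lock needed.
    fn create(&self) -> ToyParser {
        ToyParser::default()
    }

    /// Pooled: the same shared instance for every call with the same key.
    fn get_pooled(&self, key: &str) -> Arc<Mutex<ToyParser>> {
        let mut pool = self.pool.lock().unwrap();
        pool.entry(key.to_string()).or_default().clone()
    }
}
```

Because pooled handles share state, a caller should reset the parser before starting a new request, as shown under Parser Reuse.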

Custom Parser Configuration

Create a parser with custom tokens:

use reasoning_parser::{BaseReasoningParser, ParserConfig, ReasoningParser};

let config = ParserConfig {
    think_start_token: "<reasoning>".to_string(),
    think_end_token: "</reasoning>".to_string(),
    stream_reasoning: true,
    max_buffer_size: 65536,
    initial_in_reasoning: false,
};

let mut parser = BaseReasoningParser::new(config);
let result = parser
    .detect_and_parse_reasoning("<reasoning>thinking</reasoning>answer")
    .unwrap();

Registering Custom Parsers

Add support for new model patterns:

use reasoning_parser::{BaseReasoningParser, ParserConfig, ParserFactory};

let factory = ParserFactory::new();

// Register a creator function
factory.registry().register_parser("myformat", || {
    Box::new(BaseReasoningParser::new(ParserConfig {
        think_start_token: "<<THINK>>".to_string(),
        think_end_token: "<</THINK>>".to_string(),
        stream_reasoning: true,
        max_buffer_size: 65536,
        initial_in_reasoning: false,
    }))
});

// Map model patterns to the parser
factory.registry().register_pattern("my-custom-model", "myformat");
factory.registry().register_pattern("my-model-v2", "myformat");

// Now these work
let parser = factory.get_pooled("my-custom-model-7b");
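
Conceptually, registration involves two lookups: a model ID is matched against patterns to find a parser name, and that name selects a creator function. The sketch below models this with plain collections. ToyRegistry, Tokens, and create_for are hypothetical names, and unlike the real registry (which is called through a shared reference above) this version takes &mut self for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for what a creator function produces.
#[derive(Debug, Clone, PartialEq)]
struct Tokens {
    start: String,
    end: String,
}

type Creator = Box<dyn Fn() -> Tokens + Send + Sync>;

#[derive(Default)]
struct ToyRegistry {
    creators: HashMap<String, Creator>,
    // (lowercased pattern, parser name), kept in registration order.
    patterns: Vec<(String, String)>,
}

impl ToyRegistry {
    /// Associates a parser name with a function that builds it.
    fn register_parser(
        &mut self,
        name: &str,
        creator: impl Fn() -> Tokens + Send + Sync + 'static,
    ) {
        self.creators.insert(name.to_string(), Box::new(creator));
    }

    /// Maps a model ID substring to a registered parser name.
    fn register_pattern(&mut self, pattern: &str, name: &str) {
        self.patterns.push((pattern.to_lowercase(), name.to_string()));
    }

    /// Resolves a model ID through the patterns, then calls the creator.
    fn create_for(&self, model_id: &str) -> Option<Tokens> {
        let id = model_id.to_lowercase();
        let (_, name) = self.patterns.iter().find(|(p, _)| id.contains(p.as_str()))?;
        self.creators.get(name).map(|make| make())
    }
}
```

In this model, "My-Custom-Model-7B" resolves through the "my-custom-model" pattern to the "myformat" creator, mirroring the get_pooled("my-custom-model-7b") call above.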

Error Handling

use reasoning_parser::ParseError;

match parser.detect_and_parse_reasoning(text) {
    Ok(result) => {
        println!("Reasoning: {}", result.reasoning_text);
        println!("Normal: {}", result.normal_text);
    }
    Err(ParseError::BufferOverflow(size)) => {
        eprintln!("Content too large: {} bytes", size);
    }
    Err(ParseError::Utf8Error(e)) => {
        eprintln!("Invalid UTF-8: {}", e);
    }
    Err(ParseError::UnknownModel(model)) => {
        eprintln!("Unknown model: {}", model);
    }
    Err(ParseError::ConfigError(msg)) => {
        eprintln!("Configuration error: {}", msg);
    }
}

Model Pattern Matching

The factory uses case-insensitive substring matching:

// All of these match the "deepseek-r1" pattern:
factory.get_pooled("deepseek-r1");
factory.get_pooled("DeepSeek-R1-Distill-Qwen-7B");
factory.get_pooled("my-deepseek-r1-finetune");

Pattern priority (first match wins):

  1. deepseek-r1 → DeepSeekR1Parser
  2. qwen3-thinking / qwen-thinking → QwenThinkingParser
  3. qwen3 / qwen → Qwen3Parser
  4. glm45 / glm46 / glm47 → Glm45Parser
  5. kimi → KimiParser
  6. step3 → Step3Parser
  7. minimax / mm-m2 → MiniMaxParser
  8. command-r / command-a / c4ai-command / cohere → CohereCmdParser
  9. nemotron-nano / nano-v3 → Qwen3Parser
  10. (fallback) → BaseReasoningParser (passthrough)
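
The priority list above can be sketched as an ordered table scanned from the top. This is an illustration only: resolve and PATTERNS are hypothetical, the parser names are copied from the list, and the relative order of patterns within one numbered entry is an assumption.

```rust
/// (pattern, parser name) pairs in priority order, copied from the list above.
const PATTERNS: &[(&str, &str)] = &[
    ("deepseek-r1", "DeepSeekR1Parser"),
    ("qwen3-thinking", "QwenThinkingParser"),
    ("qwen-thinking", "QwenThinkingParser"),
    ("qwen3", "Qwen3Parser"),
    ("qwen", "Qwen3Parser"),
    ("glm45", "Glm45Parser"),
    ("glm46", "Glm45Parser"),
    ("glm47", "Glm45Parser"),
    ("kimi", "KimiParser"),
    ("step3", "Step3Parser"),
    ("minimax", "MiniMaxParser"),
    ("mm-m2", "MiniMaxParser"),
    ("command-r", "CohereCmdParser"),
    ("command-a", "CohereCmdParser"),
    ("c4ai-command", "CohereCmdParser"),
    ("cohere", "CohereCmdParser"),
    ("nemotron-nano", "Qwen3Parser"),
    ("nano-v3", "Qwen3Parser"),
];

/// Returns the name of the first parser whose pattern occurs in `model_id`,
/// ignoring case, or the passthrough parser when nothing matches.
fn resolve(model_id: &str) -> &'static str {
    let id = model_id.to_lowercase();
    PATTERNS
        .iter()
        .find(|(pattern, _)| id.contains(*pattern))
        .map(|(_, name)| *name)
        .unwrap_or("BaseReasoningParser")
}
```

Order matters here: "DeepSeek-R1-Distill-Qwen-7B" contains both "deepseek-r1" and "qwen", and first-match-wins sends it to DeepSeekR1Parser. Likewise, the thinking patterns must precede the plain "qwen3" and "qwen" patterns, or they would never be reached.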

Thread Safety

The crate is designed for high-concurrency scenarios:

  • PooledParser type is Arc<Mutex<Box<dyn ReasoningParser>>>
  • Uses tokio::sync::Mutex for async-friendly locking
  • Registry uses Arc<RwLock<>> for safe concurrent access
  • Tested with 100 concurrent tasks at 1000+ requests/second
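
As a rough picture of the locking layout described above, the sketch below shares a read-mostly registry behind Arc<RwLock<...>> and a mutable counter behind Arc<Mutex<...>> across many tasks. It uses std::sync and OS threads as stand-ins for the crate's tokio types, and run_concurrent is a hypothetical name.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Spawns `tasks` threads that read a shared registry and update shared state.
/// Returns how many tasks completed their update.
fn run_concurrent(tasks: usize) -> usize {
    let registry: Arc<RwLock<HashMap<&'static str, &'static str>>> =
        Arc::new(RwLock::new(HashMap::from([("deepseek-r1", "<think>")])));
    let handled = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let registry = Arc::clone(&registry);
            let handled = Arc::clone(&handled);
            thread::spawn(move || {
                // Many readers may hold the registry lock at the same time.
                let known = registry.read().unwrap().contains_key("deepseek-r1");
                if known {
                    // Only one task at a time may mutate the shared state.
                    *handled.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let total = *handled.lock().unwrap();
    total
}
```

The read/write split keeps lookups cheap under contention, while the mutex serializes access to the one piece of mutable state, which is the same division of labour the bullet points describe.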

License

Apache-2.0