# reasoning-parser

A Rust library for detecting and extracting reasoning content (chain-of-thought) from Large Language Model outputs. It handles models that emit explicit thinking blocks delimited by tokens such as `<think>` and `</think>`.
## Features
- Unified Interface - Single API for multiple model formats
- Streaming Support - Incremental parsing with state preservation across chunks
- Parser Pooling - Efficient reuse of parser instances for high concurrency
- Partial Token Handling - Correctly handles tokens split across chunk boundaries
- Model Auto-Detection - Pattern-based automatic parser selection
- Extensible - Easy to add support for new model formats
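Partial-token handling is the subtle one: a delimiter such as `<think>` can arrive split across two streaming chunks. A minimal, self-contained sketch of the buffering idea (not this crate's actual code; ASCII delimiters assumed):

```rust
/// Hold back any chunk suffix that could still grow into the start token,
/// and emit the rest. The held-back suffix is re-examined with the next chunk.
fn split_safe_suffix<'a>(chunk: &'a str, token: &str) -> (&'a str, &'a str) {
    // Longest suffix of `chunk` that is a proper prefix of `token`.
    let max = token.len().saturating_sub(1).min(chunk.len());
    for len in (1..=max).rev() {
        let suffix = &chunk[chunk.len() - len..];
        if token.starts_with(suffix) {
            return (&chunk[..chunk.len() - len], suffix);
        }
    }
    (chunk, "")
}

fn main() {
    // "<thi" could become "<think>", so it is buffered rather than emitted.
    let (emit, buffered) = split_safe_suffix("Hello <thi", "<think>");
    assert_eq!(emit, "Hello ");
    assert_eq!(buffered, "<thi");

    // Ordinary text is emitted in full.
    assert_eq!(split_safe_suffix("plain text", "<think>"), ("plain text", ""));
}
```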
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
reasoning-parser = "1.0"
```
## Quick Start

```rust
// Type and module paths below follow the examples in this README;
// check the crate docs for the exact names.
use reasoning_parser::ReasoningParserFactory;

#[tokio::main]
async fn main() {
    let factory = ReasoningParserFactory::new();
    let parser = factory.get_pooled("deepseek-r1");

    let mut p = parser.lock().await;
    let result = p
        .detect_and_parse_reasoning("<think>Let me think...</think>The answer is 42.")
        .unwrap();

    println!("reasoning: {}", result.reasoning_text);
    println!("answer:    {}", result.normal_text);
}
```
## Supported Models

| Model | Token Format | Notes |
|---|---|---|
| DeepSeek-R1 | `<think>` / `</think>` | Starts in reasoning mode |
| Qwen3 | `<think>` / `</think>` | Explicit reasoning blocks |
| Qwen3-Thinking | `<think>` / `</think>` | Starts in reasoning mode |
| GLM-4.5/4.6/4.7 | `<think>` / `</think>` | Explicit reasoning blocks |
| Kimi | `◁think▷` / `◁/think▷` | Unicode delimiters |
| Step3 | `<think>` / `</think>` | Starts in reasoning mode |
| MiniMax M2 | `<think>` / `</think>` | Auto-prepends start token |
| Cohere Command | `<\|START_THINKING\|>` / `<\|END_THINKING\|>` | CMD3/CMD4 format |
| Nemotron-Nano | `<think>` / `</think>` | Qwen3-compatible |
Unknown models fall back to a passthrough parser that returns all text as normal output.
## Core Types

### ParserResult

The result of parsing, separating reasoning from normal text:
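As a self-contained sketch (field names are illustrative; consult the crate docs for the exact definition):

```rust
/// Illustrative shape of a parse result.
#[derive(Debug, Default)]
pub struct ParserResult {
    /// Text extracted from inside the thinking block(s).
    pub reasoning_text: String,
    /// Everything outside the thinking block(s).
    pub normal_text: String,
}

fn main() {
    let result = ParserResult {
        reasoning_text: "Step 1: Consider the problem...".into(),
        normal_text: "The solution is X.".into(),
    };
    assert_eq!(result.normal_text, "The solution is X.");
}
```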
### ReasoningParser Trait

The core interface all parsers implement:
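A self-contained sketch of the shape, paired with a trivial passthrough impl like the fallback used for unknown models (method names follow the examples in this README; exact signatures may differ):

```rust
// Minimal stand-ins so the sketch compiles on its own.
#[derive(Debug)]
pub struct ParserResult {
    pub reasoning_text: String,
    pub normal_text: String,
}
#[derive(Debug)]
pub struct ParseError;

/// Illustrative sketch of the core trait.
pub trait ReasoningParser: Send {
    /// Parse a complete response in one call.
    fn detect_and_parse_reasoning(&mut self, text: &str) -> Result<ParserResult, ParseError>;
    /// Parse one streaming chunk, preserving state across calls.
    fn parse_reasoning_streaming_incremental(&mut self, chunk: &str) -> Result<ParserResult, ParseError>;
    /// Clear internal state so the parser can serve a new request.
    fn reset(&mut self);
}

/// Passthrough behavior: all text is treated as normal output.
struct Passthrough;
impl ReasoningParser for Passthrough {
    fn detect_and_parse_reasoning(&mut self, text: &str) -> Result<ParserResult, ParseError> {
        Ok(ParserResult { reasoning_text: String::new(), normal_text: text.to_string() })
    }
    fn parse_reasoning_streaming_incremental(&mut self, chunk: &str) -> Result<ParserResult, ParseError> {
        self.detect_and_parse_reasoning(chunk)
    }
    fn reset(&mut self) {}
}

fn main() {
    let mut p = Passthrough;
    let r = p.detect_and_parse_reasoning("no thinking tokens here").unwrap();
    assert_eq!(r.normal_text, "no thinking tokens here");
}
```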
## Usage Patterns

### One-Shot Parsing

For complete text that doesn't need streaming:
```rust
// "qwen3" is an example model id; the factory type name is illustrative.
let factory = ReasoningParserFactory::new();
let mut parser = factory.create("qwen3").unwrap();

let input = "<think>Step 1: Consider the problem...</think>The solution is X.";
let result = parser.detect_and_parse_reasoning(input).unwrap();

assert_eq!(result.reasoning_text, "Step 1: Consider the problem...");
assert_eq!(result.normal_text, "The solution is X.");
```
### Streaming Parsing

For processing chunks as they arrive from an LLM:
```rust
let factory = ReasoningParserFactory::new(); // type name illustrative
let parser = factory.get_pooled("deepseek-r1");

let chunks = vec!["<think>Let me ", "reason about this</think>", "The answer is 42."];

let mut p = parser.lock().await;
for chunk in chunks {
    let result = p.parse_reasoning_streaming_incremental(chunk).unwrap();
    if !result.normal_text.is_empty() {
        // Forward normal text to the client as it arrives.
    }
}
```
### Parser Reuse

Reset a parser to process a new request:
```rust
let parser = factory.get_pooled("qwen3"); // model id is an example
let mut p = parser.lock().await;

// First request
let result1 = p.detect_and_parse_reasoning("<think>plan A</think>Answer A").unwrap();

// Reset internal state for the next request
p.reset();

// Second request
let result2 = p.detect_and_parse_reasoning("<think>plan B</think>Answer B").unwrap();
```
### Pooled vs Fresh Parsers

```rust
// Pooled: shared instance, requires lock, efficient for high concurrency
let pooled = factory.get_pooled("qwen3"); // Arc<Mutex<Box<dyn ReasoningParser>>>

// Fresh: new instance each time, no lock needed
let fresh = factory.create("qwen3").unwrap(); // Box<dyn ReasoningParser>
```
### Custom Parser Configuration

Create a parser with custom tokens:

```rust
use reasoning_parser::{BaseReasoningParser, ParserConfig}; // paths illustrative

// ParserConfig field names are illustrative; see the crate docs.
let config = ParserConfig {
    think_start_token: "<reasoning>".to_string(),
    think_end_token: "</reasoning>".to_string(),
    ..Default::default()
};

let mut parser = BaseReasoningParser::new(config);
let result = parser
    .detect_and_parse_reasoning("<reasoning>hmm</reasoning>done")
    .unwrap();
```
### Registering Custom Parsers

Add support for new model patterns:

```rust
let factory = ReasoningParserFactory::new();

// Register a creator function under a parser name (signature illustrative).
factory.registry.register_parser("my-format", || {
    Box::new(BaseReasoningParser::new(my_config())) // my_config() is hypothetical
});

// Map model patterns to the registered parser.
factory.registry.register_pattern("my-model", "my-format");
factory.registry.register_pattern("my-other-model", "my-format");

// Now these resolve to the custom parser.
let parser = factory.get_pooled("my-model-7b");
```
## Error Handling

```rust
use reasoning_parser::ParseError; // path illustrative

match parser.detect_and_parse_reasoning(input) {
    Ok(result) => {
        // Use result.normal_text / result.reasoning_text
    }
    // Match specific ParseError variants here as needed.
    Err(err) => eprintln!("parse failed: {err}"),
}
```
## Model Pattern Matching

The factory uses case-insensitive substring matching:

```rust
// All of these match the "deepseek-r1" pattern (model ids are examples):
factory.get_pooled("deepseek-r1");
factory.get_pooled("DeepSeek-R1-Distill-Qwen-7B");
factory.get_pooled("my-org/DEEPSEEK-R1-finetune");
```
Pattern priority (first match wins):

1. `deepseek-r1` → DeepSeekR1Parser
2. `qwen3-thinking` / `qwen-thinking` → QwenThinkingParser
3. `qwen3` / `qwen` → Qwen3Parser
4. `glm45` / `glm46` / `glm47` → Glm45Parser
5. `kimi` → KimiParser
6. `step3` → Step3Parser
7. `minimax` / `mm-m2` → MiniMaxParser
8. `command-r` / `command-a` / `c4ai-command` / `cohere` → CohereCmdParser
9. `nemotron-nano` / `nano-v3` → Qwen3Parser
10. (fallback) → BaseReasoningParser (passthrough)
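The lookup can be pictured as an ordered scan over (pattern, parser) pairs. This standalone function mirrors the priority list above; the crate's real registry is configurable and lives behind a lock, so this is only the matching logic:

```rust
/// First pattern that is a case-insensitive substring of the model id wins;
/// unknown models fall back to the passthrough parser.
fn resolve_parser(model_id: &str) -> &'static str {
    const PATTERNS: &[(&str, &str)] = &[
        ("deepseek-r1", "DeepSeekR1Parser"),
        ("qwen3-thinking", "QwenThinkingParser"),
        ("qwen-thinking", "QwenThinkingParser"),
        ("qwen3", "Qwen3Parser"),
        ("qwen", "Qwen3Parser"),
        ("glm45", "Glm45Parser"),
        ("glm46", "Glm45Parser"),
        ("glm47", "Glm45Parser"),
        ("kimi", "KimiParser"),
        ("step3", "Step3Parser"),
        ("minimax", "MiniMaxParser"),
        ("mm-m2", "MiniMaxParser"),
        ("command-r", "CohereCmdParser"),
        ("command-a", "CohereCmdParser"),
        ("c4ai-command", "CohereCmdParser"),
        ("cohere", "CohereCmdParser"),
        ("nemotron-nano", "Qwen3Parser"),
        ("nano-v3", "Qwen3Parser"),
    ];
    let id = model_id.to_lowercase();
    for (pattern, parser) in PATTERNS {
        if id.contains(pattern) {
            return parser;
        }
    }
    "BaseReasoningParser" // passthrough fallback
}

fn main() {
    assert_eq!(resolve_parser("DeepSeek-R1-Distill"), "DeepSeekR1Parser");
    assert_eq!(resolve_parser("Qwen3-32B"), "Qwen3Parser");
    assert_eq!(resolve_parser("unknown-model"), "BaseReasoningParser");
}
```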
## Thread Safety

The crate is designed for high-concurrency scenarios:

- `PooledParser` type is `Arc<Mutex<Box<dyn ReasoningParser>>>`
- Uses `tokio::Mutex` for async-friendly locking
- Registry uses `Arc<RwLock<...>>` for safe concurrent access
- Tested with 100 concurrent tasks at 1000+ requests/second
## License

Apache-2.0