# Parsanol-rs

A high-performance PEG (Parsing Expression Grammar) parser library for Rust with packrat memoization and arena allocation.
## Purpose

Parsanol-rs is a generic, domain-agnostic PEG parser library written in Rust. It provides high-performance parsing with a focus on:

- **Speed**: packrat memoization for O(n) parsing complexity
- **Memory efficiency**: arena allocation for zero-copy AST construction
- **Developer experience**: a fluent API for building grammars and rich error reporting
- **Flexibility**: a transform system for converting parse trees to typed Rust structs via derive macros
## Features
- Quick Start - Get started in minutes
- Backend Abstraction - Extensible backend trait system
- Bytecode Backend - Optional VM backend for linear patterns
- Parser DSL - Fluent API for grammar definition
- Capture Atoms - Extract named values during parsing
- Scope Atoms - Isolated capture contexts
- Dynamic Atoms - Runtime-determined parsing via callbacks
- Streaming with Captures - Memory-efficient parsing with capture support
- Transform System - Convert parse trees to typed structs
- Derive Macros - Automatic typed AST generation
- Streaming Builder - Single-pass parsing with custom output
- Parallel Parsing - Multi-file parsing with rayon
- Infix Expression Parsing - Built-in operator precedence
- Rich Error Reporting - Tree-structured error messages
- Source Location Tracking - Line/column tracking through transforms
- Grammar Composition - Import and compose grammars
- Ruby FFI - Optional Ruby bindings
- WASM Support - Optional WebAssembly bindings
## Bytecode Backend
Parsanol-rs supports two parsing backends:
- Packrat (default): Memoization-based parser with O(n) time complexity for all grammars
- Bytecode VM: Stack-based virtual machine with optimization passes
### Backend Comparison
Both backends produce identical parsing results for all valid inputs. The difference lies in performance characteristics:
| Aspect | Packrat | Bytecode VM |
|---|---|---|
| Time Complexity | Guaranteed O(n) | O(n) to O(2^n) depending on grammar |
| Memory Usage | Higher (memoization table) | Lower (stack-based) |
| Compilation | None required | Pre-compilation needed |
| Nested Repetitions | Handles efficiently | Can be exponential |
| Simple Patterns | Good | Excellent |
| Predictability | Consistent performance | Varies by grammar |
### Performance Characteristics

**Packrat Backend:**

- Memoization stores parse results at each position
- Guarantees O(n) time complexity regardless of grammar structure
- Memory overhead scales with input size and grammar complexity
- Ideal when predictable performance is required
**Bytecode VM Backend:**

- Stack-based execution with backtracking
- O(n) for linear patterns (the most common case)
- Can exhibit O(2^n) behavior for pathological patterns like `(a*)*`
- Lower memory footprint, good for memory-constrained environments
- Pre-compilation enables optimization passes
### Decision Matrix

| Grammar Type | Recommended Backend | Reason |
|---|---|---|
| JSON, XML, config files | Either | Linear patterns, both perform well |
| Programming languages | Packrat | Complex grammar with nested structures |
| Log parsing | Bytecode | Simple patterns, streaming potential |
| Nested repetitions `(a*)*` | Packrat | Avoids exponential backtracking |
| Memory-constrained | Bytecode | Lower memory footprint |
| Need predictable O(n) | Packrat | Guaranteed linear time |
### Automatic Selection

Use `Backend::Auto` (the default) to let parsanol analyze your grammar:

```rust
// Automatic selection (default)
let mut parser = Parser::auto(&grammar);

// Or explicitly:
let mut parser = Parser::new(&grammar, Backend::Auto);

// Check the analysis (fields documented under "Analyzing Your Grammar")
let analysis = parser.analysis();
println!("atoms: {}", analysis.atom_count);
println!("nested repetition: {}", analysis.has_nested_repetition);
```
### Why Nested Repetitions Are the Criterion

The backend selection is based on a single hard rule:

- Has nested repetitions (e.g., `(a*)*`) → Packrat
- Otherwise → Bytecode
This is the only criterion because nested repetitions are the only pattern that causes exponential time complexity in the bytecode backend. Here's why:
The Algorithmic Problem:
When a repetition contains another repetition, the parser must try all possible ways to divide the input. For pattern (a*)* on input "aaa":
```text
Division 1: (aaa)      - outer * matches 1 group
Division 2: (aa)(a)    - outer * matches 2 groups
Division 3: (a)(aa)    - outer * matches 2 groups (different split)
Division 4: (a)(a)(a)  - outer * matches 3 groups
... and so on
```
The number of ways to partition n characters is O(2^n). The bytecode VM tries each possibility via backtracking, leading to exponential time.
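The growth is easy to verify: the number of ways to split n characters into non-empty groups — the divisions a backtracking matcher must try for `(a+)+` — is 2^(n-1), and `(a*)*` is worse still because empty matches add further options. A self-contained sketch (not part of the parsanol API):

```rust
// Counts how many ways a backtracking matcher can split n characters into
// non-empty groups, as the outer repetition in (a+)+ must try.
fn split_count(n: u32) -> u64 {
    if n == 0 {
        return 1; // one way to split nothing: no groups
    }
    // Choose the length of the first group, then split the rest.
    (1..=n).map(|first| split_count(n - first)).sum()
}

fn main() {
    // The count doubles with each extra character: 2^(n-1) compositions.
    for n in 1..=10 {
        println!("n = {:2}: {} splits", n, split_count(n));
    }
    assert_eq!(split_count(10), 512); // 2^9
}
```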
Why Packrat Solves It:
Packrat memoizes results by (position, rule). Once (a*) is evaluated at position i, the result is cached. Subsequent evaluations at the same position are O(1) cache hits. This guarantees O(n) total time.
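The effect of the cache can be shown with a toy memo table keyed by `(rule_id, position)` — illustrative code only, not parsanol's internals:

```rust
use std::collections::HashMap;

// Toy packrat cache: (rule_id, position) -> parse result (end position, or None).
type Memo = HashMap<(usize, usize), Option<usize>>;

// Rule 0: matches zero or more 'a' characters starting at `pos`.
// `calls` counts how often the rule body actually runs.
fn parse_as(input: &[u8], pos: usize, memo: &mut Memo, calls: &mut u64) -> Option<usize> {
    if let Some(&cached) = memo.get(&(0, pos)) {
        return cached; // O(1) cache hit - no re-evaluation
    }
    *calls += 1;
    let mut end = pos;
    while end < input.len() && input[end] == b'a' {
        end += 1;
    }
    let result = Some(end);
    memo.insert((0, pos), result);
    result
}

fn main() {
    let mut memo = Memo::new();
    let mut calls = 0;
    // Evaluate the rule at position 0 many times, as nested backtracking would.
    for _ in 0..1000 {
        assert_eq!(parse_as(b"aaaa", 0, &mut memo, &mut calls), Some(4));
    }
    assert_eq!(calls, 1); // the rule body ran exactly once
}
```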
Why Other Patterns Don't Matter:
| Pattern | Time Impact | Backend Difference |
|---|---|---|
| Overlapping choices (`"a" \| "aa"`) | Linear backtracking | Both handle identically |
| Deep nesting | Stack depth increases | Both handle fine |
| Many alternatives | More choice points | Linear in alternative count |
| Left recursion | Infinite loop | Both fail - not a backend issue |
### How the Analysis Works

The grammar analysis is deliberately simple: the algorithm iterates through all atoms and checks "is this a Repetition whose inner atom is also a Repetition?"

```rust
for atom in &grammar.atoms {
    // flag the grammar if a Repetition directly wraps another Repetition
}
```

This is O(atoms) time and detects the only pattern that matters for backend selection.
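A self-contained sketch of that check, using a simplified `Atom` type (the real type in parsanol is richer):

```rust
// Simplified stand-in for the grammar's atom type.
#[derive(Debug)]
enum Atom {
    Char(char),
    Sequence(Vec<Atom>),
    Repetition(Box<Atom>),
}

fn has_nested_repetition(atoms: &[Atom]) -> bool {
    atoms.iter().any(is_nested)
}

fn is_nested(atom: &Atom) -> bool {
    match atom {
        // A repetition whose inner atom is (or contains) another repetition: (a*)*
        Atom::Repetition(inner) => matches!(**inner, Atom::Repetition(_)) || is_nested(inner),
        Atom::Sequence(items) => items.iter().any(is_nested),
        Atom::Char(_) => false,
    }
}

fn main() {
    // a* — safe
    let safe = Atom::Repetition(Box::new(Atom::Char('a')));
    // (a*)* — triggers the Packrat recommendation
    let nested = Atom::Repetition(Box::new(Atom::Repetition(Box::new(Atom::Char('a')))));
    assert!(!has_nested_repetition(&[safe]));
    assert!(has_nested_repetition(&[nested]));
}
```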
### When to Override Auto-Selection

The auto-selection only considers time complexity. You may want to manually select based on:

| Scenario | Manual Selection | Rationale |
|---|---|---|
| Memory-constrained (embedded, WASM) | `Backend::Bytecode` | Lower memory: O(depth) vs O(n×rules) |
| Very large files (>100MB) | `Backend::Bytecode` | Packrat table grows with input size |
| Predictable latency required | `Backend::Packrat` | Guaranteed O(n), no pathological cases |
| Streaming parsing | `Backend::Bytecode` | Packrat requires full input in memory |
| Incremental re-parsing | `Backend::Packrat` | Memo table can be reused for unchanged portions |
| Grammar has nested repetitions but input is bounded | Either | If input is always small, exponential doesn't matter |
| Testing/debugging | `Backend::Packrat` | Consistent behavior across all inputs |
```rust
// Memory-constrained environment
let mut parser = Parser::bytecode(&grammar);

// Safety-critical with guaranteed O(n)
let mut parser = Parser::packrat(&grammar);

// Explicit choice regardless of analysis
let mut parser = Parser::new(&grammar, Backend::Bytecode);
```
### Problematic Grammar Patterns

The following patterns can cause exponential O(2^n) behavior in the Bytecode backend. They are safe with Packrat due to memoization. If your grammar contains these, use Packrat explicitly or rely on `Backend::Auto`.
#### Critical Pattern: Nested Repetitions

```text
(a*)*      // CRITICAL: outer * tries O(2^n) ways to divide input
(a+)+      // Same issue
((a|b)*)*  // Even worse with choice

// Safe alternatives:
a*         // Single repetition - O(n)
(a b)*     // Fixed sequence inside - O(n)
```
#### Moderate Pattern: Overlapping Choice Prefixes

```text
// Problematic: all alternatives start with 'a'
("a" | "aa" | "aaa")+

// Better: distinct first characters
("a" | "b" | "c")+
```
#### Safe Pattern: Deep Recursion (Both Backends Handle Well)

```text
expr = term (("+" | "-") term)*
// Recursive but structured - both backends handle efficiently
```
### Analyzing Your Grammar

Use the `GrammarAnalysis` API to check for nested repetitions:

```rust
let analysis = grammar.analyze(); // exact constructor may differ; see the API docs
println!("atoms: {}", analysis.atom_count);
println!("nested repetition: {}", analysis.has_nested_repetition);
```

**GrammarAnalysis Fields:**

| Field | Type | Purpose |
|---|---|---|
| `atom_count` | `usize` | Number of atoms in grammar (informational) |
| `has_nested_repetition` | `bool` | The criterion - if true, use Packrat |
The recommended_backend() Method:
Returns Backend::Packrat if has_nested_repetition is true, otherwise Backend::Bytecode. This is what Backend::Auto uses internally.
### Using the Bytecode Backend

```rust
let grammar = GrammarBuilder::new()
    .rule("greeting", str("hello"))
    .build();

// Create parser with bytecode backend
let mut parser = Parser::new(&grammar, Backend::Bytecode);
let result = parser.parse("hello");

// Or use auto-selection (analyzes grammar complexity)
let mut parser = Parser::auto(&grammar);
let result = parser.parse("hello");
```
### Known Differences
Both backends produce identical results for the vast majority of patterns. However, there are edge cases where behavior differs:
**Alternatives in sequences**: for a pattern like `("a" | "aa") "b"` on input `"aab"`:

- Packrat: may succeed due to memoization re-evaluation
- Bytecode: fails (standard PEG semantics - once `"a"` succeeds, `"aa"` is not tried)
This difference only affects patterns with:
- Alternatives containing overlapping prefixes ("a" vs "aa")
- The alternative is followed by content that fails
- The later alternative would allow the following content to succeed
For most practical grammars, this difference never manifests. Use Backend::Auto to let parsanol choose the appropriate backend.
## Backend Abstraction
Parsanol provides a trait-based backend abstraction for extensibility. You can implement custom backends or use the built-in ones interchangeably.
### Using the ParsingBackend Trait

```rust
// Use the Packrat backend for predictable O(n) performance
let mut packrat = PackratBackend::new(&grammar);
let result = packrat.parse(input)?;

// Use the Bytecode backend for lower memory usage
let mut bytecode = BytecodeBackend::new(&grammar);
let result = bytecode.parse(input)?;

// Configure backends
let packrat = PackratBackend::new(&grammar)
    .with_max_recursion_depth(1000)  // the documented default
    .with_timeout_ms(5_000);         // example value

let bytecode = BytecodeBackend::new(&grammar)
    .with_auto_fallback(true); // falls back to Packrat for complex grammars
```
### Runtime Backend Selection

```rust
use parsanol::Backend; // path assumed

// Select backend at runtime
let backend_type = Backend::default_for_grammar(&grammar);
match backend_type {
    Backend::Packrat => { /* ... */ }
    Backend::Bytecode => { /* ... */ }
    Backend::Auto => { /* ... */ }
}
```
### Backend Characteristics

Each backend documents its performance characteristics:

```rust
let backend = PackratBackend::new(&grammar);
let chars = backend.characteristics();
// Field names illustrative - see the API docs for the exact struct
println!("{}", chars.time_complexity);    // "O(n)"
println!("{}", chars.memory_complexity);  // "O(n × r)"
println!("{}", chars.supports_captures);  // true
println!("{}", chars.supports_streaming); // false
println!("{}", chars.supports_dynamic);   // true
println!("{}", chars.guaranteed_linear);  // true
```
### Implementing Custom Backends

```rust
use parsanol::backend::ParsingBackend; // path assumed
use parsanol::portable::grammar::Grammar;

struct MyBackend;

impl ParsingBackend for MyBackend {
    // implement the trait's parse and characteristics methods
    // for your own parsing strategy
}
```
### Dynamic Backend Dispatch

For runtime polymorphism:

```rust
let mut backend: DynBackend = get_backend(); // e.g. chosen from configuration
let result = backend.parse(&grammar, input)?;
```
### Quick Start Examples

Using the bytecode backend explicitly:

```rust
let grammar = GrammarBuilder::new()
    .rule("greeting", str("hello"))
    .build();

// Create parser with bytecode backend
let mut parser = Parser::new(&grammar, Backend::Bytecode);
let result = parser.parse("hello");
```

Using the packrat backend explicitly:

```rust
let mut parser = Parser::new(&grammar, Backend::Packrat);
let result = parser.parse("hello");
```
## Optimization Passes

The bytecode backend applies 11 optimization passes automatically:

1. `DeadCodeElimination` - Remove unreachable code
2. `JumpChainSimplification` - Simplify jump chains
3. `JumpToReturnSimplification` - Direct returns
4. `JumpToFailSimplification` - Direct failures
5. `CombineAdjacentChars` - Char merging
6. `SpanOptimization` - CharSet* to Span
7. `FullCaptureOptimization` - Capture pairs to FullCapture
8. `TestCharOptimization` - Choice patterns to TestChar
9. `TestSetOptimization` - Choice patterns to TestSet
10. `TailCallOptimization` - Tail calls to jumps
11. `LookaheadOptimization` - Choice to PredChoice for predicates
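As an illustration of what such a peephole pass does, here is a toy version of jump-chain simplification over a simplified instruction set (not parsanol's actual bytecode):

```rust
// Toy peephole pass in the spirit of JumpChainSimplification: a jump whose
// target is another jump is rewritten to point at the final destination.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Instr {
    Jump(usize), // unconditional jump to an instruction index
    Char(char),  // match one character
    Return,
}

fn simplify_jump_chains(program: &mut [Instr]) {
    for i in 0..program.len() {
        if let Instr::Jump(mut target) = program[i] {
            // Follow the chain (hop count bounded, to stay safe on cycles).
            let mut hops = 0;
            while let Instr::Jump(next) = program[target] {
                target = next;
                hops += 1;
                if hops > program.len() {
                    break;
                }
            }
            program[i] = Instr::Jump(target);
        }
    }
}

fn main() {
    // 0: Jump(1) -> 1: Jump(2) -> 2: Char('a')
    let mut prog = vec![Instr::Jump(1), Instr::Jump(2), Instr::Char('a'), Instr::Return];
    simplify_jump_chains(&mut prog);
    assert_eq!(prog[0], Instr::Jump(2)); // chain collapsed to the final target
}
```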
## Bytecode VM Architecture

```text
Grammar (Atoms) ──► Compiler ──► Program (bytecode)
                                       │
                                       ▼
Input ────────────────────────────►   VM   ──► AstNode
```
The bytecode VM uses:
- Backtracking stack: For choice point management
- Capture stack: For building AST nodes
- Instruction pointer: Sequential execution
- Optimization passes: Peephole optimization on compiled bytecode
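The interaction of the backtracking stack with `Choice`/`Commit` can be sketched with a toy VM over a reduced instruction set — illustrative only, not the real 28-instruction VM:

```rust
#[derive(Clone, Copy)]
enum Op {
    Char(u8),      // match one byte or fail
    Choice(usize), // push a backtrack entry pointing at the alternative
    Commit(usize), // discard that entry and jump past the alternative
    End,           // accept
}

// Returns Some(bytes consumed) on success: one instruction pointer plus a
// stack of (pc, input position) choice points, as described above.
fn run(program: &[Op], input: &[u8]) -> Option<usize> {
    let mut pc = 0;
    let mut pos = 0;
    let mut stack: Vec<(usize, usize)> = Vec::new();
    loop {
        match program[pc] {
            Op::Char(c) => {
                if pos < input.len() && input[pos] == c {
                    pos += 1;
                    pc += 1;
                } else if let Some((alt_pc, alt_pos)) = stack.pop() {
                    pc = alt_pc; // backtrack to the stored choice point
                    pos = alt_pos;
                } else {
                    return None; // no choice points left: parse fails
                }
            }
            Op::Choice(alt) => {
                stack.push((alt, pos));
                pc += 1;
            }
            Op::Commit(target) => {
                stack.pop(); // first alternative succeeded
                pc = target;
            }
            Op::End => return Some(pos),
        }
    }
}

fn main() {
    // Bytecode for the PEG choice ("a" | "b"):
    let prog = [Op::Choice(3), Op::Char(b'a'), Op::Commit(4), Op::Char(b'b'), Op::End];
    assert_eq!(run(&prog, b"a"), Some(1));
    assert_eq!(run(&prog, b"b"), Some(1));
    assert_eq!(run(&prog, b"c"), None);
}
```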
### Instruction Set
The VM supports 28 instructions covering all PEG operations:
| Category | Instructions |
|---|---|
| Matching | Char, CharSet, String, Regex, Any, Custom |
| Control Flow | Jump, Call, Return, End |
| Backtracking | Choice, Commit, PartialCommit, BackCommit, Fail, FailTwice |
| Captures | OpenCapture, CloseCapture, FullCapture |
| Tests | TestChar, TestSet, TestAny |
| Special | Behind, Span, NoOp, PredChoice |
## Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│                        PARSANOL-RS                          │
│                (Generic PEG Parser Library)                 │
├─────────────────────────────────────────────────────────────┤
│  • Parser combinators (PEG atoms)                           │
│  • Grammar representation                                   │
│  • Packrat memoization                                      │
│  • Arena allocation                                         │
│  • Infix expression parsing                                 │
│  • Rich error reporting (tree structure)                    │
│  • Transform DSL (pattern matching)                         │
│  • Derive macros for typed ASTs                             │
│  • Optional Ruby FFI / WASM bindings                        │
└─────────────────────────────────────────────────────────────┘
            ▲                            ▲
            │ (build ON TOP)             │ (build ON TOP)
            │                            │
 ┌──────────┴─────────┐       ┌─────────┴─────────┐
 │  parsanol-express  │       │   Your Language   │
 │  (EXPRESS lexer)   │       │    (Your DSL)     │
 └────────────────────┘       └───────────────────┘
```

> [!IMPORTANT]
> Parsanol-rs is a GENERIC parser library. It has no knowledge of any specific domain (EXPRESS, Ruby, JSON, YAML, etc.). Domain-specific parsers should be built ON TOP of this library.
## Workspace Structure

This repository uses a Cargo workspace with two crates:

```text
parsanol-rs/
├── parsanol/          # Main parser library
│   ├── src/
│   └── Cargo.toml
├── parsanol-derive/   # Derive macros (always included)
│   ├── src/
│   └── Cargo.toml
├── examples/          # 39 example parsers
└── Cargo.toml         # Workspace root
```
## Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
parsanol = "0.1"
```

The `parsanol-derive` crate is automatically included as a dependency, providing the `#[derive(FromAst)]` macro for typed AST conversion.
### Optional Features

- `ruby` - Enable Ruby FFI bindings (requires `magnus`, `rb-sys`)
- `wasm` - Enable WebAssembly bindings (requires `wasm-bindgen`, `js-sys`)
- `parallel` - Enable parallel parsing (requires `rayon`)

```toml
[dependencies]
parsanol = { version = "0.1", features = ["ruby", "parallel"] }
```
## Quick Start

### Basic Parsing

```rust
// Build a simple grammar
let grammar = GrammarBuilder::new()
    .rule("greeting", str("hello").then(str("world")))
    .build();

let input = "helloworld";
let mut arena = AstArena::for_input(input);
let mut parser = Parser::new(&grammar, Backend::Auto);

match parser.parse(input, &mut arena) {
    Ok(ast) => println!("parsed: {:?}", ast),
    Err(err) => eprintln!("{}", err),
}
```
### Calculator with Operator Precedence

```rust
// Sketch only - see "Infix Expression Parsing" below for the InfixBuilder API
let mut builder = GrammarBuilder::new();
let expr = InfixBuilder::new(&mut builder)
    .primary(re(r"[0-9]+"))
    .op("*", Assoc::Left)
    .op("+", Assoc::Left)
    .build();
```
## Parser DSL

### Atom Types

| Atom | Description | Example |
|---|---|---|
| `str("literal")` | Match exact string | `str("hello")` |
| `re("pattern")` | Match regex pattern | `re(r"[0-9]+")` |
| `any()` | Match any single character | `any()` |
| `ref_("rule")` | Reference to named rule | `ref_("expr")` |
| `seq([...])` | Sequence of atoms | `seq(vec![a, b, c])` |
| `choice([...])` | Alternative atoms | `choice(vec![a, b])` |
| `cut()` | Commit to this branch (prevent backtracking) | `cut()` |
| `capture("name", atom)` | Extract named value during parsing | `capture("id", re(r"[a-z]+"))` |
| `scope(atom)` | Create isolated capture context | `scope(seq([...]))` |
| `dynamic(callback)` | Runtime-determined parsing via callback | `dynamic(callback_id)` |
### Combinators

All atoms implement the `ParsletExt` trait with these methods:

```rust
use parsanol::parser_dsl::*; // path assumed

// Sequence: A >> B
let parser = str("a").then(str("b"));

// Alternative: A | B
let parser = str("a").or(str("b"));

// Repetition (range arguments illustrative)
let parser = str("a").repeat(1..);    // one or more
let parser = str("a").repeat(0..=3);  // zero to three
let parser = str("a").many();         // zero or more
let parser = str("a").many1();        // one or more
let parser = str("a").optional();     // zero or one

// Named capture
let parser = re(r"[a-z]+").label("name");

// Ignore (don't include in the AST)
let parser = str(" ").ignore();

// Lookahead (don't consume)
let parser = str("a").lookahead();  // positive: must match
let parser = str("a").not_ahead();  // negative: must NOT match
```
### Grammar Macro

For declarative grammar definition:

```rust
use parsanol::grammar;

let grammar = grammar! {
    // rule definitions
};
```
## Capture Atoms

Capture atoms extract named values during parsing, similar to regex named groups. They work with all backends (Packrat, Bytecode, Streaming).

### Basic Usage

```rust
let grammar = GrammarBuilder::new()
    .rule("id", capture("name", re(r"[a-z]+")))
    .build();

let mut arena = AstArena::for_input(input);
let mut parser = Parser::packrat(&grammar);
let result = parser.parse_from_pos(input, 0, &mut arena)?;

// Access captures
if let Some(value) = result.get_capture("name") {
    println!("captured: {}", value);
}
```
### Capture API

```rust
// Get a single capture by name
let value = result.get_capture("name");

// Get all capture names
for name in result.capture_names() {
    println!("{}", name);
}

// Check whether a capture exists
if result.has_capture("name") {
    // ...
}
```
### Backend Compatibility
| Backend | Capture Support | Notes |
|---|---|---|
| Packrat | Full | Native support |
| Bytecode | Full | Uses capture instructions |
| Streaming | Full | Captures persist across chunks |
## Scope Atoms

Scope atoms create isolated capture contexts. Captures made inside a scope are discarded when the scope exits, preventing pollution of the parent context.

### Use Cases
- Nested parsing where inner captures shouldn't affect outer state
- Repetitive patterns where each iteration starts fresh
- Context isolation in recursive grammars
### Basic Usage

```rust
let grammar = GrammarBuilder::new()
    .rule("item", scope(capture("key", re(r"[a-z]+"))))
    .build();
```
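The isolation semantics can be modeled as a stack of capture layers — a toy sketch, not parsanol's implementation:

```rust
use std::collections::HashMap;

// Toy capture store with scope isolation: entering a scope starts a fresh
// layer; leaving discards everything captured inside that layer.
struct Captures {
    layers: Vec<HashMap<String, String>>,
}

impl Captures {
    fn new() -> Self {
        Captures { layers: vec![HashMap::new()] }
    }
    fn set(&mut self, name: &str, value: &str) {
        self.layers.last_mut().unwrap().insert(name.into(), value.into());
    }
    fn get(&self, name: &str) -> Option<&String> {
        // Inner scopes can still see outer captures.
        self.layers.iter().rev().find_map(|layer| layer.get(name))
    }
    fn enter_scope(&mut self) {
        self.layers.push(HashMap::new());
    }
    fn exit_scope(&mut self) {
        self.layers.pop(); // captures made inside are discarded
    }
}

fn main() {
    let mut caps = Captures::new();
    caps.set("outer", "kept");
    caps.enter_scope();
    caps.set("inner", "discarded on exit");
    assert!(caps.get("inner").is_some()); // visible while the scope is open
    caps.exit_scope();
    assert!(caps.get("inner").is_none()); // isolated: gone after exit
    assert_eq!(caps.get("outer").unwrap(), "kept");
}
```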
## Dynamic Atoms
Dynamic atoms enable runtime-determined parsing via registered callbacks. This allows context-sensitive parsing where the grammar itself depends on input or previously captured values.
### Registering a Callback

```rust
use parsanol::register_dynamic_callback; // path assumed

// The callback decides at runtime what to match (signature illustrative)
let callback_id = register_dynamic_callback(my_callback);
```
### Using Dynamic Atoms in Grammars

```rust
let grammar = GrammarBuilder::new()
    .rule("field", dynamic(callback_id))
    .build();
```
### Backend Compatibility
| Backend | Dynamic Support | Notes |
|---|---|---|
| Packrat | Full | Native support (recommended) |
| Bytecode | Fallback | Uses Packrat internally |
| Streaming | Fallback | Uses Packrat internally |
Note: For heavy dynamic atom usage, prefer the Packrat backend for best performance.
## Streaming with Captures
The streaming parser supports captures while maintaining bounded memory usage. Captures persist across streaming parse operations.
### Basic Usage

```rust
use std::io::Cursor;

let grammar = GrammarBuilder::new()
    .rule("record", capture("id", re(r"[0-9]+")))
    .build();

let config = ChunkConfig::medium();
let mut parser = StreamingParser::new(&grammar, config); // type name assumed
let mut arena = AstArena::for_input("");
let mut cursor = Cursor::new(input);

let result = parser.parse_from_reader(&mut cursor, &mut arena)?;
if let Some(captures) = &result.capture_state {
    // captures persisted across chunks
}
```
### Chunk Configuration

| Preset | Chunk Size | Window | Use Case |
|---|---|---|---|
| `small()` | 16 KB | 2 | Real-time feeds |
| `medium()` | 64 KB | 3 | Default |
| `large()` | 256 KB | 4 | Log files |
| `huge()` | 1 MB | 5 | Large files |
### Performance Notes

- Memory: O(chunk_size × window_size + capture_state)
- Captures accumulate during the parse and are available at the end
- For very large captures, use `reset()` to process incrementally
## Transform System
The transform system converts generic parse trees into typed Rust data structures, similar to Parslet’s transformation system.
### Value Types
The Value enum represents transformed data:
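The exact variant list lives in the API docs; an illustrative shape consistent with the pattern types below (variant names are assumptions, not the crate's definition) is:

```rust
use std::collections::HashMap;

// Illustrative shape only - mirrors the kinds used by the Pattern table
// (strings, integers, sequences/arrays, hashes).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Hash(HashMap<String, Value>),
}

fn main() {
    // A parse tree like {op: "+", lhs: 1, rhs: 2} as a Value:
    let mut node = HashMap::new();
    node.insert("op".to_string(), Value::String("+".into()));
    node.insert("lhs".to_string(), Value::Int(1));
    node.insert("rhs".to_string(), Value::Int(2));
    let value = Value::Hash(node);
    assert!(matches!(value, Value::Hash(_)));
}
```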
### Basic Transformations

```rust
let transform = Transform::new()
    // Transform "int" captures by doubling (callback shape illustrative)
    .rule("int", |n: i64| n * 2);

let value = /* a hash value containing an "int" capture */;
let result = transform.apply(value)?;
assert_eq!(result, /* the doubled value */);
```
### Pattern Matching

Pattern-based transformations similar to Parslet:

```rust
let transform = Transform::new()
    // Match a hash with specific fields and rebuild it
    .pattern(Pattern::hash(), |bindings| {
        // construct the replacement value from the bound variables
    });
```
### Pattern Types

| Pattern | Description | Example |
|---|---|---|
| `Pattern::simple("x")` | Match any leaf value and bind to variable | `Pattern::simple("n")` matches `42` |
| `Pattern::str("value")` | Match a specific string value | `Pattern::str("+")` matches `"+"` |
| `Pattern::int(n)` | Match a specific integer | `Pattern::int(42)` matches `42` |
| `Pattern::sequence("x")` | Match an array and bind to variable | `Pattern::sequence("items")` |
| `Pattern::subtree("x")` | Match anything and bind to variable | `Pattern::subtree("node")` |
| `Pattern::hash()` | Match a hash with specific fields | See example above |
### Converting AST to Value

```rust
// After parsing
let ast = parser.parse(input, &mut arena)?;
let value = ast_to_value(&ast);

// Now apply transforms
let result = transform.apply(value)?;
```
## Derive Macros

The `FromAst` derive macro automatically generates code to convert `Value` types into typed Rust structs and enums. This eliminates boilerplate code for AST transformation.

### Basic Usage

```rust
use parsanol::{FromAst, Value}; // paths assumed

#[derive(FromAst)]
#[parsanol(rule = "expr")]
struct Expr {
    value: i64,
}

// Convert Value to typed Expr
let value: Value = /* ... parsed value ... */;
let expr: Expr = value.try_into()?;
```
### Container Attributes

| Attribute | Description |
|---|---|
| `#[parsanol(rule = "name")]` | Specify the grammar rule name |

### Variant Attributes (for enums)

| Attribute | Description |
|---|---|
| `#[parsanol(tag = "literal")]` | Match by literal tag string |
| `#[parsanol(tag_expr = expr)]` | Match by expression (for dynamic tags) |

### Field Attributes

| Attribute | Description |
|---|---|
| `#[parsanol(field = "name")]` | Map to a different hash field name |
| `#[parsanol(default)]` | Use `Default::default()` if missing |
| `#[parsanol(default = expr)]` | Use expression if missing |
### Complete Example

```rust
use parsanol::{FromAst, Value}; // paths assumed

#[derive(FromAst)]
#[parsanol(rule = "expr")]
enum Expr {
    #[parsanol(tag = "num")]
    Num(i64),
    #[parsanol(tag = "add")]
    Add {
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

// Usage
let expr: Expr = value.try_into()?;
```
### Single-Field Tuple Structs

Single-field tuple structs automatically get transparent conversion:

```rust
#[derive(FromAst)]
struct Identifier(String);

// Value::String("foo") directly converts to Identifier("foo")
```
### Error Handling

```rust
use parsanol::FromAstError; // path assumed

let result: Result<Expr, FromAstError> = value.try_into();
match result {
    Ok(expr) => println!("{:?}", expr),
    Err(err) => eprintln!("conversion failed: {}", err),
}
```
## Streaming Builder

The streaming builder API allows single-pass parsing without intermediate AST construction. This is ideal for:

- Maximum performance (eliminates AST allocation)
- Custom output formats
- Memory-constrained environments
### Implementing StreamingBuilder

```rust
// Custom builder that collects all strings
struct StringCollector {
    strings: Vec<String>,
}

impl StreamingBuilder for StringCollector {
    // implement the trait's event callbacks, e.g. pushing each
    // string value onto self.strings as it is parsed
}
```
### Using parse_with_builder

```rust
let grammar = /* ... */;
let input = "hello world";
let mut arena = AstArena::for_input(input);
let mut parser = Parser::new(&grammar, Backend::Auto);

// Create builder
let mut builder = StringCollector { strings: Vec::new() };

// Parse with streaming builder
let result = parser.parse_with_builder(input, &mut arena, &mut builder)?;
// result: Vec<String>
```
### Built-in Builders

Several useful builders are provided:

| Builder | Description |
|---|---|
| `DebugBuilder` | Collects all events as strings for debugging |
| `BuilderStringCollector` | Collects all string values |
| `BuilderNodeCounter` | Counts nodes by type |
### Ruby Integration

The streaming builder works with Ruby callbacks via FFI:

```ruby
class MyBuilder
  include Parsanol::BuilderCallbacks

  # method names illustrative - see parsanol-ruby for the callback API
  def initialize
    @strings = []
  end

  def on_string(value)
    @strings << value
  end

  def result
    @strings
  end
end

builder = MyBuilder.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
```
## Parallel Parsing

Parse multiple inputs in parallel using rayon for linear speedup on multi-core systems.

### Enabling the Parallel Feature

```toml
[dependencies]
parsanol = { version = "0.1", features = ["parallel"] }
```
### Batch Parallel Parsing

```rust
let grammar = /* ... */;
let inputs = vec!["input one", "input two", "input three"]; // illustrative

// Parse all inputs in parallel
let results = parse_batch_parallel(&grammar, &inputs);

// Results are in the same order as the inputs
for (i, result) in results.iter().enumerate() {
    println!("input {}: {:?}", i, result);
}
```
### Parallel Configuration

```rust
let config = ParallelConfig::new()  // type name assumed
    .with_num_threads(4)            // use 4 threads
    .with_min_chunk_size(16);       // minimum inputs per thread (example value)

let results = parse_batch_parallel_with_config(&grammar, &inputs, &config); // name assumed
```
### Performance
| Scenario | Speedup |
|---|---|
| 8 cores, 100 files | ~8x faster than sequential |
| 4 cores, 50 files | ~4x faster than sequential |
| Single core | Same as sequential (graceful fallback) |
When the parallel feature is not enabled, the functions fall back to
sequential parsing automatically.
## Infix Expression Parsing

Built-in support for parsing infix expressions with operator precedence and associativity.

### Using InfixBuilder

```rust
let mut builder = GrammarBuilder::new();

// op() arguments illustrative; precedence follows declaration order
let expr_idx = InfixBuilder::new(&mut builder)
    .primary(re(r"[0-9]+"))   // base expression (numbers, parens)
    .op("*", Assoc::Left)     // higher precedence
    .op("/", Assoc::Left)
    .op("+", Assoc::Left)     // lower precedence
    .op("-", Assoc::Left)
    .op("=", Assoc::Right)    // right-associative
    .build();
```
### Associativity

| Associativity | Meaning | Example |
|---|---|---|
| `Assoc::Left` | Left-to-right evaluation | `a - b - c` = `(a - b) - c` |
| `Assoc::Right` | Right-to-left evaluation | `a = b = c` = `a = (b = c)` |
| `Assoc::NonAssoc` | Cannot chain | `a < b < c` is an error |
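Precedence levels and left-associativity are exactly what a precedence-climbing parser enforces; a self-contained evaluator for single-digit arithmetic shows the mechanics (this is the general algorithm, not parsanol's InfixBuilder internals):

```rust
// Precedence-climbing evaluator for + - * / over single digits.
fn eval(input: &[u8]) -> i64 {
    let mut pos = 0;
    parse_expr(input, &mut pos, 0)
}

fn parse_expr(input: &[u8], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = (input[*pos] - b'0') as i64; // primary: a single digit
    *pos += 1;
    while *pos < input.len() {
        let (prec, op) = match input[*pos] {
            b'+' => (1, b'+'),
            b'-' => (1, b'-'),
            b'*' => (2, b'*'),
            b'/' => (2, b'/'),
            _ => break,
        };
        if prec < min_prec {
            break; // operator binds too loosely for this level
        }
        *pos += 1;
        // Left-associative: the recursive call requires strictly higher precedence.
        let rhs = parse_expr(input, pos, prec + 1);
        lhs = match op {
            b'+' => lhs + rhs,
            b'-' => lhs - rhs,
            b'*' => lhs * rhs,
            _ => lhs / rhs,
        };
    }
    lhs
}

fn main() {
    assert_eq!(eval(b"2+3*4"), 14); // * binds tighter than +
    assert_eq!(eval(b"9-3-2"), 4);  // left-associative: (9-3)-2
}
```

Right-associativity would call `parse_expr` with `prec` instead of `prec + 1`, letting the recursion consume operators of the same level.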
## Rich Error Reporting

Tree-structured error messages similar to Parslet for better debugging.

### Basic Usage

```rust
// Create rich errors (builder names reconstructed from the output below)
let error = RichError::new("Failed to parse expression")
    .at(24, 3, 5) // offset, line, column (offset illustrative)
    .context("expression")
    .child(RichError::new("Expected '+' or '-'"))
    .build();

// Print as an ASCII tree
println!("{}", error);
```
### Example Output

```text
Error at line 3, column 5:
`- Failed to parse expression (in expression)
`- Expected '+' or '-'
```
### Source Context

```rust
// Format the error with source code context
let formatted = error.format_with_source(source);
println!("{}", formatted);
```

Output:

```text
Error at line 3, column 5:
  let x = foo bar
      ^
`- Failed to parse expression (in expression)
`- Expected '+' or '-'
```
## Source Location Tracking

Track source positions through the parsing and transformation pipeline.

### Using SourceSpan
```rust
use parsanol::portable::source_location::SourceSpan; // path assumed

// Create a span from offsets
let span = SourceSpan::from_offsets(0, 5);
println!("{}..{}", span.start, span.end);

// Merge adjacent spans
let merged = span1.merge(&span2);

// Check overlap
if span1.overlaps(&span2) {
    // ...
}

// Transform AST with source spans preserved
let (value, span) = ast_to_value_with_span(&ast);
```
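A minimal self-contained model of such a span type, with offset-based line/column computation (names mirror the text above but are illustrative, not the crate's exact API):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct SourceSpan {
    start: usize, // byte offset, inclusive
    end: usize,   // byte offset, exclusive
}

impl SourceSpan {
    fn from_offsets(start: usize, end: usize) -> Self {
        SourceSpan { start, end }
    }
    // Smallest span covering both inputs (works for adjacent spans too).
    fn merge(self, other: SourceSpan) -> SourceSpan {
        SourceSpan {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
    fn overlaps(self, other: SourceSpan) -> bool {
        self.start < other.end && other.start < self.end
    }
    // 1-based line/column of the span start, computed from the source text.
    fn line_and_column(self, source: &str) -> (usize, usize) {
        let prefix = &source[..self.start];
        let line = prefix.bytes().filter(|&b| b == b'\n').count() + 1;
        let column = self.start - prefix.rfind('\n').map_or(0, |i| i + 1) + 1;
        (line, column)
    }
}

fn main() {
    let source = "let x = 1\nlet y = 2\n";
    let span = SourceSpan::from_offsets(14, 15); // the "y"
    assert_eq!(span.line_and_column(source), (2, 5));
    assert!(SourceSpan::from_offsets(0, 5).overlaps(SourceSpan::from_offsets(4, 9)));
    assert_eq!(
        SourceSpan::from_offsets(0, 5).merge(SourceSpan::from_offsets(5, 9)),
        SourceSpan::from_offsets(0, 9)
    );
}
```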
## Grammar Composition

Build complex grammars by importing and composing smaller grammars.

### Importing Grammars

```rust
let mut builder = GrammarBuilder::new();

// Import another grammar under a prefix (prefix naming assumed)
builder.import("json", &json_grammar);
builder.import("ws", &ws_grammar);

// Reference imported rules
let combined = seq(vec![ref_("json.value"), ref_("ws.spaces")]);
builder.rule("document", combined);

let grammar = builder.build();
```
## Ruby FFI

Parsanol-rs can be compiled as a Ruby extension for use with parsanol-ruby.

### Features

The Ruby FFI provides:

- 26x faster parsing than pure Ruby (Parslet)
- A single `parse()` API - no confusing options
- Lazy line/column - zero overhead unless needed
- Streaming builder - single-pass parsing with callbacks
### Building for Ruby

```sh
# Build with Ruby support
cargo build --release --features ruby
```

The resulting library can be loaded as a Ruby extension.
### Ruby API

```ruby
# Serialize grammar once (grammar arguments reconstructed to match the result below)
grammar = str('hello').as(:greeting) >> str(' ').maybe >> match('[a-z]').repeat(1).as(:name)
grammar_json = Parsanol::Native.serialize_grammar(grammar)

# Parse - simple and clean
result = Parsanol::Native.parse(grammar_json, 'hello world')
# => {greeting: "hello"@0, name: "world"@6}

# Line/column available when needed (computed lazily)
result[:greeting].line_and_column  # => [1, 1]
result[:name].line_and_column      # => [1, 7]
```
### Lazy Line/Column

Slice objects support lazy line/column computation:

- `slice.offset` - character position (always available, zero cost)
- `slice.content` - string value (always available, zero cost)
- `slice.line_and_column` - `[line, column]` tuple (computed lazily, cached)
This provides zero overhead for users who don't need position info, while keeping line/column always available when needed.
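The caching idea can be sketched in Rust with `std::cell::OnceCell`: store the offset eagerly and compute line/column on first access (illustrative, not the actual Slice implementation):

```rust
use std::cell::OnceCell;

// Offset and content are stored eagerly; line/column is computed from the
// source on first access and then cached.
struct Slice<'a> {
    source: &'a str,
    offset: usize,
    content: &'a str,
    line_col: OnceCell<(usize, usize)>,
}

impl<'a> Slice<'a> {
    fn new(source: &'a str, offset: usize, len: usize) -> Self {
        Slice {
            source,
            offset,
            content: &source[offset..offset + len],
            line_col: OnceCell::new(),
        }
    }
    // Zero cost until called; cached afterwards.
    fn line_and_column(&self) -> (usize, usize) {
        *self.line_col.get_or_init(|| {
            let prefix = &self.source[..self.offset];
            let line = prefix.matches('\n').count() + 1;
            let col = self.offset - prefix.rfind('\n').map_or(0, |i| i + 1) + 1;
            (line, col)
        })
    }
}

fn main() {
    let source = "hello world";
    let name = Slice::new(source, 6, 5);
    assert_eq!(name.content, "world");
    assert_eq!(name.offset, 6);
    assert_eq!(name.line_and_column(), (1, 7)); // matches the Ruby example above
}
```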
### Streaming Builder (Ruby)

For maximum performance, use the streaming builder API:

```ruby
# Define a builder class (callback names illustrative)
class StringCollector
  include Parsanol::BuilderCallbacks

  def initialize
    @strings = []
  end

  def on_string(value)
    @strings << value
  end

  def on_value(value)
    @strings << value.to_s
  end

  def result
    @strings
  end
end

# Parse with streaming builder
builder = StringCollector.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
# result: ["42", "+", "8"]
```
See parsanol-ruby for full documentation.
## WASM Support

Parsanol-rs can be compiled to WebAssembly for use in browsers or Node.js.

### Building for WASM

```sh
# Install wasm-pack
cargo install wasm-pack

# Build for the web
wasm-pack build --target web -- --features wasm
```
### JavaScript API

```js
// Names illustrative - see the generated wasm-pack bindings
import { Parser, Grammar } from 'parsanol';

const grammar = Grammar.fromJson(grammarJson);
const parser = new Parser(grammar);
const result = parser.parse(input);
```
## Debug Tools

### Parser Tracing

Enable tracing for debugging:

```rust
let (result, trace) = parser.parse_with_trace(input, &mut arena);

// Print the trace
println!("{}", trace);
```
### Grammar Visualization

```rust
use parsanol::portable::debug::GrammarVisualizer; // path assumed

let viz = GrammarVisualizer::new(&grammar);

// Generate a Mermaid diagram
println!("{}", viz.to_mermaid());

// Generate GraphViz DOT
println!("{}", viz.to_dot());
```
## Performance

Parsanol-rs is designed for high performance:

- **18-44x faster** than pure Ruby parsers (Parslet)
- **99.5% fewer allocations** through arena allocation
- **O(n) parsing** via packrat memoization
- **SIMD optimization**: fast character matching via memchr
- **AHash**: fast hashing for cache lookups
- **SmallVec**: stack-allocated small collections
### Benchmarks
| Parser | Input Size | Time |
|---|---|---|
| parsanol-rs (Ruby Transform) | 1KB JSON | ~50µs |
| parsanol-rs (Serialized) | 1KB JSON | ~30µs |
| parsanol-rs (Native) | 1KB JSON | ~20µs |
| Pure Ruby (Parslet) | 1KB JSON | ~800µs |
## Security

Parsanol-rs includes built-in protection against denial-of-service attacks.

### Default Limits

| Limit | Default Value | Description |
|---|---|---|
| `max_input_size` | 100 MB | Maximum input size in bytes |
| `max_recursion_depth` | 1000 | Maximum recursion depth for nested structures |
### Custom Limits

For untrusted input, configure custom limits:

```rust
// Type and field names illustrative
let limits = ParserLimits {
    max_input_size: 1024 * 1024, // 1 MB
    max_recursion_depth: 100,
};
let mut parser = Parser::with_limits(&grammar, limits);

match parser.parse(input, &mut arena) {
    Ok(ast) => { /* ... */ }
    Err(err) => eprintln!("rejected: {}", err),
}
```
### Best Practices

- Always limit input size when parsing untrusted data
- Use external timeouts for network services (e.g., `tokio::time::timeout`)
- Monitor memory usage in production environments
See SECURITY.md for complete security documentation.
## Module Reference

### Core Modules

| Module | Description |
|---|---|
| `portable::parser` | PEG parsing engine with packrat memoization |
| `portable::grammar` | Grammar representation and serialization |
| `portable::ast` | AST node types |
| `portable::arena` | Arena allocator for AST nodes |
| `portable::cache` | Dense cache for memoization |
| `portable::parser_dsl` | Fluent API for grammar definition |
| `portable::transform` | Transform system for converting parse trees |
| `portable::error` | Rich error reporting |
| `portable::infix` | Infix expression parsing with precedence |
| `portable::debug` | Debugging and visualization tools |
| `portable::source_location` | Source span tracking with line/column info |
| `portable::streaming` | Streaming parser support for large inputs |
| `portable::streaming_builder` | Single-pass parsing with custom builders |
| `portable::parallel` | Parallel parsing for batch processing |
| `portable::incremental` | Incremental parsing for editor integration |
| `portable::visitor` | AST visitor pattern implementation |
| `portable::source_map` | Source map generation for debugging |
## Examples

See the `examples/` directory for 39 complete examples demonstrating real-world parsing scenarios.
### Expression Parsers

| Example | Description |
|---|---|
| `calculator-pattern` | Parse expressions with pattern-based transforms |
| `calculator-transform` | Parse and evaluate expressions with native transforms |
| `boolean-algebra` | Parse boolean expressions with AND, OR, NOT operators |
| `expression-evaluator` | Evaluate expressions with variables and function calls |
| `prec-calc` | Precedence climbing algorithm for infix expressions |
### Data Formats

| Example | Description |
|---|---|
| `json-pattern` | JSON parser with pattern matching |
| `json-transform` | JSON parser with native transforms |
| `csv-pattern` | CSV parser handling quoted fields (pattern mode) |
| `csv-transform` | CSV parser handling quoted fields (transform mode) |
| `ini` | INI configuration file parser |
| `simple-xml` | XML parser with tag matching |
| `markup` | Lightweight markup language parser |
| `toml` | TOML configuration file parser |
| `yaml` | YAML subset parser |
| `markdown` | Markdown subset parser with headers and lists |
| `iso-8601` | ISO 8601 date/time/duration parser |
| `iso-6709` | ISO 6709 geographic coordinate parser |
### URLs & Network

| Example | Description |
|---|---|
| `url` | URL parser with scheme, host, path components |
| `email` | Email address parser with validation |
| `ip-address` | IPv4/IPv6 address parser with validation |
### Code & Templates

| Example | Description |
|---|---|
| `erb` | ERB template parser for Ruby templates |
| `sexp` | S-expression parser for Lisp-style syntax |
| `minilisp` | MiniLisp parser demonstrating recursive grammars |
### Text Processing

| Example | Description |
|---|---|
| `balanced-parens` | Balanced parentheses parser |
| `string-literal` | String literal parser with escape sequences |
| `sentence` | Sentence parser with Unicode support |
| `comments` | Comment parser (line and block comments) |
### Error Handling

| Example | Description |
|---|---|
| `error-reporting` | Rich error reporting with tree structure |
| `error-recovery` | Error recovery strategies |
| `deepest-errors` | Deepest error point tracking |
| `nested-errors` | Nested error tree visualization |
### Advanced Features

| Example | Description |
|---|---|
| `streaming` | Streaming parser for large inputs |
| `incremental` | Incremental parsing for editor integration |
| `linter` | Code linter with custom validation |
| `custom-atoms` | Custom atom registration |
| `modularity` | Grammar composition from modules |
Run examples with:

```sh
cargo run --example json-pattern
```
Full documentation and interactive examples available at the website.
## API Stability

The API is currently in active development. Version 0.x indicates that breaking changes may occur.

**Stable APIs:**

- `Grammar` and `GrammarBuilder`
- `PortableParser` basic parsing
- `AstArena` and `AstNode`
- Parser DSL combinators
- Streaming builder trait and built-in builders
- Parallel parsing functions

**Experimental APIs (may change):**

- `Transform` and pattern matching
- Rich error reporting
- Infix expression parsing
- Debug/trace tools
## Documentation

### Architecture

See docs/ARCHITECTURE.md for the overall system architecture.

### Development

- docs/refactoring-plan.md - Current refactoring roadmap
- docs/continuation-prompt.md - Prompt for continuing work
- docs/MIGRATION.md - Migration guide from Parslet
## License
MIT License - see LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests at GitHub.
### Development Setup

```sh
# Clone the repository, then:

# Build (workspace)
cargo build

# Run tests (234 unit tests)
cargo test

# Run all examples
cargo build --examples

# Run benchmarks
cargo bench

# Check code quality
cargo clippy
```
## Testing
The test suite consists of multiple types of tests:

**Unit tests**: 234 tests covering the internal functionality of each module (parser, arena, cache, transform, derive, etc.).

**Integration tests**: Located in the `tests/` directory; test end-to-end parsing scenarios.

**Examples**: 39 runnable parsers in the `examples/` directory demonstrating real-world usage. Examples are compiled and tested via `cargo build --examples`.

**Documentation tests (doc tests)**: Code examples in documentation comments. Many doc tests are marked `ignore` because they show incomplete code snippets (e.g., method signatures or pseudocode) rather than complete runnable examples. This is intentional: the doc tests illustrate API patterns, while the `examples/` directory contains fully runnable code that is verified by CI.
To run all tests:
```sh
# Unit + integration tests
cargo test --lib --tests

# Include doc tests (most will be ignored as designed)
cargo test

# Test examples compile
cargo build --examples

# Run ignored doc tests (will fail if not complete)
cargo test --doc -- --ignored
```
FFI Feature Testing
This crate supports optional Ruby and WebAssembly (WASM) FFI features. These must be tested explicitly.
> [!IMPORTANT]
> FFI features require additional setup and may not compile or link in all environments. Always verify that FFI code compiles before pushing to CI.
Ruby FFI Testing
The Ruby FFI uses the magnus crate to provide Ruby bindings.
Prerequisites:
- Ruby 3.0+ installed
- Ruby development headers (macOS: `brew install ruby`; Ubuntu: `sudo apt-get install ruby-dev`)
Testing:
```sh
# Compile-time check (no Ruby required for linking).
# The `ruby` feature name is an assumption; check Cargo.toml for the exact flag.
cargo check --features ruby

# Full integration tests (requires a Ruby runtime).
# Note: These tests are marked #[ignore] - run manually:
cargo test --features ruby -- --ignored
```
Test coverage:
- `tests/ruby_ffi.rs` - comprehensive tests for `RubyBuilder` and the `RubyObject` trait
- Magnus type annotations (e.g., `funcall::<&str, (), Value>`)
- Error handling from Ruby callbacks
- Parse result conversion
WebAssembly FFI Testing
The WASM FFI uses wasm-bindgen for JavaScript bindings.
Prerequisites:
- `wasm-pack` installed (`cargo install wasm-pack`)
Testing:
```sh
# Compile-time check (the `wasm` feature name is an assumption;
# check Cargo.toml for the exact flag).
cargo check --features wasm

# Full WASM build and test
wasm-pack build
wasm-pack test --node
```
Test coverage:
- `tests/wasm_ffi.rs` - tests for WASM exports and grammar serialization
- `JsValue` conversions
- Error handling for WASM
CI Integration
CI automatically tests FFI features:
```yaml
# From .github/workflows/ci.yml
strategy:
  matrix:
    feature:
```
The Ruby and WASM feature tests run on every push to catch FFI regressions early.
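A fuller sketch of such a matrix job is shown below; the job name, feature names (`ruby`, `wasm`), and step details are assumptions for illustration, not copied from the actual workflow file:

```yaml
# Hypothetical sketch - names and steps are assumptions,
# not taken verbatim from .github/workflows/ci.yml.
jobs:
  ffi-features:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        feature: [ruby, wasm]
    steps:
      - uses: actions/checkout@v4
      - name: Check FFI feature compiles
        run: cargo check --no-default-features --features ${{ matrix.feature }}
```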
Release Process
This project uses release-plz for automated releases.
How It Works
```text
┌─────────────────────────────┐
│    RELEASE-PLZ WORKFLOW     │
└─────────────────────────────┘

Push to main
         │
         ▼
┌─────────────────┐
│ release-pr job  │  Creates/updates Release PR
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Release PR    │  Contains version bump + changelog
│   (on GitHub)   │
└────────┬────────┘
         │
         │  Maintainer reviews and merges
         ▼
┌─────────────────┐
│   release job   │  Runs `release-plz release`
└────────┬────────┘
         │
         ├──────────────────────────────┐
         │                              │
         ▼                              ▼
┌─────────────────┐            ┌─────────────────┐
│   Create tag    │            │   Publish to    │
│    (v0.1.2)     │            │    crates.io    │
└────────┬────────┘            └─────────────────┘
         │
         ▼
┌─────────────────┐
│ GitHub Release  │  With release notes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Build jobs    │  Build native libraries
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Update Release  │  Upload artifacts
└─────────────────┘
```
Maintainer Workflow
Normal Release (Recommended)
Just push commits with conventional commit messages:
release-plz will:
- Create a Release PR with a version bump (e.g., `0.1.1` → `0.1.2` for `feat:`)
- Wait for you to review and merge
- Publish automatically after merge
Manual Release
If you need to trigger a release manually:
1. Go to **Actions** → **Release** workflow
2. Click **Run workflow**
3. Select an action:
   - `auto` (default): let release-plz decide
   - `release-pr`: just create/update the Release PR
   - `release`: force a release immediately
Version Bump Rules
release-plz uses conventional commits:
| Commit Type | Version Bump |
|---|---|
| `feat:` | Minor (0.1.0 → 0.2.0) |
| `fix:` | Patch (0.1.0 → 0.1.1) |
| `feat!:` or `fix!:` | Major (0.1.0 → 1.0.0) |
| `docs:`, `chore:`, etc. | No bump (changelog only) |
What Gets Released
- **crates.io**: the `parsanol` crate
- **GitHub Release**: created with release notes
- **Build Artifacts**: native libraries for Linux, macOS, and Windows (x64, ARM64)
Troubleshooting
"Already published" error:
- release-plz sees an existing tag and thinks the version is already published
- Solution: Ensure Cargo.toml version matches what you want to publish
No Release PR created:
- Check that commits follow the conventional commit format
- Check the GitHub Actions logs for the `release-pr` job
Publish failed:
- Check that the `crates.io` environment is configured in the repository settings
- Check that trusted publishing is enabled
See Also
- parsanol-ruby - Ruby bindings
- Parslet - the original Ruby PEG parser (inspiration)