Parsanol-rs
A high-performance PEG (Parsing Expression Grammar) parser library for Rust with packrat memoization and arena allocation.
Purpose
Parsanol-rs is a generic, domain-agnostic PEG parser library written in Rust. It provides high-performance parsing capabilities with a focus on:
-
Speed: Packrat memoization for O(n) parsing complexity
-
Memory efficiency: Arena allocation for zero-copy AST construction
-
Developer experience: Fluent API for building grammars, rich error reporting
-
Flexibility: Transform system for converting parse trees to typed Rust structs via derive macros
Features
- Quick Start - Get started in minutes
- Parser DSL - Fluent API for grammar definition
- Transform System - Convert parse trees to typed structs
- Derive Macros - Automatic typed AST generation
- Streaming Builder - Single-pass parsing with custom output
- Parallel Parsing - Multi-file parsing with rayon
- Infix Expression Parsing - Built-in operator precedence
- Rich Error Reporting - Tree-structured error messages
- Source Location Tracking - Line/column tracking through transforms
- Grammar Composition - Import and compose grammars
- Ruby FFI - Optional Ruby bindings
- WASM Support - Optional WebAssembly bindings
Architecture
┌─────────────────────────────────────────────────────────────┐
│ PARSANOL-RS │
│ (Generic PEG Parser Library) │
├─────────────────────────────────────────────────────────────┤
│ • Parser combinators (PEG atoms) │
│ • Grammar representation │
│ • Packrat memoization │
│ • Arena allocation │
│ • Infix expression parsing │
│ • Rich error reporting (tree structure) │
│ • Transform DSL (pattern matching) │
│ • Derive macros for typed ASTs │
│ • Optional Ruby FFI / WASM bindings │
└─────────────────────────────────────────────────────────────┘
▲ ▲
│ (build ON TOP) │ (build ON TOP)
│ │
┌─────────┴──────────┐ ┌─────────┴─────────┐
│ parsanol-express │ │ Your Language │
│ (EXPRESS lexer) │ │ (Your DSL) │
└────────────────────┘ └───────────────────┘
[!IMPORTANT] Parsanol-rs is a GENERIC parser library. It has no knowledge of any specific domain (EXPRESS, Ruby, JSON, YAML, etc.). Domain-specific parsers should be built ON TOP of this library.
Workspace Structure
This repository uses a Cargo workspace with two crates:
parsanol-rs/
├── parsanol/ # Main parser library
│ ├── src/
│ └── Cargo.toml
├── parsanol-derive/ # Derive macros (always included)
│ ├── src/
│ └── Cargo.toml
├── examples/ # 37 example parsers
└── Cargo.toml # Workspace root
Installation
Add this to your Cargo.toml:
[]
= "0.1"
The parsanol-derive crate is automatically included as a dependency,
providing the #[derive(FromAst)] macro for typed AST conversion.
Optional Features
-
ruby- Enable Ruby FFI bindings (requiresmagnus,rb-sys) -
wasm- Enable WebAssembly bindings (requireswasm-bindgen,js-sys) -
parallel- Enable parallel parsing (requiresrayon)
[]
= { = "0.1", = ["ruby", "parallel"] }
Quick Start
Basic Parsing
use ;
// Build a simple grammar
let grammar = new
.rule
.build;
let input = "helloworld";
let mut arena = for_input;
let mut parser = new;
match parser.parse
Calculator with Operator Precedence
use ;
Parser DSL
Atom Types
| Atom | Description | Example |
|---|---|---|
str("literal") |
Match exact string | str("hello") |
re("pattern") |
Match regex pattern | re(r"[0-9]+") |
any() |
Match any single character | any() |
ref_("rule") |
Reference to named rule | ref_("expr") |
seq([…]) |
Sequence of atoms | seq(vec![a, b, c]) |
choice([…]) |
Alternative atoms | choice(vec![a, b]) |
cut() |
Commit to this branch (prevent backtracking) | cut() |
Combinators
All atoms implement the ParsletExt trait with these methods:
use *;
// Sequence: A >> B
let parser = str.then;
// Alternative: A | B
let parser = str.or;
// Repetition
let parser = str.repeat; // One or more
let parser = str.repeat; // Zero to three
let parser = str.many; // Zero or more
let parser = str.many1; // One or more
let parser = str.optional; // Zero or one
// Named capture
let parser = re.label;
// Ignore (don't include in AST)
let parser = str.ignore;
// Lookahead (don't consume)
let parser = str.lookahead; // Positive: must match
let parser = str.not_ahead; // Negative: must NOT match
Grammar Macro
For declarative grammar definition:
use grammar;
let grammar = grammar! ;
Transform System
The transform system converts generic parse trees into typed Rust data structures, similar to Parslet’s transformation system.
Value Types
The Value enum represents transformed data:
Basic Transformations
use ;
let transform = new
// Transform "int" captures by doubling
.rule;
let value = hash;
let result = transform.apply?;
assert_eq!;
Pattern Matching
Pattern-based transformations similar to Parslet:
use ;
let transform = new
// Match hash with specific fields
.pattern;
Pattern Types
| Pattern | Description | Example |
|---|---|---|
Pattern::simple("x") |
Match any leaf value and bind to variable | Pattern::simple("n") matches 42 |
Pattern::str("value") |
Match a specific string value | Pattern::str("+") matches "+" |
Pattern::int(n) |
Match a specific integer | Pattern::int(42) matches 42 |
Pattern::sequence("x") |
Match an array and bind to variable | Pattern::sequence("items") |
Pattern::subtree("x") |
Match anything and bind to variable | Pattern::subtree("node") |
Pattern::hash() |
Match a hash with specific fields | See example above |
Converting AST to Value
use ;
// After parsing
let ast = parser.parse?;
let value = ast_to_value;
// Now apply transforms
let result = transform.apply?;
Derive Macros
The FromAst derive macro automatically generates code to convert Value
types into typed Rust structs and enums. This eliminates boilerplate code
for AST transformation.
Basic Usage
use FromAst;
use Value;
// Convert Value to typed Expr
let value: Value = /* ... parsed value ... */;
let expr: Expr = value.try_into?;
Container Attributes
| Attribute | Description |
|---|---|
#[parsanol(rule = "name")] |
Specify the grammar rule name |
Variant Attributes (for enums)
| Attribute | Description |
|---|---|
#[parsanol(tag = "literal")] |
Match by literal tag string |
#[parsanol(tag_expr = expr)] |
Match by expression (for dynamic tags) |
Field Attributes
| Attribute | Description |
|---|---|
#[parsanol(field = "name")] |
Map to different hash field name |
#[parsanol(default)] |
Use Default::default() if missing |
#[parsanol(default = expr)] |
Use expression if missing |
Complete Example
use FromAst;
use Value;
// Usage
Single-Field Tuple Structs
Single-field tuple structs automatically get transparent conversion:
;
// Value::String("foo") directly converts to Identifier("foo")
Error Handling
use FromAstError;
match value.try_into
Streaming Builder
The streaming builder API allows single-pass parsing without intermediate AST construction. This is ideal for:
-
Maximum performance (eliminates AST allocation)
-
Custom output formats
-
Memory-constrained environments
Implementing StreamingBuilder
use ;
// Custom builder that collects all strings
Using parse_with_builder
use ;
let grammar = /* ... */;
let input = "hello world";
let mut arena = for_input;
let mut parser = new;
// Create builder
let mut builder = StringCollector ;
// Parse with streaming builder
let result = parser.parse_with_builder?;
// result: Vec<String>
Built-in Builders
Several useful builders are provided:
| Builder | Description |
|---|---|
DebugBuilder |
Collects all events as strings for debugging |
BuilderStringCollector |
Collects all string values |
BuilderNodeCounter |
Counts nodes by type |
Ruby Integration
The streaming builder works with Ruby callbacks via FFI:
include Parsanol::BuilderCallbacks
@strings = []
end
@strings << value
end
@strings
end
end
builder = MyBuilder.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
Parallel Parsing
Parse multiple inputs in parallel using rayon for linear speedup on multi-core systems.
Enabling Parallel Feature
[]
= { = "0.1", = ["parallel"] }
Batch Parallel Parsing
use ;
let grammar = /* ... */;
let inputs = vec!;
// Parse all inputs in parallel
let results = parse_batch_parallel;
// Results are in same order as inputs
for in results.iter.enumerate
Parallel Configuration
use ;
let config = new
.with_num_threads // Use 4 threads
.with_min_chunk_size; // Minimum inputs per thread
let results = parse_batch_parallel;
Performance
| Scenario | Speedup |
|---|---|
| 8 cores, 100 files | ~8x faster than sequential |
| 4 cores, 50 files | ~4x faster than sequential |
| Single core | Same as sequential (graceful fallback) |
When the parallel feature is not enabled, the functions fall back to
sequential parsing automatically.
Infix Expression Parsing
Built-in support for parsing infix expressions with operator precedence and associativity.
Using InfixBuilder
use ;
let mut builder = new;
let expr_idx = new
.primary // Base expression (numbers, parens)
.op // Higher precedence
.op
.op // Lower precedence
.op
.op // Right-associative
.build;
Associativity
| Associativity | Meaning | Example |
|---|---|---|
Assoc::Left |
Left-to-right evaluation | a - b - c = (a - b) - c |
Assoc::Right |
Right-to-left evaluation | a = b = c = a = (b = c) |
Assoc::NonAssoc |
Cannot chain | a < b < c is an error |
Rich Error Reporting
Tree-structured error messages similar to Parslet for better debugging.
Basic Usage
use ;
// Create rich errors
let error = new
.at // offset, line, column
.context
.child
.build;
// Print as ASCII tree
println!;
Example Output
Error at line 3, column 5:
`- Failed to parse expression (in expression)
`- Expected '+' or '-'
Source Context
// Format error with source code context
let formatted = error.format_with_source;
println!;
Output:
Error at line 3, column 5:
let x = foo bar
^
`- Failed to parse expression (in expression)
`- Expected '+' or '-'
Source Location Tracking
Track source positions through the parsing and transformation pipeline.
Using SourceSpan
use ;
use ;
// Create a span from offsets
let span = from_offsets;
println!;
// Merge adjacent spans
let merged = span1.merge;
// Check overlap
if span1.overlaps
// Transform AST with source spans preserved
let = ast_to_value_with_span;
Grammar Composition
Build complex grammars by importing and composing smaller grammars.
Importing Grammars
use *;
let mut builder = new;
// Import another grammar with a prefix
builder.import;
builder.import;
// Reference imported rules
let combined = seq;
builder.rule;
let grammar = builder.build;
Ruby FFI
Parsanol-rs can be compiled as a Ruby extension for use with parsanol-ruby.
Features
The Ruby FFI provides:
-
Fast parsing via Rust (18-44x faster than pure Ruby)
-
Three transformation options:
-
Ruby Transform: Parse in Rust, transform in Ruby (Parslet-compatible)
-
Serialized: Parse + transform in Rust, JSON output
-
Native: Direct Ruby object construction via FFI
-
-
Streaming Builder: Single-pass parsing with Ruby callbacks (most efficient)
Building for Ruby
# Build with Ruby support
# The resulting library can be loaded as a Ruby extension
Ruby API
rule(:number) { match().repeat(1).as(:int) }
rule(:operator) { str() | str() }
rule(:expr) { number.as(:left) >> operator.as(:op) >> number.as(:right) }
root(:expr)
end
parser = MyParser.new
tree = parser.parse()
Streaming Builder (Ruby)
For maximum performance, use the streaming builder API:
# Define a builder class
include Parsanol::BuilderCallbacks
@strings = []
end
@strings << value
end
@strings << value.to_s
end
@strings
end
end
# Parse with streaming builder
builder = StringCollector.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
# result: ["42", "+", "8"]
See parsanol-ruby for full documentation.
WASM Support
Parsanol-rs can be compiled to WebAssembly for use in browsers or Node.js.
Building for WASM
# Install wasm-pack
# Build for web
JavaScript API
import from 'parsanol';
const grammar = ;
const parser = ;
const result = parser.;
Debug Tools
Parser Tracing
Enable tracing for debugging:
let = parser.parse_with_trace;
// Print trace
println!;
Grammar Visualization
use GrammarVisualizer;
let viz = new;
// Generate Mermaid diagram
println!;
// Generate GraphViz DOT
println!;
Performance
Parsanol-rs is designed for high performance:
-
18-44x Faster than pure Ruby parsers (Parslet)
-
99.5% Fewer Allocations through arena allocation
-
O(n) Parsing via packrat memoization
-
SIMD Optimization: Fast character matching via memchr
-
AHash: Fast hashing for cache lookups
-
SmallVec: Stack-allocated small collections
Benchmarks
| Parser | Input Size | Time |
|---|---|---|
| parsanol-rs (Ruby Transform) | 1KB JSON | ~50µs |
| parsanol-rs (Serialized) | 1KB JSON | ~30µs |
| parsanol-rs (Native) | 1KB JSON | ~20µs |
| Pure Ruby (Parslet) | 1KB JSON | ~800µs |
Security
Parsanol-rs includes built-in protection against denial-of-service attacks.
Default Limits
| Limit | Default Value | Description |
|---|---|---|
max_input_size |
100 MB | Maximum input size in bytes |
max_recursion_depth |
1000 | Maximum recursion depth for nested structures |
Custom Limits
For untrusted input, configure custom limits:
use ;
// For untrusted input, use stricter limits
let mut parser = with_limits;
match parser.parse
Best Practices
- Always limit input size when parsing untrusted data
- Use external timeouts for network services (e.g.,
tokio::time::timeout) - Monitor memory usage in production environments
See SECURITY.md for complete security documentation.
Module Reference
Core Modules
| Module | Description |
|---|---|
portable::parser |
PEG parsing engine with packrat memoization |
portable::grammar |
Grammar representation and serialization |
portable::ast |
AST node types |
portable::arena |
Arena allocator for AST nodes |
portable::cache |
Dense cache for memoization |
portable::parser_dsl |
Fluent API for grammar definition |
portable::transform |
Transform system for converting parse trees |
portable::error |
Rich error reporting |
portable::infix |
Infix expression parsing with precedence |
portable::debug |
Debugging and visualization tools |
portable::source_location |
Source span tracking with line/column info |
portable::streaming |
Streaming parser support for large inputs |
portable::streaming_builder |
Single-pass parsing with custom builders |
portable::parallel |
Parallel parsing for batch processing |
portable::incremental |
Incremental parsing for editor integration |
portable::visitor |
AST visitor pattern implementation |
portable::source_map |
Source map generation for debugging |
Examples
See the examples/ directory for 37 complete examples demonstrating
real-world parsing scenarios:
Expression Parsers
| Example | Description |
|---|---|
calculator-pattern |
Parse expressions with pattern-based transforms |
calculator-transform |
Parse and evaluate expressions with native transforms |
boolean-algebra |
Parse boolean expressions with AND, OR, NOT operators |
expression-evaluator |
Evaluate expressions with variables and function calls |
prec-calc |
Precedence climbing algorithm for infix expressions |
Data Formats
| Example | Description |
|---|---|
json-pattern |
JSON parser with pattern matching |
json-transform |
JSON parser with native transforms |
csv-pattern |
CSV parser handling quoted fields (pattern mode) |
csv-transform |
CSV parser handling quoted fields (transform mode) |
ini |
INI configuration file parser |
simple-xml |
XML parser with tag matching |
markup |
Lightweight markup language parser |
toml |
TOML configuration file parser |
yaml |
YAML subset parser |
markdown |
Markdown subset parser with headers and lists |
iso-8601 |
ISO 8601 date/time/duration parser |
iso-6709 |
ISO 6709 geographic coordinate parser |
URLs & Network
| Example | Description |
|---|---|
url |
URL parser with scheme, host, path components |
email |
Email address parser with validation |
ip-address |
IPv4/IPv6 address parser with validation |
Code & Templates
| Example | Description |
|---|---|
erb |
ERB template parser for Ruby templates |
sexp |
S-expression parser for Lisp-style syntax |
minilisp |
MiniLisp parser demonstrating recursive grammars |
Text Processing
| Example | Description |
|---|---|
balanced-parens |
Balanced parentheses parser |
string-literal |
String literal parser with escape sequences |
sentence |
Sentence parser with Unicode support |
comments |
Comment parser (line and block comments) |
Error Handling
| Example | Description |
|---|---|
error-reporting |
Rich error reporting with tree structure |
error-recovery |
Error recovery strategies |
deepest-errors |
Deepest error point tracking |
nested-errors |
Nested error tree visualization |
Advanced Features
| Example | Description |
|---|---|
streaming |
Streaming parser for large inputs |
incremental |
Incremental parsing for editor integration |
linter |
Code linter with custom validation |
custom-atoms |
Custom atom registration |
modularity |
Grammar composition from modules |
Run examples with:
Full documentation and interactive examples available at the website.
API Stability
The API is currently in active development. Version 0.x indicates that breaking changes may occur.
Stable APIs:
-
GrammarandGrammarBuilder -
PortableParserbasic parsing -
AstArenaandAstNode -
Parser DSL combinators
-
Streaming builder trait and built-in builders
-
Parallel parsing functions
Experimental APIs (may change):
-
Transformand pattern matching -
Rich error reporting
-
Infix expression parsing
-
Debug/trace tools
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests at GitHub.
Development Setup
# Clone the repository
# Build (workspace)
# Run tests (234 unit tests)
# Run all examples
# Run benchmarks
# Check code quality
Testing
The test suite consists of multiple types of tests:
Unit tests: 234 tests covering internal functionality of each module (parser, arena, cache, transform, derive, etc.).
Integration tests: Located in tests/ directory, test end-to-end parsing scenarios.
Examples: 37 runnable parsers in examples/ directory demonstrating real-world usage.
Examples are compiled and tested via cargo build --examples.
Documentation tests (doc tests): Code examples in documentation comments. Note that many doc tests are marked with ignore because they show incomplete code snippets (e.g., method signatures or pseudocode) rather than complete runnable examples. This is intentional - the doc tests illustrate API patterns, while the examples/ directory contains fully runnable code that is verified by CI.
To run all tests:
# Unit + integration tests
# Include doc tests (most will be ignored as designed)
# Test examples compile
# Run ignored doc tests (will fail if not complete)
Release Process
This project uses release-plz for automated releases.
How It Works
-
Push to main → release-plz creates/updates a Release PR
-
Review and merge the Release PR → Version is updated in main
-
After merge → release-plz automatically:
- Creates a git tag (e.g.,
v0.1.2) - Publishes to crates.io
- Creates a GitHub release
- Creates a git tag (e.g.,
-
Build artifacts → CI builds native libraries and uploads them to the GitHub release
Maintainer Workflow
Normal Release (Recommended)
Just push commits with conventional commit messages:
release-plz will:
- Create a Release PR with version bump (e.g.,
0.1.1→0.1.2forfeat:) - Wait for you to review and merge
- Publish automatically after merge
Manual Release
If you need to trigger a release manually:
- Go to Actions → Release workflow
- Click Run workflow
- Select action:
auto(default): Let release-plz deciderelease-pr: Just create/update the Release PRrelease: Force a release immediately
Version Bump Rules
release-plz uses conventional commits:
| Commit Type | Version Bump |
|---|---|
feat: |
Minor (0.1.0 → 0.2.0) |
fix: |
Patch (0.1.0 → 0.1.1) |
feat!: or fix!: |
Major (0.1.0 → 1.0.0) |
docs:, chore:, etc. |
No bump (changelog only) |
What Gets Released
- crates.io:
parsanolcrate - GitHub Release: With release notes
- Build Artifacts: Native libraries for Linux, macOS, Windows (x64, ARM64)
Troubleshooting
"Already published" error:
- release-plz sees an existing tag and thinks the version is already published
- Solution: Ensure Cargo.toml version matches what you want to publish
No Release PR created:
- Check that commits follow conventional commit format
- Check GitHub Actions logs for the
release-prjob
Publish failed:
- Check crates.io API token is valid
- Check version doesn't already exist on crates.io
FFI Feature Testing
This crate supports optional Ruby and WebAssembly (WASM) FFI features. These must be tested explicitly.
[!IMPORTANT] FFI features require additional setup and may not compile/link in all environments. Always verify FFI code compiles before pushing to CI.
Ruby FFI Testing
The Ruby FFI uses the magnus crate to provide Ruby bindings.
Prerequisites:
- Ruby 3.0+ installed
- Ruby development headers (macOS:
brew install ruby, Ubuntu:sudo apt-get install ruby-dev)
Testing:
# Compile-time check (no Ruby required for linking)
# Full integration tests (requires Ruby runtime)
# Note: These tests are marked #[ignore] - run manually
Test coverage:
tests/ruby_ffi.rs- Comprehensive tests for RubyBuilder, RubyObject trait- Magnus type annotations (e.g.,
funcall::<&str, (), Value>) - Error handling from Ruby callbacks
- Parse result conversion
WebAssembly FFI Testing
The WASM FFI uses wasm-bindgen for JavaScript bindings.
Prerequisites:
wasm-packinstalled (cargo install wasm-pack)
Testing:
# Compile-time check
# Full WASM build and test
Test coverage:
tests/wasm_ffi.rs- Tests for WASM exports, grammar serialization- JsValue conversions
- Error handling for WASM
CI Integration
CI automatically tests FFI features:
# From .github/workflows/ci.yml
strategy:
matrix:
feature:
The Ruby and WASM feature tests run on every push to catch FFI regressions early.
Release Process
This project uses release-plz for automated releases.
How It Works
┌─────────────────────────────────────────────────────────────────────────────┐
│ RELEASE-PLZ WORKFLOW │
└─────────────────────────────────────────────────────────────────────────────┘
Push to main
│
▼
┌─────────────────┐
│ release-pr job │ Creates/updates Release PR
└────────┬────────┘
│
▼
┌─────────────────┐
│ Release PR │ Contains version bump + changelog
│ (on GitHub) │
└────────┬────────┘
│
│ Maintainer reviews and merges
▼
┌─────────────────┐
│ release job │ Runs release-plz release
└────────┬────────┘
│
├──────────────────────────────┐
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Create tag │ │ Publish to │
│ (v0.1.2) │ │ crates.io │
└────────┬────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ GitHub Release │ With release notes
└────────┬────────┘
│
▼
┌─────────────────┐
│ Build jobs │ Build native libraries
└────────┬────────┘
│
▼
┌─────────────────┐
│ Update Release │ Upload artifacts
└─────────────────┘
Maintainer Workflow
Normal Release (Recommended)
Just push commits with conventional commit messages:
release-plz will:
- Create a Release PR with version bump (e.g.,
0.1.1→0.1.2forfeat:) - Wait for you to review and merge
- Publish automatically after merge
Manual Release
If you need to trigger a release manually:
- Go to Actions → Release workflow
- Click Run workflow
- Select action:
auto(default): Let release-plz deciderelease-pr: Just create/update the Release PRrelease: Force a release immediately
Version Bump Rules
release-plz uses conventional commits:
| Commit Type | Version Bump |
|---|---|
feat: |
Minor (0.1.0 → 0.2.0) |
fix: |
Patch (0.1.0 → 0.1.1) |
feat!: or fix!: |
Major (0.1.0 → 1.0.0) |
docs:, chore:, etc. |
No bump (changelog only) |
What Gets Released
- crates.io:
parsanolcrate - GitHub Release: With release notes
- Build Artifacts: Native libraries for Linux, macOS, Windows (x64, ARM64)
Troubleshooting
"Already published" error:
- release-plz sees an existing tag and thinks the version is already published
- Solution: Ensure Cargo.toml version matches what you want to publish
No Release PR created:
- Check that commits follow conventional commit format
- Check GitHub Actions logs for the
release-prjob
Publish failed:
- Check that the
crates.ioenvironment is configured in repository settings - Check that trusted publishing is enabled
See Also
-
parsanol-ruby - Ruby bindings
-
Parslet - Original Ruby PEG parser (inspiration)