# Parsanol-rs
A high-performance PEG (Parsing Expression Grammar) parser library for Rust with packrat memoization and arena allocation.
[crates.io](https://crates.io/crates/parsanol) | [docs.rs](https://docs.rs/parsanol) | [License](https://github.com/parsanol/parsanol-rs/blob/main/LICENSE) | [CI](https://github.com/parsanol/parsanol-rs/actions/workflows/ci.yml)
## Purpose
Parsanol-rs is a generic, domain-agnostic PEG parser library written in
Rust. It provides high-performance parsing capabilities with a focus on:
- **Speed**: Packrat memoization for O(n) parsing complexity
- **Memory efficiency**: Arena allocation for zero-copy AST construction
- **Developer experience**: Fluent API for building grammars, rich error
reporting
- **Flexibility**: Transform system for converting parse trees to typed
Rust structs via derive macros
## Features
- [Quick Start](#quick-start) - Get started in minutes
- [Backend Abstraction](#backend-abstraction) - Extensible backend trait system
- [Bytecode Backend](#bytecode-backend) - Optional VM backend for linear patterns
- [Parser DSL](#parser-dsl) - Fluent API for grammar definition
- [Capture Atoms](#capture-atoms) - Extract named values during parsing
- [Scope Atoms](#scope-atoms) - Isolated capture contexts
- [Dynamic Atoms](#dynamic-atoms) - Runtime-determined parsing via callbacks
- [Streaming with Captures](#streaming-with-captures) - Memory-efficient parsing with capture support
- [Transform System](#transform-system) - Convert parse trees to typed structs
- [Derive Macros](#derive-macros) - Automatic typed AST generation
- [Streaming Builder](#streaming-builder) - Single-pass parsing with custom output
- [Parallel Parsing](#parallel-parsing) - Multi-file parsing with rayon
- [Infix Expression Parsing](#infix-expression-parsing) - Built-in operator precedence
- [Rich Error Reporting](#rich-error-reporting) - Tree-structured error messages
- [Source Location Tracking](#source-location-tracking) - Line/column tracking through transforms
- [Grammar Composition](#grammar-composition) - Import and compose grammars
- [Ruby FFI](#ruby-ffi) - Optional Ruby bindings
- [WASM Support](#wasm-support) - Optional WebAssembly bindings
# Bytecode Backend
Parsanol-rs supports two parsing backends:
1. **Packrat (default)**: Memoization-based parser with O(n) time complexity for all grammars
2. **Bytecode VM**: Stack-based virtual machine with optimization passes
## Backend Comparison
Both backends produce **identical parsing results** for the vast majority of grammars (see [Known Differences](#known-differences) for the edge cases). The main difference lies in performance characteristics:

| Aspect | Packrat | Bytecode VM |
|---|---|---|
| **Time Complexity** | Guaranteed O(n) | O(n) to O(2^n) depending on grammar |
| **Memory Usage** | Higher (memoization table) | Lower (stack-based) |
| **Compilation** | None required | Pre-compilation needed |
| **Nested Repetitions** | Handles efficiently | Can be exponential |
| **Simple Patterns** | Good | Excellent |
| **Predictability** | Consistent performance | Varies by grammar |
### Performance Characteristics
**Packrat Backend:**
- Memoization stores parse results at each position
- Guarantees O(n) time complexity regardless of grammar structure
- Memory overhead scales with input size and grammar complexity
- Ideal when predictable performance is required
**Bytecode VM Backend:**
- Stack-based execution with backtracking
- O(n) for linear patterns (most common case)
- Can exhibit O(2^n) behavior for pathological patterns like `(a*)*`
- Lower memory footprint, good for memory-constrained environments
- Pre-compilation enables optimization passes
### Decision Matrix
| Use case | Recommended backend | Reason |
|---|---|---|
| JSON, XML, config files | Either | Linear patterns, both perform well |
| Programming languages | Packrat | Complex grammar with nested structures |
| Log parsing | Bytecode | Simple patterns, streaming potential |
| Nested repetitions `(a*)*` | Packrat | Avoids exponential backtracking |
| Memory-constrained | Bytecode | Lower memory footprint |
| Need predictable O(n) | Packrat | Guaranteed linear time |
### Automatic Selection
Use `Backend::Auto` (the default) to let parsanol analyze your grammar:
```rust
// Automatic selection (default)
let mut parser = Parser::auto(grammar);
// Or explicitly:
let mut parser = Parser::new(grammar, Backend::Auto);
// Check the analysis
let analysis = parser.analysis();
println!("Has nested repetitions: {}", analysis.has_nested_repetition);
println!("Recommended: {:?}", analysis.recommended_backend());
```
### Why Nested Repetitions Are the Criterion
The backend selection is based on a **single hard rule**:
- **Has nested repetitions** (e.g., `(a*)*`) → **Packrat**
- **Otherwise** → **Bytecode**
This is the only criterion because nested repetitions are the **only pattern that causes exponential time complexity** in the bytecode backend. Here's why:
**The Algorithmic Problem:**
When a repetition contains another repetition, the parser must try all possible ways to divide the input. For pattern `(a*)*` on input "aaa":
```
Division 1: (aaa) - outer * matches 1 group
Division 2: (aa)(a) - outer * matches 2 groups
Division 3: (a)(aa) - outer * matches 2 groups (different split)
Division 4: (a)(a)(a) - outer * matches 3 groups
(4 = 2^(3-1) divisions with non-empty groups; the count doubles with each extra character)
```
The number of ways to split n characters into consecutive non-empty groups is 2^(n-1), which is O(2^n). The bytecode VM tries each possibility via backtracking, leading to exponential time.
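To make the growth concrete, the following sketch counts the ways to split a run of `n` characters into consecutive non-empty groups. The function name `count_splits` is illustrative and is not part of parsanol; it only models the combinatorics, not the VM itself.

```rust
/// Count the ways to split `n` characters into consecutive non-empty groups.
/// The first group takes 1..=n characters; the remainder is split recursively.
fn count_splits(n: u32) -> u64 {
    if n == 0 {
        return 1; // nothing left to split: exactly one way to finish
    }
    (1..=n).map(|first| count_splits(n - first)).sum()
}

fn main() {
    for n in 1..=10 {
        // For n >= 1 the count equals 2^(n-1), so it doubles per character.
        println!("n = {:2}: {} splits", n, count_splits(n));
    }
}
```

For `n = 3` this yields the four divisions listed above.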
**Why Packrat Solves It:**
Packrat memoizes results by (position, rule). Once `(a*)` is evaluated at position i, the result is cached. Subsequent evaluations at the same position are O(1) cache hits. This guarantees O(n) total time.
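The effect of caching by (position, rule) can be seen in a toy recognizer. This is a minimal sketch using only the standard library; the `Toy` struct, its rule `item <- digits "x" / digits "y" / digits "z"`, and the `run` helper are assumptions made for illustration and do not reflect parsanol's internal data structures.

```rust
use std::collections::HashMap;

/// Toy recognizer for: item <- digits "x" / digits "y" / digits "z".
/// Every alternative re-evaluates `digits` at the same position unless memoized.
struct Toy<'a> {
    input: &'a [u8],
    memo: Option<HashMap<(usize, &'static str), Option<usize>>>,
    digit_runs: usize, // how often the body of `digits` actually executed
}

impl<'a> Toy<'a> {
    fn digits(&mut self, pos: usize) -> Option<usize> {
        if let Some(memo) = &self.memo {
            if let Some(&cached) = memo.get(&(pos, "digits")) {
                return cached; // cache hit: no re-scan of the input
            }
        }
        self.digit_runs += 1;
        let mut end = pos;
        while end < self.input.len() && self.input[end].is_ascii_digit() {
            end += 1;
        }
        let result = if end > pos { Some(end) } else { None };
        if let Some(memo) = &mut self.memo {
            memo.insert((pos, "digits"), result);
        }
        result
    }

    fn item(&mut self, pos: usize) -> Option<usize> {
        for suffix in [b'x', b'y', b'z'] {
            if let Some(end) = self.digits(pos) {
                if self.input.get(end) == Some(&suffix) {
                    return Some(end + 1);
                }
            }
        }
        None
    }
}

/// Returns (end position of the match, number of times `digits` executed).
fn run(input: &str, memoize: bool) -> (Option<usize>, usize) {
    let mut toy = Toy {
        input: input.as_bytes(),
        memo: memoize.then(HashMap::new),
        digit_runs: 0,
    };
    let end = toy.item(0);
    (end, toy.digit_runs)
}

fn main() {
    for memoize in [false, true] {
        let (end, runs) = run("123z", memoize);
        println!("memoize = {}: end = {:?}, digits executed {} time(s)", memoize, end, runs);
    }
}
```

Without the table, `digits` is scanned once per alternative; with it, the second and third alternatives reuse the cached result.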
**Why Other Patterns Don't Matter:**
| Pattern | Effect | Backend behavior |
|---|---|---|
| Overlapping choices (`"a" \| "aa"`) | Linear backtracking | Both handle identically |
| Deep nesting | Stack depth increases | Both handle fine |
| Many alternatives | More choice points | Linear in alternative count |
| Left recursion | Infinite loop | **Both fail** - not a backend issue |
### How the Analysis Works
The grammar analysis is deliberately simple:
```rust
pub struct GrammarAnalysis {
/// Total atoms in the grammar
pub atom_count: usize,
/// Whether any Repetition contains another Repetition
pub has_nested_repetition: bool,
}
```
The algorithm iterates through all atoms and checks: "Is this a Repetition whose inner atom is also a Repetition?"
```rust
for atom in &grammar.atoms {
if let Atom::Repetition { atom: inner_idx, .. } = atom {
if let Some(inner) = grammar.get_atom(*inner_idx) {
if matches!(inner, Atom::Repetition { .. }) {
has_nested_repetition = true;
break;
}
}
}
}
```
This is O(atoms) time and detects the only pattern that matters for backend selection.
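The check can be tried out without the library. The sketch below uses simplified stand-ins for `Atom` and `Grammar` (the real types have more variants and fields, so treat these definitions as assumptions) and applies the same "repetition directly containing a repetition" test.

```rust
/// Simplified stand-in for a grammar atom; atoms refer to each other by index.
#[allow(dead_code)]
enum Atom {
    Str(&'static str),
    Sequence(Vec<usize>),
    Repetition { atom: usize },
}

/// Simplified stand-in for a grammar: a flat table of atoms.
struct Grammar {
    atoms: Vec<Atom>,
}

impl Grammar {
    fn get_atom(&self, idx: usize) -> Option<&Atom> {
        self.atoms.get(idx)
    }
}

/// True if any repetition's inner atom is itself a repetition.
fn has_nested_repetition(grammar: &Grammar) -> bool {
    grammar.atoms.iter().any(|atom| match atom {
        Atom::Repetition { atom: inner } => {
            matches!(grammar.get_atom(*inner), Some(Atom::Repetition { .. }))
        }
        _ => false,
    })
}

/// (a*)* : atom 2 repeats atom 1, which repeats atom 0.
fn nested_example() -> Grammar {
    Grammar {
        atoms: vec![
            Atom::Str("a"),
            Atom::Repetition { atom: 0 },
            Atom::Repetition { atom: 1 },
        ],
    }
}

/// (a b)* : the repetition wraps a sequence, not another repetition.
fn flat_example() -> Grammar {
    Grammar {
        atoms: vec![
            Atom::Str("a"),
            Atom::Str("b"),
            Atom::Sequence(vec![0, 1]),
            Atom::Repetition { atom: 2 },
        ],
    }
}

fn main() {
    println!("(a*)*  nested: {}", has_nested_repetition(&nested_example()));
    println!("(a b)* nested: {}", has_nested_repetition(&flat_example()));
}
```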
### When to Override Auto-Selection
The auto-selection only considers **time complexity**. You may want to manually select based on:
| Scenario | Recommended backend | Reason |
|---|---|---|
| **Memory-constrained** (embedded, WASM) | `Backend::Bytecode` | Lower memory: O(depth) vs O(n×rules) |
| **Very large files** (>100MB) | `Backend::Bytecode` | Packrat table grows with input size |
| **Predictable latency required** | `Backend::Packrat` | Guaranteed O(n), no pathological cases |
| **Streaming parsing** | `Backend::Bytecode` | Packrat requires full input in memory |
| **Incremental re-parsing** | `Backend::Packrat` | Memo table can be reused for unchanged portions |
| **Grammar has nested repetitions but input is bounded** | Either | If input is always small, exponential doesn't matter |
| **Testing/debugging** | `Backend::Packrat` | Consistent behavior across all inputs |
```rust
// Memory-constrained environment
let mut parser = Parser::bytecode(grammar);
// Safety-critical with guaranteed O(n)
let mut parser = Parser::packrat(grammar);
// Explicit choice regardless of analysis
let mut parser = Parser::new(grammar, Backend::Packrat);
```
### Problematic Grammar Patterns
The following patterns can cause exponential O(2^n) behavior in the Bytecode backend.
They are **safe with Packrat** due to memoization. If your grammar contains these,
use Packrat explicitly or rely on `Backend::Auto`.
**Critical Pattern: Nested Repetitions**
```
(a*)* // CRITICAL: Outer * tries O(2^n) ways to divide input
(a+)+ // Same issue
// Safe alternatives:
a* // Single repetition - O(n)
(a b)* // Fixed sequence inside - O(n)
```
**Moderate Pattern: Overlapping Choice Prefixes**
```
// Problematic: All start with 'a'
// Better: Distinct first characters
("a" | "b" | "c")+
```
**Safe Pattern: Deep Recursion (Both handle well)**
```
```
### Analyzing Your Grammar
Use the GrammarAnalysis API to check for nested repetitions:
```rust
use parsanol::portable::{
parser_dsl::{str, re, GrammarBuilder},
bytecode::{Backend, GrammarAnalysis, Parser},
};
fn main() {
let grammar = GrammarBuilder::new()
.rule("expr", re(r"[0-9]+"))
.build();
// Analyze the grammar
let analysis = GrammarAnalysis::analyze(&grammar);
// The only field that matters for backend selection
if analysis.has_nested_repetition {
println!("⚠️ Nested repetitions detected - use Packrat!");
} else {
println!("✅ No nested repetitions - Bytecode is efficient");
}
// Get recommendation (hard rule: nested repetition → Packrat, else → Bytecode)
println!("Recommended: {:?}", analysis.recommended_backend());
}
```
**GrammarAnalysis Fields:**
| Field | Type | Description |
|---|---|---|
| `atom_count` | `usize` | Number of atoms in grammar (informational) |
| `has_nested_repetition` | `bool` | **The criterion** - if true, use Packrat |
**The `recommended_backend()` Method:**
Returns `Backend::Packrat` if `has_nested_repetition` is true, otherwise `Backend::Bytecode`. This is what `Backend::Auto` uses internally.
## Using the Bytecode Backend
```rust
use parsanol::portable::{
parser_dsl::{str, re, GrammarBuilder},
bytecode::{Backend, Parser},
};
let grammar = GrammarBuilder::new()
.rule("number", re(r"[0-9]+"))
.build();
// Create parser with bytecode backend
// (cloned so the grammar can be reused below; this assumes `Grammar` implements `Clone`)
let mut parser = Parser::new(grammar.clone(), Backend::Bytecode);
let result = parser.parse("42");
// Or use auto-selection (analyzes grammar complexity)
let mut parser = Parser::auto(grammar);
let result = parser.parse("42");
```
### Known Differences
Both backends produce **identical results** for the vast majority of patterns. However, there are edge cases where behavior differs:
**Alternatives in Sequences**: For patterns like `("a" | "aa") "b"` on input `"aab"`:
- **Packrat**: May succeed due to memoization re-evaluation
- **Bytecode**: Fails (standard PEG semantics - once "a" succeeds, "aa" is not tried)
This difference only affects patterns with:
- Alternatives containing overlapping prefixes ("a" vs "aa")
- The alternative is followed by content that fails
- The later alternative would allow the following content to succeed
For most practical grammars, this difference never manifests. Use `Backend::Auto` to let parsanol choose the appropriate backend.
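The standard PEG behavior described for the bytecode backend can be reproduced in a few lines. This is a standalone sketch of ordered choice over string literals; the helper names (`lit`, `ordered_choice`, `short_first`, `long_first`) are made up for the example and are not parsanol APIs.

```rust
/// Match `text` at `pos`, returning the position after it on success.
fn lit(input: &str, pos: usize, text: &str) -> Option<usize> {
    input[pos..].starts_with(text).then(|| pos + text.len())
}

/// PEG ordered choice: commit to the first alternative that matches.
/// Later alternatives are never retried, even if what follows fails.
fn ordered_choice(input: &str, pos: usize, alts: &[&str]) -> Option<usize> {
    alts.iter().find_map(|alt| lit(input, pos, alt))
}

/// ("a" | "aa") "b" matched against the whole input.
fn short_first(input: &str) -> bool {
    ordered_choice(input, 0, &["a", "aa"])
        .and_then(|pos| lit(input, pos, "b"))
        .map_or(false, |end| end == input.len())
}

/// ("aa" | "a") "b": same language intent, longer alternative listed first.
fn long_first(input: &str) -> bool {
    ordered_choice(input, 0, &["aa", "a"])
        .and_then(|pos| lit(input, pos, "b"))
        .map_or(false, |end| end == input.len())
}

fn main() {
    println!("short_first(\"aab\") = {}", short_first("aab"));
    println!("long_first(\"aab\")  = {}", long_first("aab"));
    println!("long_first(\"ab\")   = {}", long_first("ab"));
}
```

Listing the longer alternative first is the usual way to make such a grammar independent of this edge case.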
## Backend Abstraction
Parsanol provides a trait-based backend abstraction for extensibility. You can implement custom backends or use the built-in ones interchangeably.
### Using the ParsingBackend Trait
```rust
use parsanol::portable::backend::{ParsingBackend, PackratBackend, BytecodeBackend, Backend};
// Use Packrat backend for predictable O(n) performance
let mut packrat = PackratBackend::new();
let result = packrat.parse(&grammar, input)?;
// Use Bytecode backend for lower memory usage
let mut bytecode = BytecodeBackend::new();
let result = bytecode.parse(&grammar, input)?;
// Configure backends
let packrat = PackratBackend::new()
.with_max_recursion_depth(500)
.with_timeout_ms(5000);
let bytecode = BytecodeBackend::new()
.with_auto_fallback(true); // Falls back to Packrat for complex grammars
```
### Runtime Backend Selection
```rust
use parsanol::portable::backend::{Backend, BytecodeBackend, PackratBackend, ParsingBackend};
// Select backend at runtime
let backend_type = Backend::default_for_grammar(&grammar);
let result = match backend_type {
Backend::Packrat => {
let mut parser = PackratBackend::new();
parser.parse(&grammar, input)?
}
Backend::Bytecode => {
let mut parser = BytecodeBackend::new();
parser.parse(&grammar, input)?
}
};
```
### Backend Characteristics
Each backend documents its performance characteristics:
```rust
use parsanol::portable::backend::{ParsingBackend, PackratBackend};
let backend = PackratBackend::new();
let chars = backend.characteristics();
println!("Time: {}", chars.time_complexity); // "O(n)"
println!("Memory: {}", chars.memory_complexity); // "O(n × r)"
println!("Memoization: {}", chars.uses_memoization); // true
println!("Streaming: {}", chars.supports_streaming); // false
println!("Incremental: {}", chars.supports_incremental); // true
println!("Safe: {}", chars.safe_for_all_grammars); // true
```
### Implementing Custom Backends
```rust
use parsanol::portable::backend::{ParsingBackend, BackendCharacteristics, BackendResult};
use parsanol::portable::grammar::Grammar;
struct MyCustomBackend;
impl ParsingBackend for MyCustomBackend {
fn parse(&mut self, grammar: &Grammar, input: &str) -> BackendResult {
// Custom parsing logic here
todo!()
}
fn name(&self) -> &'static str {
"my-custom"
}
fn characteristics(&self) -> BackendCharacteristics {
BackendCharacteristics {
time_complexity: "O(n log n)",
memory_complexity: "O(n)",
uses_memoization: false,
supports_streaming: true,
supports_incremental: false,
safe_for_all_grammars: true,
}
}
}
```
### Dynamic Backend Dispatch
For runtime polymorphism:
```rust
use parsanol::portable::backend::{DynBackend, PackratBackend, BytecodeBackend};
fn get_backend(use_packrat: bool) -> DynBackend {
if use_packrat {
Box::new(PackratBackend::new())
} else {
Box::new(BytecodeBackend::new())
}
}
let mut backend: DynBackend = get_backend(true);
let result = backend.parse(&grammar, input)?;
```
## Quick Start Examples
Using the bytecode backend explicitly:
```rust
use parsanol::portable::{
parser_dsl::{str, re, GrammarBuilder},
bytecode::{Backend, Parser},
};
let grammar = GrammarBuilder::new()
.rule("number", re(r"[0-9]+"))
.build();
// Create parser with bytecode backend
let mut parser = Parser::new(grammar, Backend::Bytecode);
let result = parser.parse("42");
```
Using packrat backend explicitly:
```rust
let mut parser = Parser::new(grammar, Backend::Packrat);
let result = parser.parse("42");
```
## Optimization Passes
The bytecode backend applies 11 optimization passes automatically:
1. `DeadCodeElimination` - Remove unreachable code
2. `JumpChainSimplification` - Simplify jump chains
3. `JumpToReturnSimplification` - Direct returns
4. `JumpToFailSimplification` - Direct failures
5. `CombineAdjacentChars` - Char merging
6. `SpanOptimization` - CharSet* to Span
7. `FullCaptureOptimization` - Capture pairs to FullCapture
8. `TestCharOptimization` - Choice patterns to TestChar
9. `TestSetOptimization` - Choice patterns to TestSet
10. `TailCallOptimization` - Tail calls to jumps
11. `LookaheadOptimization` - Choice to PredChoice for predicates
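As an illustration of what a peephole pass of this kind does, here is a minimal sketch in the spirit of `CombineAdjacentChars`. The `Instr` enum is a reduced, hypothetical instruction set without jumps; a real pass would also have to keep jump targets consistent, which this toy does not attempt.

```rust
/// Reduced, hypothetical instruction set (no jumps, so no target fix-ups needed).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Char(char),
    String(String),
    Any,
    End,
}

/// Emit the pending run of characters: nothing, a single Char, or one String.
fn flush(run: &mut String, out: &mut Vec<Instr>) {
    let mut chars = run.chars();
    match (chars.next(), chars.next()) {
        (None, _) => {}
        (Some(c), None) => out.push(Instr::Char(c)),
        _ => out.push(Instr::String(run.clone())),
    }
    run.clear();
}

/// Merge each run of two or more adjacent `Char` instructions into one `String`.
fn combine_adjacent_chars(program: &[Instr]) -> Vec<Instr> {
    let mut out = Vec::with_capacity(program.len());
    let mut run = String::new();
    for instr in program {
        if let Instr::Char(c) = instr {
            run.push(*c);
            continue;
        }
        flush(&mut run, &mut out);
        out.push(instr.clone());
    }
    flush(&mut run, &mut out);
    out
}

fn main() {
    let program = vec![
        Instr::Char('h'),
        Instr::Char('i'),
        Instr::Any,
        Instr::Char('x'),
        Instr::End,
    ];
    println!("{:?}", combine_adjacent_chars(&program));
}
```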
## Bytecode VM Architecture
```
Grammar (Atoms) ──► Compiler ──► Program (bytecode)
│
▼
Input ──────────────────────────► VM ──► AstNode
```
The bytecode VM uses:
- **Backtracking stack**: For choice point management
- **Capture stack**: For building AST nodes
- **Instruction pointer**: Sequential execution
- **Optimization passes**: Peephole optimization on compiled bytecode
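The interplay of instruction pointer and backtracking stack can be sketched with a four-instruction toy VM. This is an assumption-laden simplification: the instruction names mirror a few from the table below, but the encoding, the `run` function, and the omission of the capture stack are choices made for this example only.

```rust
/// Tiny subset of a PEG machine; operands of Choice/Commit are absolute addresses.
#[derive(Clone, Copy, Debug)]
enum Instr {
    Char(u8),      // match one byte or fail
    Choice(usize), // push a choice point that resumes at the given address
    Commit(usize), // discard the latest choice point and jump
    End,           // succeed
}

/// Run `program` on `input`; returns the end position of the match, if any.
fn run(program: &[Instr], input: &[u8]) -> Option<usize> {
    let mut ip = 0; // instruction pointer
    let mut pos = 0; // current input position
    let mut backtrack: Vec<(usize, usize)> = Vec::new(); // (resume ip, saved pos)
    loop {
        match program[ip] {
            Instr::Char(c) => {
                if input.get(pos) == Some(&c) {
                    pos += 1;
                    ip += 1;
                } else {
                    // Failure: resume at the latest choice point, or give up.
                    let (resume_ip, saved_pos) = backtrack.pop()?;
                    ip = resume_ip;
                    pos = saved_pos;
                }
            }
            Instr::Choice(alt) => {
                backtrack.push((alt, pos));
                ip += 1;
            }
            Instr::Commit(target) => {
                backtrack.pop();
                ip = target;
            }
            Instr::End => return Some(pos),
        }
    }
}

/// Compiled form of: "ab" / "ac"
fn choice_program() -> Vec<Instr> {
    vec![
        Instr::Choice(4), // 0: if the first alternative fails, resume at 4
        Instr::Char(b'a'), // 1
        Instr::Char(b'b'), // 2
        Instr::Commit(6), // 3: first alternative matched, skip the second
        Instr::Char(b'a'), // 4
        Instr::Char(b'c'), // 5
        Instr::End,       // 6
    ]
}

fn main() {
    for input in ["ab", "ac", "ad"] {
        println!("{:?} -> {:?}", input, run(&choice_program(), input.as_bytes()));
    }
}
```

On `"ac"` the first alternative fails at `Char(b'b')`, the VM pops the choice point, rewinds to position 0, and the second alternative succeeds.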
## Instruction Set
The VM supports 28 instructions covering all PEG operations:
| Category | Instructions |
|---|---|
| Matching | `Char`, `CharSet`, `String`, `Regex`, `Any`, `Custom` |
| Control Flow | `Jump`, `Call`, `Return`, `End` |
| Backtracking | `Choice`, `Commit`, `PartialCommit`, `BackCommit`, `Fail`, `FailTwice` |
| Captures | `OpenCapture`, `CloseCapture`, `FullCapture` |
| Tests | `TestChar`, `TestSet`, `TestAny` |
| Special | `Behind`, `Span`, `NoOp`, `PredChoice` |
# Architecture
┌─────────────────────────────────────────────────────────────┐
│ PARSANOL-RS │
│ (Generic PEG Parser Library) │
├─────────────────────────────────────────────────────────────┤
│ • Parser combinators (PEG atoms) │
│ • Grammar representation │
│ • Packrat memoization │
│ • Arena allocation │
│ • Infix expression parsing │
│ • Rich error reporting (tree structure) │
│ • Transform DSL (pattern matching) │
│ • Derive macros for typed ASTs │
│ • Optional Ruby FFI / WASM bindings │
└─────────────────────────────────────────────────────────────┘
▲ ▲
│ (build ON TOP) │ (build ON TOP)
│ │
┌─────────┴──────────┐ ┌─────────┴─────────┐
│ parsanol-express │ │ Your Language │
│ (EXPRESS lexer) │ │ (Your DSL) │
└────────────────────┘ └───────────────────┘
> [!IMPORTANT]
> Parsanol-rs is a **GENERIC** parser library. It has no knowledge of
> any specific domain (EXPRESS, Ruby, JSON, YAML, etc.). Domain-specific
> parsers should be built ON TOP of this library.
# Workspace Structure
This repository uses a Cargo workspace with two crates:
```
parsanol-rs/
├── parsanol/ # Main parser library
│ ├── src/
│ └── Cargo.toml
├── parsanol-derive/ # Derive macros (always included)
│ ├── src/
│ └── Cargo.toml
├── examples/ # 39 example parsers
└── Cargo.toml # Workspace root
```
# Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
parsanol = "0.1"
```
The `parsanol-derive` crate is automatically included as a dependency,
providing the `#[derive(FromAst)]` macro for typed AST conversion.
## Optional Features
- `ruby` - Enable Ruby FFI bindings (requires `magnus`, `rb-sys`)
- `wasm` - Enable WebAssembly bindings (requires `wasm-bindgen`,
`js-sys`)
- `parallel` - Enable parallel parsing (requires `rayon`)
```toml
[dependencies]
parsanol = { version = "0.1", features = ["ruby", "parallel"] }
```
# Quick Start
## Basic Parsing
```rust
use parsanol::portable::{Grammar, PortableParser, AstArena, parser_dsl::*};
// Build a simple grammar
let grammar = GrammarBuilder::new()
.rule("greeting", str("hello").then(str("world")))
.build();
let input = "helloworld";
let mut arena = AstArena::for_input(input.len());
let mut parser = PortableParser::new(&grammar, input, &mut arena);
match parser.parse() {
Ok(ast) => println!("Parsed successfully: {:?}", ast),
Err(e) => println!("Parse error: {:?}", e),
}
```
## Calculator with Operator Precedence
```rust
use parsanol::portable::{
GrammarBuilder, PortableParser, AstArena, Grammar,
parser_dsl::{str, re, ref_, seq, choice, dynamic},
infix::{InfixBuilder, Assoc},
};
fn build_calculator_grammar() -> Grammar {
let mut builder = GrammarBuilder::new();
// Define atoms
builder = builder.rule("number", re(r"[0-9]+"));
builder = builder.rule("primary", choice(vec![
dynamic(seq(vec![
dynamic(str("(")),
dynamic(ref_("expr")),
dynamic(str(")")),
])),
dynamic(ref_("number")),
]));
// Build infix with precedence
let expr_atom = InfixBuilder::new()
.primary(ref_("primary"))
.op("*", 2, Assoc::Left)
.op("/", 2, Assoc::Left)
.op("+", 1, Assoc::Left)
.op("-", 1, Assoc::Left)
.build(&mut builder);
builder.update_rule("expr", expr_atom);
builder.build()
}
```
# Parser DSL
## Atom Types
| Atom | Description | Example |
|---|---|---|
| `str("literal")` | Match exact string | `str("hello")` |
| `re("pattern")` | Match regex pattern | `re(r"[0-9]+")` |
| `any()` | Match any single character | `any()` |
| `ref_("rule")` | Reference to named rule | `ref_("expr")` |
| `seq([...])` | Sequence of atoms | `seq(vec![a, b, c])` |
| `choice([...])` | Alternative atoms | `choice(vec![a, b])` |
| `cut()` | Commit to this branch (prevent backtracking) | `cut()` |
| `capture("name", atom)` | Extract named value during parsing | `capture("id", re(r"[a-z]+"))` |
| `scope(atom)` | Create isolated capture context | `scope(seq([...]))` |
| `dynamic(callback)` | Runtime-determined parsing via callback | `dynamic(callback_id)` |
## Combinators
All atoms implement the `ParsletExt` trait with these methods:
```rust
use parsanol::portable::parser_dsl::*;
// Sequence: A >> B
let parser = str("hello").then(str("world"));
// Repetition
let parser = str("a").repeat(1, None); // One or more
let parser = str("a").repeat(0, Some(3)); // Zero to three
let parser = str("a").many(); // Zero or more
let parser = str("a").many1(); // One or more
let parser = str("a").optional(); // Zero or one
// Named capture
let parser = re(r"[0-9]+").label("number");
// Ignore (don't include in AST)
let parser = str(" ").ignore();
// Lookahead (don't consume)
let parser = str("hello").lookahead(); // Positive: must match
let parser = str("hello").not_ahead(); // Negative: must NOT match
```
## Grammar Macro
For declarative grammar definition:
```rust
use parsanol::portable::parser_dsl::{grammar, ref_, str};
let grammar = grammar! {
"hello" => str("hello"),
"world" => str("world"),
"greeting" => ref_("hello").then(ref_("world")),
};
```
# Capture Atoms
Capture atoms extract named values during parsing, similar to regex named groups. They work with all backends (Packrat, Bytecode, Streaming).
## Basic Usage
```rust
use parsanol::portable::{
parser_dsl::{capture, dynamic, re, seq, GrammarBuilder},
PortableParser, AstArena,
};
let grammar = GrammarBuilder::new()
.rule("greeting", seq(vec![
capture("word", dynamic(re(r"[a-zA-Z]+"))),
]))
.build();
let mut arena = AstArena::for_input(64);
let mut parser = PortableParser::packrat(grammar);
let result = parser.parse_from_pos(0, "hello world", &mut arena)?;
// Access captures
if let Some(text) = result.get_capture("word", "hello world") {
println!("Captured: {}", text); // Prints: "Captured: hello"
}
```
## Capture API
```rust
// Get a single capture by name
let value = result.get_capture("name", input);
// Get all capture names
for name in result.capture_names() {
println!("Capture: {}", name);
}
// Check if capture exists
if result.has_capture("name") {
// ...
}
```
## Backend Compatibility
| Backend | Support | Notes |
|---|---|---|
| Packrat | Full | Native support |
| Bytecode | Full | Uses capture instructions |
| Streaming | Full | Captures persist across chunks |
# Scope Atoms
Scope atoms create isolated capture contexts. Captures made inside a scope are discarded when the scope exits, preventing pollution of the parent context.
## Use Cases
- Nested parsing where inner captures shouldn't affect outer state
- Repetitive patterns where each iteration starts fresh
- Context isolation in recursive grammars
## Basic Usage
```rust
use parsanol::portable::parser_dsl::{scope, seq, capture, dynamic, re, GrammarBuilder};
let grammar = GrammarBuilder::new()
.rule("outer", seq(vec![
capture("outer_name", dynamic(re(r"[a-z]+"))),
scope(seq(vec![
capture("inner_name", dynamic(re(r"[0-9]+"))),
])),
// "inner_name" is NOT available here
]))
.build();
```
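One way to picture the isolation is as a stack of capture frames that is pushed on scope entry and popped on exit. The sketch below is a mental model only: `CaptureState` and its methods are hypothetical, and whether inner scopes can read outer captures (as they can here) is an assumption rather than documented parsanol behavior.

```rust
use std::collections::HashMap;

/// Hypothetical capture store: one frame per active scope, root frame at index 0.
struct CaptureState {
    frames: Vec<HashMap<String, String>>,
}

impl CaptureState {
    fn new() -> Self {
        Self { frames: vec![HashMap::new()] }
    }

    /// Entering a scope starts a fresh frame for captures made inside it.
    fn enter_scope(&mut self) {
        self.frames.push(HashMap::new());
    }

    /// Leaving a scope discards everything captured inside it.
    fn exit_scope(&mut self) {
        if self.frames.len() > 1 {
            self.frames.pop();
        }
    }

    /// Record a capture in the innermost active frame.
    fn capture(&mut self, name: &str, value: &str) {
        self.frames
            .last_mut()
            .expect("root frame always exists")
            .insert(name.to_string(), value.to_string());
    }

    /// Look up a capture, innermost frame first (an assumption of this model).
    fn get(&self, name: &str) -> Option<&str> {
        self.frames
            .iter()
            .rev()
            .find_map(|frame| frame.get(name).map(String::as_str))
    }
}

fn main() {
    let mut state = CaptureState::new();
    state.capture("outer_name", "abc");
    state.enter_scope();
    state.capture("inner_name", "123");
    println!("inside scope:  inner_name = {:?}", state.get("inner_name"));
    state.exit_scope();
    println!("after scope:   inner_name = {:?}", state.get("inner_name"));
    println!("after scope:   outer_name = {:?}", state.get("outer_name"));
}
```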
# Dynamic Atoms
Dynamic atoms enable runtime-determined parsing via registered callbacks. This allows context-sensitive parsing where the grammar itself depends on input or previously captured values.
## Registering a Callback
```rust
use parsanol::portable::{
Grammar, Atom, Parser,
dynamic::{DynamicCallback, DynamicContext, register_dynamic_callback},
parser_dsl::*,
};
struct KeywordCallback;
impl DynamicCallback for KeywordCallback {
fn call(&self, ctx: &DynamicContext) -> Option<Atom> {
// Access current position
let pos = ctx.pos();
// Access input
let input = ctx.input();
// Access captures made so far
if let Some(lang) = ctx.get_capture("language") {
match lang {
"ruby" => Some(Atom::Str { pattern: "def".into() }),
"python" => Some(Atom::Str { pattern: "lambda".into() }),
_ => None,
}
} else {
None
}
}
fn description(&self) -> &str {
"keyword_callback"
}
}
let callback_id = register_dynamic_callback(Box::new(KeywordCallback));
```
## Using Dynamic Atoms in Grammars
```rust
let grammar = GrammarBuilder::new()
.rule("keyword", dynamic_with_id(callback_id))
.build();
```
## Backend Compatibility
| Backend | Support | Notes |
|---|---|---|
| Packrat | Full | Native support (recommended) |
| Bytecode | Fallback | Uses Packrat internally |
| Streaming | Fallback | Uses Packrat internally |
**Note:** For heavy dynamic atom usage, prefer the Packrat backend for best performance.
# Streaming with Captures
The streaming parser supports captures while maintaining bounded memory usage. Captures persist across streaming parse operations.
## Basic Usage
```rust
use parsanol::portable::{
parser_dsl::{capture, dynamic, re, GrammarBuilder},
streaming::{StreamingParser, ChunkConfig},
arena::AstArena,
};
use std::io::Cursor;
// Illustrative input; in practice the reader would wrap a file or socket
let input = "hello";
let grammar = GrammarBuilder::new()
.rule("word", capture("word", dynamic(re(r"[a-zA-Z]+"))))
.build();
let config = ChunkConfig {
chunk_size: 65536, // 64 KB chunks
window_size: 2, // Keep 2 chunks in memory
};
let mut parser = StreamingParser::new(&grammar, config);
let mut arena = AstArena::for_input(65536);
let mut cursor = Cursor::new(input.as_bytes());
let result = parser.parse_from_reader(&mut cursor, &mut arena)?;
if let Some(captures) = &result.capture_state {
for name in captures.names() {
if let Some(value) = captures.get(&name) {
println!("{} = {:?}", name, value.get_text(input));
}
}
}
```
## Chunk Configuration
| Preset | Chunk size | Window size | Use case |
|---|---|---|---|
| `small()` | 16 KB | 2 | Real-time feeds |
| `medium()` | 64 KB | 3 | Default |
| `large()` | 256 KB | 4 | Log files |
| `huge()` | 1 MB | 5 | Large files |
## Performance Notes
- Memory: O(chunk_size × window_size + capture_state)
- Captures accumulate during parse, available at end
- For very large captures, use `reset()` to process incrementally
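The memory bound is easy to evaluate for the presets. The arithmetic below assumes the two numeric columns of the preset table are chunk size and window size (in chunks), and it ignores the capture-state term, which depends on the input; `buffered_bytes` is a helper written for this example.

```rust
/// Upper bound on buffered input bytes: chunk_size × window_size.
/// The capture-state term from the formula above is not included.
fn buffered_bytes(chunk_size: usize, window_size: usize) -> usize {
    chunk_size * window_size
}

fn main() {
    const KB: usize = 1024;
    // (preset name, chunk size in bytes, window size in chunks)
    let presets = [
        ("small", 16 * KB, 2),
        ("medium", 64 * KB, 3),
        ("large", 256 * KB, 4),
        ("huge", 1024 * KB, 5),
    ];
    for (name, chunk, window) in presets {
        println!("{:6} buffers at most {} KB", name, buffered_bytes(chunk, window) / KB);
    }
}
```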
# Transform System
The transform system converts generic parse trees into typed Rust data
structures, similar to Parslet’s transformation system.
## Value Types
The `Value` enum represents transformed data:
```rust
pub enum Value {
Nil,
Bool(bool),
Int(i64),
Float(f64),
String(String),
Array(Vec<Value>),
Hash(HashMap<String, Value>),
}
```
## Basic Transformations
```rust
use parsanol::portable::transform::{Transform, Value, TransformError};
let transform = Transform::new()
// Transform "int" captures by doubling
.rule("int", |v| {
let n = v.as_int().ok_or_else(|| TransformError::Custom("not int".into()))?;
Ok(Value::int(n * 2))
});
let value = Value::hash(vec![("int", Value::int(21))]);
let result = transform.apply(&value)?;
assert_eq!(result.as_int(), Some(42));
```
## Pattern Matching
Pattern-based transformations similar to Parslet:
```rust
use parsanol::portable::transform::{Transform, Pattern, Value};
let transform = Transform::new()
// Match hash with specific fields
.pattern(
Pattern::hash()
.field("left", "l")
.field("op", Pattern::str("+"))
.field("right", "r"),
|bindings| {
let l = bindings.get_int("l")?;
let r = bindings.get_int("r")?;
Ok(Value::int(l + r))
}
);
```
## Pattern Types
| Pattern | Description | Example |
|---|---|---|
| `Pattern::simple("x")` | Match any leaf value and bind to variable | `Pattern::simple("n")` matches `42` |
| `Pattern::str("value")` | Match a specific string value | `Pattern::str("+")` matches `"+"` |
| `Pattern::int(n)` | Match a specific integer | `Pattern::int(42)` matches `42` |
| `Pattern::sequence("x")` | Match an array and bind to variable | `Pattern::sequence("items")` |
| `Pattern::subtree("x")` | Match anything and bind to variable | `Pattern::subtree("node")` |
| `Pattern::hash()` | Match a hash with specific fields | See example above |
## Converting AST to Value
```rust
use parsanol::portable::transform::{ast_to_value, Value};
// After parsing
let ast = parser.parse()?;
let value = ast_to_value(&ast, &arena, input);
// Now apply transforms
let result = transform.apply(&value)?;
```
# Derive Macros
The `FromAst` derive macro automatically generates code to convert `Value`
types into typed Rust structs and enums. This eliminates boilerplate code
for AST transformation.
## Basic Usage
```rust
use parsanol::derive::FromAst;
use parsanol::portable::transform::Value;
#[derive(FromAst, Debug)]
pub enum Expr {
#[parsanol(tag = "number")]
Number(i64),
#[parsanol(tag = "binop")]
BinOp {
left: Box<Expr>,
op: String,
right: Box<Expr>,
},
}
// Convert Value to typed Expr
let value: Value = /* ... parsed value ... */;
let expr: Expr = value.try_into()?;
```
## Container Attributes
| Attribute | Description |
|---|---|
| `#[parsanol(rule = "name")]` | Specify the grammar rule name |
## Variant Attributes (for enums)
| Attribute | Description |
|---|---|
| `#[parsanol(tag = "literal")]` | Match by literal tag string |
| `#[parsanol(tag_expr = expr)]` | Match by expression (for dynamic tags) |
## Field Attributes
| Attribute | Description |
|---|---|
| `#[parsanol(field = "name")]` | Map to different hash field name |
| `#[parsanol(default)]` | Use `Default::default()` if missing |
| `#[parsanol(default = expr)]` | Use expression if missing |
## Complete Example
```rust
use parsanol::derive::FromAst;
use parsanol::portable::transform::Value;
#[derive(FromAst, Debug)]
#[parsanol(rule = "statement")]
pub enum Statement {
#[parsanol(tag = "assignment")]
Assignment {
#[parsanol(field = "name")]
variable: String,
value: Box<Expr>,
},
#[parsanol(tag = "return")]
Return {
#[parsanol(default)]
value: Option<Box<Expr>>,
},
#[parsanol(tag = "if")]
If {
condition: Box<Expr>,
then_block: Vec<Statement>,
#[parsanol(default)]
else_block: Option<Vec<Statement>>,
},
}
// Usage
fn parse_statement(value: Value) -> Result<Statement, parsanol::derive::FromAstError> {
value.try_into()
}
```
## Single-Field Tuple Structs
Single-field tuple structs automatically get transparent conversion:
```rust
#[derive(FromAst)]
pub struct Identifier(pub String);
// Value::String("foo") directly converts to Identifier("foo")
```
## Error Handling
```rust
use parsanol::derive::FromAstError;
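// Annotate the target type so `try_into` knows what to produce
// (`Expr` is the enum from the Basic Usage example)
let parsed: Result<Expr, FromAstError> = value.try_into();
match parsed {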
Ok(expr) => println!("Parsed: {:?}", expr),
Err(FromAstError::MissingField(field)) => {
eprintln!("Missing field: {}", field);
}
Err(FromAstError::UnknownTag) => {
eprintln!("Unknown tag in enum");
}
Err(e) => eprintln!("Conversion error: {}", e),
}
```
# Streaming Builder
The streaming builder API allows single-pass parsing without
intermediate AST construction. This is ideal for:
- Maximum performance (eliminates AST allocation)
- Custom output formats
- Memory-constrained environments
## Implementing StreamingBuilder
```rust
use parsanol::portable::streaming_builder::{StreamingBuilder, BuildResult, BuildError};
// Custom builder that collects all strings
struct StringCollector {
strings: Vec<String>,
}
impl StreamingBuilder for StringCollector {
type Output = Vec<String>;
fn on_string(&mut self, value: &str, _offset: usize, _length: usize) -> BuildResult<()> {
self.strings.push(value.to_string());
Ok(())
}
fn finish(&mut self) -> BuildResult<Self::Output> {
Ok(std::mem::take(&mut self.strings))
}
}
```
## Using parse_with_builder
```rust
use parsanol::portable::{Grammar, PortableParser, AstArena};
let grammar = /* ... */;
let input = "hello world";
let mut arena = AstArena::for_input(input.len());
let mut parser = PortableParser::new(&grammar, input, &mut arena);
// Create builder
let mut builder = StringCollector { strings: vec![] };
// Parse with streaming builder
let result = parser.parse_with_builder(&mut builder)?;
// result: Vec<String>
```
## Built-in Builders
Several useful builders are provided:
| Builder | Description |
|---|---|
| `DebugBuilder` | Collects all events as strings for debugging |
| `BuilderStringCollector` | Collects all string values |
| `BuilderNodeCounter` | Counts nodes by type |
## Ruby Integration
The streaming builder works with Ruby callbacks via FFI:
```ruby
require 'parsanol'
class MyBuilder
include Parsanol::BuilderCallbacks
def initialize
@strings = []
end
def on_string(value, offset, length)
@strings << value
end
def finish
@strings
end
end
builder = MyBuilder.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
```
# Parallel Parsing
Parse multiple inputs in parallel using rayon for near-linear speedup on
multi-core systems, since each input is parsed independently.
## Enabling Parallel Feature
```toml
[dependencies]
parsanol = { version = "0.1", features = ["parallel"] }
```
## Batch Parallel Parsing
```rust
use parsanol::portable::{Grammar, parse_batch_parallel};
let grammar = /* ... */;
let inputs = vec!["file1.exp", "file2.exp", "file3.exp"];
// Parse all inputs in parallel
let results = parse_batch_parallel(&grammar, &inputs);
// Results are in same order as inputs
for (i, result) in results.iter().enumerate() {
match result {
Ok(ast) => println!("File {} parsed successfully", i),
Err(e) => eprintln!("File {} failed: {}", i, e),
}
}
```
## Parallel Configuration
```rust
use parsanol::portable::parallel::{parse_batch_parallel, ParallelConfig};
let config = ParallelConfig::new()
.with_num_threads(4) // Use 4 threads
.with_min_chunk_size(10); // Minimum inputs per thread
let results = parse_batch_parallel(&grammar, &inputs);
```
## Performance
| Scenario | Speedup |
|---|---|
| 8 cores, 100 files | ~8x faster than sequential |
| 4 cores, 50 files | ~4x faster than sequential |
| Single core | Same as sequential (graceful fallback) |
When the `parallel` feature is not enabled, the functions fall back to
sequential parsing automatically.
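To make the order-preserving behavior concrete, here is a minimal std-only sketch of chunked batch parsing. It is not how `parse_batch_parallel` is implemented (the library uses rayon); `parse_batch`, its `num_threads` parameter, and the generic `parse` callback are hypothetical names chosen for illustration.

```rust
use std::thread;

/// Runs `parse` over every input on up to `num_threads` scoped threads and
/// returns the results in input order. Hypothetical helper, std-only.
fn parse_batch<T, E, F>(inputs: &[&str], num_threads: usize, parse: F) -> Vec<Result<T, E>>
where
    T: Send,
    E: Send,
    F: Fn(&str) -> Result<T, E> + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    // Ceiling division so every input lands in exactly one chunk.
    let threads = num_threads.max(1);
    let chunk_size = (inputs.len() + threads - 1) / threads;
    let parse = &parse;

    thread::scope(|scope| {
        // Spawn one worker per contiguous chunk of inputs.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|&s| parse(s)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order keeps results aligned with `inputs`.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With `num_threads` set to 1 the whole batch becomes a single chunk, which is one simple way to picture a sequential fallback.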
# Infix Expression Parsing
Built-in support for parsing infix expressions with operator precedence
and associativity.
## Using InfixBuilder
```rust
use parsanol::portable::infix::{InfixBuilder, Assoc};
use parsanol::portable::parser_dsl::*; // GrammarBuilder, ref_
let mut builder = GrammarBuilder::new();
let expr_idx = InfixBuilder::new()
.primary(ref_("atom")) // Base expression (numbers, parens)
.op("*", 2, Assoc::Left) // Higher precedence
.op("/", 2, Assoc::Left)
.op("+", 1, Assoc::Left) // Lower precedence
.op("-", 1, Assoc::Left)
.op("^", 3, Assoc::Right) // Right-associative
.build(&mut builder);
```
## Associativity
| Associativity | Meaning | Example |
|---------------|---------|---------|
| `Assoc::Left` | Left-to-right evaluation | `a - b - c` = `(a - b) - c` |
| `Assoc::Right` | Right-to-left evaluation | `a = b = c` = `a = (b = c)` |
| `Assoc::NonAssoc` | Cannot chain | `a < b < c` is an error |
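The effect of precedence and associativity can be seen in a small precedence-climbing evaluator. This is a std-only sketch, not the code `InfixBuilder` generates; the local `Assoc` enum, `op_info`, `parse_expr`, and `eval` are hypothetical names, and `NonAssoc` is left out for brevity. The operator table mirrors the `InfixBuilder` example above.

```rust
/// Associativity of a binary operator (local to this sketch).
#[derive(Clone, Copy, PartialEq)]
enum Assoc {
    Left,
    Right,
}

/// Precedence and associativity for the operators used in the example above.
fn op_info(op: u8) -> Option<(u8, Assoc)> {
    match op {
        b'+' | b'-' => Some((1, Assoc::Left)),
        b'*' | b'/' => Some((2, Assoc::Left)),
        b'^' => Some((3, Assoc::Right)),
        _ => None,
    }
}

/// Parses a run of ASCII digits starting at `*pos`.
fn parse_number(src: &[u8], pos: &mut usize) -> Option<i64> {
    let start = *pos;
    while *pos < src.len() && src[*pos].is_ascii_digit() {
        *pos += 1;
    }
    std::str::from_utf8(&src[start..*pos]).ok()?.parse().ok()
}

/// Precedence climbing: parse a primary, then fold in every operator whose
/// precedence is at least `min_prec`.
fn parse_expr(src: &[u8], pos: &mut usize, min_prec: u8) -> Option<i64> {
    let mut lhs = parse_number(src, pos)?;
    while *pos < src.len() {
        let op = src[*pos];
        let (prec, assoc) = match op_info(op) {
            Some(info) if info.0 >= min_prec => info,
            _ => break,
        };
        *pos += 1;
        // Left-assoc: the right operand may only contain tighter operators.
        // Right-assoc: it may also contain operators of the same precedence.
        let next_min = if assoc == Assoc::Left { prec + 1 } else { prec };
        let rhs = parse_expr(src, pos, next_min)?;
        lhs = match op {
            b'+' => lhs.checked_add(rhs)?,
            b'-' => lhs.checked_sub(rhs)?,
            b'*' => lhs.checked_mul(rhs)?,
            b'/' => lhs.checked_div(rhs)?,
            _ => lhs.checked_pow(u32::try_from(rhs).ok()?)?,
        };
    }
    Some(lhs)
}

/// Evaluates an expression without whitespace, e.g. `"10-4-3"`.
fn eval(input: &str) -> Option<i64> {
    let mut pos = 0;
    let value = parse_expr(input.as_bytes(), &mut pos, 1)?;
    if pos == input.len() { Some(value) } else { None }
}
```

Here `eval("10-4-3")` groups as `(10-4)-3`, while `eval("2^3^2")` groups as `2^(3^2)` because `^` is right-associative.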
# Rich Error Reporting
Tree-structured error messages similar to Parslet for better debugging.
## Basic Usage
```rust
use parsanol::portable::error::{RichError, ErrorBuilder, Span};
// Create rich errors
let error = ErrorBuilder::new("Failed to parse expression")
.at(10, 3, 5) // offset, line, column
.context("expression")
.child(
ErrorBuilder::new("Expected '+' or '-'")
.at(10, 3, 5)
.build(),
)
.build();
// Print as ASCII tree
println!("{}", error.ascii_tree());
```
## Example Output

    Error at line 3, column 5:
    `- Failed to parse expression (in expression)
       `- Expected '+' or '-'

## Source Context
```rust
// Format error with source code context
let formatted = error.format_with_source(input);
println!("{}", formatted);
```
Output:

    Error at line 3, column 5:
    let x = foo bar
        ^
    `- Failed to parse expression (in expression)
       `- Expected '+' or '-'

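The source excerpt with its caret can be produced with a few lines of string handling. The helper below is a hypothetical std-only sketch, not the implementation of `format_with_source`; it assumes 1-based line and column numbers and one display cell per character.

```rust
/// Renders the offending line with a caret under the 1-based `column`.
/// Returns `None` when `line` is out of range.
fn source_context(input: &str, line: usize, column: usize) -> Option<String> {
    // `lines()` is 0-based, so line 0 is rejected by `checked_sub`.
    let text = input.lines().nth(line.checked_sub(1)?)?;
    // Pad with spaces so the caret sits under the reported column.
    let caret_indent = " ".repeat(column.saturating_sub(1));
    Some(format!(
        "Error at line {line}, column {column}:\n{text}\n{caret_indent}^"
    ))
}
```

Tabs and wide characters would need extra handling for the caret to line up.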
# Source Location Tracking
Track source positions through the parsing and transformation pipeline.
## Using SourceSpan
```rust
use parsanol::portable::source_location::{SourceSpan, SourcePosition};
use parsanol::portable::transform::ast_to_value_with_span;
// Create a span from offsets
let span = SourceSpan::from_offsets(input, 10, 20);
println!("Line {}, Column {}", span.start.line, span.start.column);
// Merge adjacent spans
let span1 = SourceSpan::from_offsets(input, 0, 10);
let span2 = SourceSpan::from_offsets(input, 10, 20);
let merged = span1.merge(&span2);
// Check overlap
if span1.overlaps(&span2) {
// Spans overlap
}
// Transform AST with source spans preserved
let (value, spans) = ast_to_value_with_span(&ast, &arena, input);
```
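For readers unfamiliar with span arithmetic, the following sketch shows one common way to define merging and overlap. `ByteSpan` is a hypothetical stand-in, not parsanol's `SourceSpan`, and the half-open `[start, end)` convention is an assumption of this example.

```rust
/// Half-open byte range `[start, end)`; local to this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteSpan {
    start: usize,
    end: usize,
}

impl ByteSpan {
    /// Smallest span that covers both inputs.
    fn merge(&self, other: &ByteSpan) -> ByteSpan {
        ByteSpan {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }

    /// True when the two ranges share at least one byte.
    fn overlaps(&self, other: &ByteSpan) -> bool {
        self.start < other.end && other.start < self.end
    }
}
```

Under this convention two adjacent spans merge into one contiguous span but do not overlap.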
# Grammar Composition
Build complex grammars by importing and composing smaller grammars.
## Importing Grammars
```rust
use parsanol::portable::parser_dsl::*;
let mut builder = GrammarBuilder::new();
// Import another grammar with a prefix
builder.import(&expression_grammar, Some("expr"));
builder.import(&type_grammar, Some("type"));
// Reference imported rules
let combined = seq(vec![
ref_("expr:root"), // References expression_grammar's root
str(":"),
ref_("type:root"), // References type_grammar's root
]);
builder.rule("typed_expr", combined);
let grammar = builder.build();
```
# Ruby FFI
Parsanol-rs can be compiled as a Ruby extension for use with
parsanol-ruby.
## Features
The Ruby FFI provides:
- **26x faster** parsing than pure Ruby (Parslet)
- **Single `parse()` API** - no confusing options
- **Lazy line/column** - zero overhead unless needed
- **Streaming Builder** - single-pass parsing with callbacks
## Building for Ruby
```bash
# Build with Ruby support
cargo build --features ruby
# The resulting library can be loaded as a Ruby extension
```
## Ruby API
```ruby
require 'parsanol/native'
# Serialize grammar once
grammar = str('hello').as(:greeting) >> str(' ').maybe >> match('[a-z]').repeat(1).as(:name)
grammar_json = Parsanol::Native.serialize_grammar(grammar)
# Parse - simple and clean
result = Parsanol::Native.parse(grammar_json, "hello world")
# => {greeting: "hello"@0, name: "world"@6}
# Line/column available when needed (computed lazily)
result[:greeting].line_and_column # => [1, 1]
result[:name].line_and_column # => [1, 7]
```
## Lazy Line/Column
Slice objects support lazy line/column computation:
- `slice.offset` - character position (always available, zero cost)
- `slice.content` - string value (always available, zero cost)
- `slice.line_and_column` - [line, column] tuple (computed lazily, cached)
This provides **zero overhead** for users who don't need position info,
while keeping line/column **always available** when needed.
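The lazy-and-cached pattern described above can be sketched in Rust with `std::cell::OnceCell`. The `Slice` struct below is a hypothetical analogue of the Ruby slice object, not the type used by the FFI layer; it assumes byte offsets and 1-based positions.

```rust
use std::cell::OnceCell;

/// A matched slice that stores only its offset and text up front.
struct Slice<'a> {
    source: &'a str,
    offset: usize,
    content: &'a str,
    // Filled in on the first `line_and_column` call, then reused.
    position: OnceCell<(usize, usize)>,
}

impl<'a> Slice<'a> {
    /// Creates a slice covering `len` bytes of `source` starting at `offset`.
    fn new(source: &'a str, offset: usize, len: usize) -> Self {
        Slice {
            source,
            offset,
            content: &source[offset..offset + len],
            position: OnceCell::new(),
        }
    }

    /// Returns the 1-based (line, column), scanning the source at most once.
    fn line_and_column(&self) -> (usize, usize) {
        *self.position.get_or_init(|| {
            let before = &self.source[..self.offset];
            let line = before.matches('\n').count() + 1;
            // Column counts characters since the last newline.
            let line_start = before.rfind('\n').map_or(0, |i| i + 1);
            let column = before[line_start..].chars().count() + 1;
            (line, column)
        })
    }
}
```

Constructing a `Slice` does no scanning at all; the cost is paid only by callers that ask for a position, and only once per slice.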
## Streaming Builder (Ruby)
For maximum performance, use the streaming builder API:
```ruby
require 'parsanol'
# Define a builder class
class StringCollector
include Parsanol::BuilderCallbacks
def initialize
@strings = []
end
def on_string(value, offset, length)
@strings << value
end
def on_int(value)
@strings << value.to_s
end
def finish
@strings
end
end
# Parse with streaming builder
builder = StringCollector.new
result = Parsanol::Native.parse_with_builder(grammar_json, input, builder)
# result: ["42", "+", "8"]
```
See [parsanol-ruby](https://github.com/parsanol/parsanol-ruby) for full
documentation.
# WASM Support
Parsanol-rs can be compiled to WebAssembly for use in browsers or
Node.js.
## Building for WASM
```bash
# Install wasm-pack
cargo install wasm-pack
# Build for web
wasm-pack build --features wasm --target web
```
## JavaScript API
```javascript
import { Parser, Grammar } from 'parsanol';
const grammar = Grammar.fromJson({
atoms: [
{ Str: { pattern: "hello" } }
],
root: 0
});
const parser = new Parser(grammar);
const result = parser.parse("hello");
```
# Debug Tools
## Parser Tracing
Enable tracing for debugging:
```rust
let (result, trace) = parser.parse_with_trace();
// Print trace
println!("{}", trace.format(&grammar));
```
## Grammar Visualization
```rust
use parsanol::portable::debug::GrammarVisualizer;
let viz = GrammarVisualizer::new(&grammar);
// Generate Mermaid diagram
println!("{}", viz.to_mermaid());
// Generate GraphViz DOT
println!("{}", viz.to_dot());
```
# Performance
Parsanol-rs is designed for high performance:
- **18-44x Faster** than pure Ruby parsers (Parslet)
- **99.5% Fewer Allocations** through arena allocation
- **O(n) Parsing** via packrat memoization
- **SIMD Optimization**: Fast character matching via memchr
- **AHash**: Fast hashing for cache lookups
- **SmallVec**: Stack-allocated small collections
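As a rough illustration of why packrat memoization bounds the work, here is a tiny memoizing recognizer. It is a sketch only: it uses a `HashMap` rather than the library's dense cache, and the grammar, `Rule`, and `Packrat` names are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Rule {
    Expr,
    Term,
}

/// Recognizer for:  Expr <- Term '+' Expr / Term ;  Term <- '(' Expr ')' / [0-9]
struct Packrat<'a> {
    src: &'a [u8],
    // (rule, start position) -> end position on success, `None` on failure.
    memo: HashMap<(Rule, usize), Option<usize>>,
    // Number of rule bodies actually evaluated (cache misses).
    evaluations: usize,
}

impl<'a> Packrat<'a> {
    fn new(input: &'a str) -> Self {
        Packrat { src: input.as_bytes(), memo: HashMap::new(), evaluations: 0 }
    }

    /// Applies `rule` at `pos`, consulting the memo table first.
    fn apply(&mut self, rule: Rule, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        self.evaluations += 1;
        let result = match rule {
            Rule::Expr => self.expr(pos),
            Rule::Term => self.term(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    fn expr(&mut self, pos: usize) -> Option<usize> {
        // First alternative: Term '+' Expr
        if let Some(after_term) = self.apply(Rule::Term, pos) {
            if self.src.get(after_term) == Some(&b'+') {
                if let Some(end) = self.apply(Rule::Expr, after_term + 1) {
                    return Some(end);
                }
            }
        }
        // Second alternative: Term (a cache hit after backtracking).
        self.apply(Rule::Term, pos)
    }

    fn term(&mut self, pos: usize) -> Option<usize> {
        if self.src.get(pos) == Some(&b'(') {
            let inner = self.apply(Rule::Expr, pos + 1)?;
            return (self.src.get(inner) == Some(&b')')).then_some(inner + 1);
        }
        self.src.get(pos).filter(|b| b.is_ascii_digit()).map(|_| pos + 1)
    }

    /// True when the whole input matches `Expr`.
    fn matches(&mut self) -> bool {
        self.apply(Rule::Expr, 0) == Some(self.src.len())
    }
}
```

Each (rule, position) pair is evaluated at most once, so the number of evaluations is bounded by the rule count times the input length plus one, which is the source of the O(n) claim for a fixed grammar.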
## Benchmarks
| Parser (mode) | Input | Time |
|---------------|-------|------|
| parsanol-rs (Ruby Transform) | 1KB JSON | ~50µs |
| parsanol-rs (Serialized) | 1KB JSON | ~30µs |
| parsanol-rs (Native) | 1KB JSON | ~20µs |
| Pure Ruby (Parslet) | 1KB JSON | ~800µs |
# Security
Parsanol-rs includes built-in protection against denial-of-service
attacks.
## Default Limits
| Limit | Default | Description |
|-------|---------|-------------|
| `max_input_size` | 100 MB | Maximum input size in bytes |
| `max_recursion_depth` | 1000 | Maximum recursion depth for nested structures |
## Custom Limits
For untrusted input, configure custom limits:
```rust
use parsanol::portable::{PortableParser, AstArena, Grammar, ParseError};
// For untrusted input, use stricter limits
let mut parser = PortableParser::with_limits(
&grammar,
input,
&mut arena,
10 * 1024 * 1024, // 10 MB max input
100, // 100 max recursion depth
);
match parser.parse() {
Ok(ast) => { /* success */ },
Err(ParseError::InputTooLarge { input_size, max_size }) => {
eprintln!("Input too large: {} > {}", input_size, max_size);
},
Err(ParseError::RecursionLimitExceeded { depth, max_depth }) => {
eprintln!("Recursion too deep: {} > {}", depth, max_depth);
},
Err(e) => { /* other errors */ },
}
```
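To show what the two limits guard against, here is a small recursive recognizer for nested parentheses that enforces both. It is a std-only sketch under stated assumptions: `LimitError`, `parse_group`, and `check_balanced` are hypothetical, with variant and field names borrowed from the `ParseError` match above for readability.

```rust
/// Errors reported by the sketch (local to this example).
#[derive(Debug, PartialEq)]
enum LimitError {
    InputTooLarge { input_size: usize, max_size: usize },
    RecursionLimitExceeded { depth: usize, max_depth: usize },
    Unbalanced,
}

/// Consumes zero or more `( ... )` groups starting at `pos` and returns the
/// position after the last one. Each nesting level adds one to `depth`.
fn parse_group(
    src: &[u8],
    mut pos: usize,
    depth: usize,
    max_depth: usize,
) -> Result<usize, LimitError> {
    // Refuse to recurse further once the configured depth is exceeded.
    if depth > max_depth {
        return Err(LimitError::RecursionLimitExceeded { depth, max_depth });
    }
    while pos < src.len() && src[pos] == b'(' {
        pos = parse_group(src, pos + 1, depth + 1, max_depth)?;
        if src.get(pos) != Some(&b')') {
            return Err(LimitError::Unbalanced);
        }
        pos += 1;
    }
    Ok(pos)
}

/// Checks the size limit before parsing, then enforces the depth limit.
fn check_balanced(input: &str, max_size: usize, max_depth: usize) -> Result<(), LimitError> {
    if input.len() > max_size {
        return Err(LimitError::InputTooLarge { input_size: input.len(), max_size });
    }
    let end = parse_group(input.as_bytes(), 0, 0, max_depth)?;
    if end == input.len() { Ok(()) } else { Err(LimitError::Unbalanced) }
}
```

The size check runs before any parsing work, and the depth check turns a potential stack overflow on deeply nested input into an ordinary error value.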
## Best Practices
1. **Always limit input size** when parsing untrusted data
2. **Use external timeouts** for network services (e.g., `tokio::time::timeout`)
3. **Monitor memory usage** in production environments
See [SECURITY.md](SECURITY.md) for complete security documentation.
# Module Reference
## Core Modules
| Module | Description |
|--------|-------------|
| `portable::parser` | PEG parsing engine with packrat memoization |
| `portable::grammar` | Grammar representation and serialization |
| `portable::ast` | AST node types |
| `portable::arena` | Arena allocator for AST nodes |
| `portable::cache` | Dense cache for memoization |
| `portable::parser_dsl` | Fluent API for grammar definition |
| `portable::transform` | Transform system for converting parse trees |
| `portable::error` | Rich error reporting |
| `portable::infix` | Infix expression parsing with precedence |
| `portable::debug` | Debugging and visualization tools |
| `portable::source_location` | Source span tracking with line/column info |
| `portable::streaming` | Streaming parser support for large inputs |
| `portable::streaming_builder` | Single-pass parsing with custom builders |
| `portable::parallel` | Parallel parsing for batch processing |
| `portable::incremental` | Incremental parsing for editor integration |
| `portable::visitor` | AST visitor pattern implementation |
| `portable::source_map` | Source map generation for debugging |
# Examples
See the `examples/` directory for 39 complete examples demonstrating
real-world parsing scenarios:
## Expression Parsers
| Example | Description |
|---------|-------------|
| `calculator-pattern` | Parse expressions with pattern-based transforms |
| `calculator-transform` | Parse and evaluate expressions with native transforms |
| `boolean-algebra` | Parse boolean expressions with AND, OR, NOT operators |
| `expression-evaluator` | Evaluate expressions with variables and function calls |
| `prec-calc` | Precedence climbing algorithm for infix expressions |
## Data Formats
| Example | Description |
|---------|-------------|
| `json-pattern` | JSON parser with pattern matching |
| `json-transform` | JSON parser with native transforms |
| `csv-pattern` | CSV parser handling quoted fields (pattern mode) |
| `csv-transform` | CSV parser handling quoted fields (transform mode) |
| `ini` | INI configuration file parser |
| `simple-xml` | XML parser with tag matching |
| `markup` | Lightweight markup language parser |
| `toml` | TOML configuration file parser |
| `yaml` | YAML subset parser |
| `markdown` | Markdown subset parser with headers and lists |
| `iso-8601` | ISO 8601 date/time/duration parser |
| `iso-6709` | ISO 6709 geographic coordinate parser |
## URLs & Network
| Example | Description |
|---------|-------------|
| `url` | URL parser with scheme, host, path components |
| `email` | Email address parser with validation |
| `ip-address` | IPv4/IPv6 address parser with validation |
## Code & Templates
| Example | Description |
|---------|-------------|
| `erb` | ERB template parser for Ruby templates |
| `sexp` | S-expression parser for Lisp-style syntax |
| `minilisp` | MiniLisp parser demonstrating recursive grammars |
## Text Processing
| Example | Description |
|---------|-------------|
| `balanced-parens` | Balanced parentheses parser |
| `string-literal` | String literal parser with escape sequences |
| `sentence` | Sentence parser with Unicode support |
| `comments` | Comment parser (line and block comments) |
## Error Handling
| Example | Description |
|---------|-------------|
| `error-reporting` | Rich error reporting with tree structure |
| `error-recovery` | Error recovery strategies |
| `deepest-errors` | Deepest error point tracking |
| `nested-errors` | Nested error tree visualization |
## Advanced Features
| Example | Description |
|---------|-------------|
| `streaming` | Streaming parser for large inputs |
| `incremental` | Incremental parsing for editor integration |
| `linter` | Code linter with custom validation |
| `custom-atoms` | Custom atom registration |
| `modularity` | Grammar composition from modules |
Run examples with:
```bash
cargo run --example calculator-transform
cargo run --example json-pattern
cargo run --example url
```
Full documentation and interactive examples are available at [the
website](https://parsanol.github.io/examples).
# API Stability
The API is currently in active development. Version 0.x indicates that
breaking changes may occur.
Stable APIs:
- `Grammar` and `GrammarBuilder`
- `PortableParser` basic parsing
- `AstArena` and `AstNode`
- Parser DSL combinators
- Streaming builder trait and built-in builders
- Parallel parsing functions
Experimental APIs (may change):
- `Transform` and pattern matching
- Rich error reporting
- Infix expression parsing
- Debug/trace tools
# Documentation
## Architecture
See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the overall system architecture.
## Development
- [docs/refactoring-plan.md](docs/refactoring-plan.md) - Current refactoring roadmap
- [docs/continuation-prompt.md](docs/continuation-prompt.md) - Prompt for continuing work
- [docs/MIGRATION.md](docs/MIGRATION.md) - Migration guide from Parslet
# License
MIT License - see [LICENSE](LICENSE) file for details.
# Contributing
Contributions are welcome! Please feel free to submit issues and pull
requests at [GitHub](https://github.com/parsanol/parsanol-rs).
## Development Setup
```bash
# Clone the repository
git clone https://github.com/parsanol/parsanol-rs.git
cd parsanol-rs
# Build (workspace)
cargo build
# Run tests (234 unit tests)
cargo test --lib
# Build all examples
cargo build --examples
# Run benchmarks
cargo bench
# Check code quality
cargo clippy --lib -- -D warnings
cargo fmt --check
```
## Testing
The test suite consists of multiple types of tests:
**Unit tests:** 234 tests covering internal functionality of each module
(parser, arena, cache, transform, derive, etc.).
**Integration tests:** Located in `tests/` directory, test end-to-end parsing scenarios.
**Examples:** 39 runnable parsers in `examples/` directory demonstrating real-world usage.
Examples are compiled and tested via `cargo build --examples`.
**Documentation tests (doc tests):** Code examples in documentation comments. Note that many doc tests are marked with `ignore` because they show **incomplete code snippets** (e.g., method signatures or pseudocode) rather than complete runnable examples. This is intentional - the doc tests illustrate API patterns, while the `examples/` directory contains fully runnable code that is verified by CI.
To run all tests:
```bash
# Unit, integration, and doc tests
cargo test
# Doc tests only (most will be ignored as designed)
cargo test --doc
# Test examples compile
cargo build --examples
# Run ignored doc tests (will fail if not complete)
cargo test --doc -- --ignored
```
## FFI Feature Testing
This crate supports optional Ruby and WebAssembly (WASM) FFI features.
These must be tested explicitly.
> [!IMPORTANT]
> FFI features require additional setup and may not compile/link in all
> environments. Always verify FFI code compiles before pushing to CI.
### Ruby FFI Testing
The Ruby FFI uses the `magnus` crate to provide Ruby bindings.
**Prerequisites:**
- Ruby 3.0+ installed
- Ruby development headers (macOS: `brew install ruby`, Ubuntu: `sudo apt-get install ruby-dev`)
**Testing:**
```bash
# Compile-time check (no Ruby required for linking)
cargo check --features ruby
cargo clippy --features ruby --lib -- -D warnings
# Full integration tests (requires Ruby runtime)
# Note: These tests are marked #[ignore] - run manually
cargo test --features ruby --test ruby_ffi -- --ignored
```
**Test coverage:**
- `tests/ruby_ffi.rs` - Comprehensive tests for RubyBuilder, RubyObject trait
- Magnus type annotations (e.g., `funcall::<&str, (), Value>`)
- Error handling from Ruby callbacks
- Parse result conversion
### WebAssembly FFI Testing
The WASM FFI uses `wasm-bindgen` for JavaScript bindings.
**Prerequisites:**
- `wasm-pack` installed (`cargo install wasm-pack`)
**Testing:**
```bash
# Compile-time check
cargo check --features wasm
cargo clippy --features wasm --lib -- -D warnings
# Full WASM build and test
wasm-pack build --features wasm
wasm-pack test --node --features wasm
```
**Test coverage:**
- `tests/wasm_ffi.rs` - Tests for WASM exports, grammar serialization
- JsValue conversions
- Error handling for WASM
### CI Integration
CI automatically tests FFI features:
```yaml
# From .github/workflows/ci.yml
strategy:
matrix:
feature: ["", "logging", "ruby", "wasm"]
```
The Ruby and WASM feature tests run on every push to catch FFI
regressions early.
# Release Process
This project uses [release-plz](https://release-plz.dev/) for automated releases.
## How It Works
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            RELEASE-PLZ WORKFLOW                             │
└─────────────────────────────────────────────────────────────────────────────┘

    Push to main
         │
         ▼
┌─────────────────┐
│ release-pr job  │  Creates/updates Release PR
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Release PR    │  Contains version bump + changelog
│   (on GitHub)   │
└────────┬────────┘
         │
         │  Maintainer reviews and merges
         ▼
┌─────────────────┐
│   release job   │  Runs release-plz release
└────────┬────────┘
         │
         ├──────────────────────────────┐
         │                              │
         ▼                              ▼
┌─────────────────┐            ┌─────────────────┐
│   Create tag    │            │   Publish to    │
│    (v0.1.2)     │            │    crates.io    │
└────────┬────────┘            └─────────────────┘
         │
         ▼
┌─────────────────┐
│ GitHub Release  │  With release notes
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Build jobs    │  Build native libraries
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Update Release  │  Upload artifacts
└─────────────────┘
```
## Maintainer Workflow
### Normal Release (Recommended)
Just push commits with conventional commit messages:
```bash
git commit -m "feat: add new parser combinator"
git push origin main
```
release-plz will:
1. Create a Release PR with version bump (e.g., `0.1.1` → `0.1.2` for `feat:`)
2. Wait for you to review and merge
3. Publish automatically after merge
### Manual Release
If you need to trigger a release manually:
1. Go to **Actions** → **Release** workflow
2. Click **Run workflow**
3. Select action:
- `auto` (default): Let release-plz decide
- `release-pr`: Just create/update the Release PR
- `release`: Force a release immediately
### Version Bump Rules
release-plz uses [conventional commits](https://www.conventionalcommits.org/):
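| Commit type | Version bump |
|-------------|--------------|
| `feat:` | Minor (0.1.0 → 0.2.0) |
| `fix:` | Patch (0.1.0 → 0.1.1) |
| `feat!:` or `fix!:` | Major (0.1.0 → 1.0.0) |
| `docs:`, `chore:`, etc. | No bump (changelog only) |

While the crate is on `0.x`, release-plz's default rules apply one level lower: `feat:` yields a patch bump and a breaking change yields a minor bump, which is why the Normal Release example goes from `0.1.1` to `0.1.2` for `feat:`.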
### What Gets Released
- **crates.io**: `parsanol` crate
- **GitHub Release**: With release notes
- **Build Artifacts**: Native libraries for Linux, macOS, Windows (x64, ARM64)
### Troubleshooting
**"Already published" error:**
- release-plz sees an existing tag and thinks the version is already published
- Solution: Ensure Cargo.toml version matches what you want to publish
**No Release PR created:**
- Check that commits follow conventional commit format
- Check GitHub Actions logs for the `release-pr` job
**Publish failed:**
- Check that the `crates.io` environment is configured in repository settings
- Check that trusted publishing is enabled
- Check that the version doesn't already exist on crates.io
# See Also
- [parsanol-ruby](https://github.com/parsanol/parsanol-ruby) - Ruby
bindings
- [Documentation
Website](https://github.com/parsanol/parsanol.github.io)
- [Parslet](https://github.com/kschiess/parslet) - Original Ruby PEG
parser (inspiration)