sipha
A PEG (Parsing Expression Grammar) parser with a stack-based VM, green/red syntax trees, and optional packrat memoisation.
More documentation: API on docs.rs · Cookbook (patterns for real grammars) · Repository architecture (internals).
Key capabilities:
- A stack-based VM
- Green/red syntax trees (with trivia)
- Optional packrat memoisation (Engine::with_memo)
- Diagnostics utilities (LineIndex, ParsedDoc, and error formatting)
- Optional extras (DOT/PEG display, formatting, diffing, analysis) via features
Quick start
use sipha::prelude::*;
Examples
Run from this repo:
cargo run -p sipha --example arithmetic_expr
cargo run -p sipha --example ini_grammar
cargo run -p sipha --example tiny_lang_recovery
cargo run -p sipha --example grammar_dsl
cargo run -p sipha --example sublanguage_markdown_json
Cookbook
See sipha/docs/COOKBOOK.md.
Features
- Grammar builder: compose rules with combinators such as byte, class, literal, choice, optional, zero_or_more, node, token, etc.
- Green/red trees: immutable green tree plus position-aware red layer; trivia-aware iterators.
- Structured errors: ParseError::NoMatch carries a Diagnostic with furthest position and expected tokens.
- Packrat memoisation: Engine::with_memo() for guaranteed O(n) on grammars with backtracking.
- SIMD literals: fast literal matching for long tokens (SSE2/AVX2 on x86_64).
- Byte dispatch: O(1) first-byte dispatch for rules like JSON value.
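To make the packrat bullet concrete, here is a minimal, self-contained sketch of the general technique. It is not sipha's implementation and uses none of its API: the `Packrat` and `Rule` names and the toy grammar are assumptions made up for illustration. Each (rule, position) result is cached, so the second alternative of `Expr` re-uses the `Term` result instead of re-parsing it after backtracking.

```rust
use std::collections::HashMap;

// Hypothetical rule identifiers for a toy grammar (not sipha types).
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Rule {
    Expr,
    Term,
}

struct Packrat<'a> {
    input: &'a [u8],
    // (rule, start) -> Some(end) on success, None on failure.
    memo: HashMap<(Rule, usize), Option<usize>>,
    // Number of rule bodies actually evaluated (cache misses).
    evaluations: usize,
}

impl<'a> Packrat<'a> {
    fn new(input: &'a str) -> Self {
        Packrat { input: input.as_bytes(), memo: HashMap::new(), evaluations: 0 }
    }

    // Apply a rule at a position, consulting the memo table first.
    fn apply(&mut self, rule: Rule, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        self.evaluations += 1;
        let result = match rule {
            Rule::Expr => self.expr(pos),
            Rule::Term => self.term(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    // Expr <- Term '+' Expr / Term
    fn expr(&mut self, pos: usize) -> Option<usize> {
        if let Some(after_term) = self.apply(Rule::Term, pos) {
            if self.input.get(after_term) == Some(&b'+') {
                if let Some(end) = self.apply(Rule::Expr, after_term + 1) {
                    return Some(end);
                }
            }
        }
        // Second alternative: Term at `pos` is served from the memo table.
        self.apply(Rule::Term, pos)
    }

    // Term <- '(' Expr ')' / [0-9]+
    fn term(&mut self, pos: usize) -> Option<usize> {
        if self.input.get(pos) == Some(&b'(') {
            if let Some(inner) = self.apply(Rule::Expr, pos + 1) {
                if self.input.get(inner) == Some(&b')') {
                    return Some(inner + 1);
                }
            }
        }
        let mut end = pos;
        while self.input.get(end).map_or(false, |b| b.is_ascii_digit()) {
            end += 1;
        }
        if end > pos { Some(end) } else { None }
    }

    // True when Expr consumes the whole input.
    fn matches(&mut self) -> bool {
        self.apply(Rule::Expr, 0) == Some(self.input.len())
    }
}
```

Because every (rule, position) pair is evaluated at most once, the number of rule evaluations is bounded by the number of rules times the input length plus one, which is where the linear bound comes from. The price is the memory held by the memo table.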
Cargo features
Default: walk (red-tree visitor). Enable extras with sipha = { version = "3", features = ["…"] } (combine as needed).
| Feature | Depends on | Description |
|---|---|---|
| walk | none | Red-tree Visitor traversal (on by default). |
| emit | walk | Serialize syntax trees to text (tree::emit). |
| transform | none | Tree transforms (tree::transform). Required for sourcemap. |
| miette | none | Pretty diagnostics via miette; see examples/miette_errors.rs. |
| utf16 | none | UTF-16 line/column helpers in diagnostics::utf16 (e.g. LSP). |
| analysis | walk | Scope extents and definition collection: sipha::extras::analysis. |
| display | none | PEG text and DOT export: sipha::extras::display. |
| sourcemap | transform | Span mapping after transform: sipha::extras::sourcemap. |
| fmt | emit | Formatting presets: sipha::extras::fmt. |
| diff | emit | Tree diff, S-expression test helpers, assert_parse!: sipha::extras::diff. |
| incremental | none | Incremental reparse: sipha::parse::incremental (reparse, build_green_tree_with_reuse, TextEdit, TextEdit::apply_edits). APIs take edits: &[TextEdit]; use std::slice::from_ref for a single edit. |
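The `std::slice::from_ref` tip for slice-taking edit APIs can be sketched without touching sipha itself. In the example below, `Edit`, `apply_edits`, and `apply_one` are hypothetical stand-ins written for illustration; they are not sipha's `TextEdit` or its real signatures. Only `std::slice::from_ref` is a real standard-library function: it turns a `&T` into a one-element `&[T]` without allocating.

```rust
use std::slice;

/// Hypothetical stand-in for an edit type; NOT sipha's `TextEdit` definition.
#[derive(Debug, Clone, PartialEq)]
struct Edit {
    start: usize,     // byte offset where the replaced range begins
    end: usize,       // byte offset where the replaced range ends (exclusive)
    new_text: String, // replacement text
}

/// Apply edits to `source`.
/// Assumes edits are sorted by `start`, non-overlapping, and on char boundaries.
fn apply_edits(source: &str, edits: &[Edit]) -> String {
    let mut out = String::with_capacity(source.len());
    let mut cursor = 0;
    for edit in edits {
        out.push_str(&source[cursor..edit.start]); // unchanged text before the edit
        out.push_str(&edit.new_text);              // replacement
        cursor = edit.end;                         // skip the replaced range
    }
    out.push_str(&source[cursor..]); // unchanged tail
    out
}

/// Single-edit convenience: view one `&Edit` as a one-element slice.
fn apply_one(source: &str, edit: &Edit) -> String {
    apply_edits(source, slice::from_ref(edit))
}
```

Taking a slice keeps one entry point for both single and batched edits, and `from_ref` avoids building a temporary `Vec` for the single-edit case.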
use sipha::prelude::* re-exports the same symbols as the nested sipha::extras::* modules when the matching feature is enabled, so you can import helpers from the prelude without typing extras every time. With incremental, the prelude also re-exports reparse, TextEdit, and related helpers.
The sipha-macros crate stays separate (procedural macros must live in their own crate).
Quick example
use sipha::prelude::*;
Parsing with a syntax tree
Use node and token in the builder to emit green/red tree events, then build the tree from the parse output:
use sipha::prelude::*;
let mut g = new;
g.rule;
let built = g.finish.unwrap;
let graph = built.as_graph();
let mut engine = new;
let out = engine.parse.unwrap;
let root = out.syntax_root.unwrap;
assert_eq!;
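For readers new to the green/red split, the following is a minimal conceptual sketch of the idea. It does not use sipha's types: `Green`, `Red`, and their methods are hypothetical names, and trivia handling is left out. The green layer stores only kinds and text lengths, so identical subtrees can be shared; the red layer adds absolute offsets on demand while walking down from the root.

```rust
use std::rc::Rc;

/// Green layer (hypothetical): immutable and position-free.
#[derive(Debug)]
struct Green {
    kind: &'static str,
    text_len: usize,
    children: Vec<Rc<Green>>,
}

impl Green {
    /// A leaf covering `text`.
    fn token(kind: &'static str, text: &str) -> Rc<Green> {
        Rc::new(Green { kind, text_len: text.len(), children: Vec::new() })
    }

    /// An interior node whose length is the sum of its children's lengths.
    fn node(kind: &'static str, children: Vec<Rc<Green>>) -> Rc<Green> {
        let text_len = children.iter().map(|c| c.text_len).sum();
        Rc::new(Green { kind, text_len, children })
    }
}

/// Red layer (hypothetical): a green node plus its absolute start offset.
#[derive(Debug, Clone)]
struct Red {
    green: Rc<Green>,
    offset: usize,
}

impl Red {
    fn root(green: Rc<Green>) -> Red {
        Red { green, offset: 0 }
    }

    /// Absolute byte range (start, end) of this node in the source text.
    fn range(&self) -> (usize, usize) {
        (self.offset, self.offset + self.green.text_len)
    }

    /// Children with offsets computed by accumulating sibling lengths.
    fn children(&self) -> Vec<Red> {
        let mut offset = self.offset;
        self.green
            .children
            .iter()
            .map(|child| {
                let red = Red { green: Rc::clone(child), offset };
                offset += child.text_len;
                red
            })
            .collect()
    }
}
```

In this sketch the same green token can appear at several positions (for example both operands of `1+1`), and each red wrapper still reports its own range.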
Error handling
On parse failure, ParseError::NoMatch(diagnostic) contains the furthest byte reached and the set of expected tokens:
if let Err = engine.parse
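How a "furthest position plus expected set" diagnostic is typically gathered in a PEG parser can be sketched as follows. This is a generic illustration under assumptions, not sipha's `Diagnostic`: the `Furthest` struct, its fields, and the toy `stmt` rule are all hypothetical. Every failed terminal reports its position; only failures at the furthest position seen so far are kept.

```rust
use std::collections::BTreeSet;

/// Hypothetical diagnostic record; field names are illustrative only.
#[derive(Debug, Default)]
struct Furthest {
    pos: usize,
    expected: BTreeSet<&'static str>,
}

impl Furthest {
    /// Record that `what` was expected at `pos` but not found.
    fn fail(&mut self, pos: usize, what: &'static str) {
        if pos > self.pos {
            // A failure further into the input supersedes earlier ones.
            self.pos = pos;
            self.expected.clear();
        }
        if pos == self.pos {
            self.expected.insert(what);
        }
    }
}

/// Match `lit` at `pos`; on mismatch, record the failure and return None.
fn literal(input: &str, pos: usize, lit: &'static str, f: &mut Furthest) -> Option<usize> {
    let matched = input.get(pos..).map_or(false, |rest| rest.starts_with(lit));
    if matched {
        Some(pos + lit.len())
    } else {
        f.fail(pos, lit);
        None
    }
}

/// Toy rule: Stmt <- "let " ("x" / "y") ";"
fn stmt(input: &str, f: &mut Furthest) -> Option<usize> {
    let p = literal(input, 0, "let ", f)?;
    let p = match literal(input, p, "x", f) {
        Some(end) => end,
        None => literal(input, p, "y", f)?, // ordered choice: try the second alternative
    };
    literal(input, p, ";", f)
}
```

For the input `let z;`, both alternatives fail at byte 4, so the record ends up with position 4 and the expected set containing `x` and `y`, which is the kind of information a message like "expected x or y" is built from.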
Grammar macro (sipha-macros)
The sipha-macros crate provides a sipha_grammar! macro so you can write rules in a PEG-style DSL:
use sipha::prelude::*;
use sipha_macros::sipha_grammar;
let built = sipha_grammar! ;
let graph = built.as_graph();
See the sipha-macros README for full syntax. Run: cargo run --example macro_grammar
Examples
Run from this repo: cargo run -p sipha --example <name> (add --features … when noted).
| Example | Topic |
|---|---|
| arithmetic_expr | Small expression grammar and tree building |
| grammar_dsl | sipha_grammar! DSL |
| ini_grammar | INI-like config grammar |
| tiny_lang_recovery | Recovery patterns for a small language |
| sublanguage_markdown_json | Sub-language embedding (markdown + JSON) |
Safety note
BuiltGraph::as_graph() returns a ParseGraph<'_> that borrows all tables from the BuiltGraph. The graph must not outlive the BuiltGraph. Generated static grammars from codegen::emit_rust use ParseGraph<'static> with real &'static slices.
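The borrowing relationship described above follows a common Rust pattern, sketched here with hypothetical types. `OwnedTables`, `TableView`, and their fields are invented for illustration and do not reflect the real layout of `BuiltGraph` or `ParseGraph`. The view's lifetime parameter ties it to the owner, so the compiler rejects any use of the view after the owner is gone, while data baked into the binary can back a `'static` view.

```rust
/// Hypothetical owner of grammar tables; NOT sipha's `BuiltGraph`.
struct OwnedTables {
    instructions: Vec<u8>,
    rule_names: Vec<&'static str>,
}

/// Hypothetical borrowed view; `'a` ties it to whatever owns the tables.
#[derive(Clone, Copy)]
struct TableView<'a> {
    instructions: &'a [u8],
    rule_names: &'a [&'static str],
}

impl OwnedTables {
    /// The returned view borrows from `self` and cannot outlive it.
    fn as_view(&self) -> TableView<'_> {
        TableView {
            instructions: &self.instructions,
            rule_names: &self.rule_names,
        }
    }
}

// Tables stored in statics live for the whole program, so a view over
// them can be `'static` with no owner to keep alive.
static STATIC_INSTRUCTIONS: [u8; 3] = [1, 2, 3];
static STATIC_RULE_NAMES: [&str; 1] = ["start"];

fn static_view() -> TableView<'static> {
    TableView {
        instructions: &STATIC_INSTRUCTIONS,
        rule_names: &STATIC_RULE_NAMES,
    }
}

// The following would NOT compile, which is the point of the lifetime:
//
//     let view = {
//         let owned = OwnedTables { instructions: vec![7], rule_names: vec![] };
//         owned.as_view()
//     }; // error: `owned` does not live long enough
```

In practice this means keeping the owning value in a binding that lives at least as long as every parse that uses the borrowed view.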