Sipha
A flexible, incremental parsing library for Rust with support for multiple parsing algorithms. Sipha is designed from the ground up to support incremental parsingβthe ability to efficiently re-parse only the changed portions of your code, making it ideal for interactive applications like IDEs, editors, and language servers.
π Read the Book - Comprehensive guide and documentation
Status: 0.5.0 Release
Sipha 0.5.0 provides a stable foundation for incremental parsing with multiple backends. The core API is stable, though some advanced features may continue to evolve. We welcome feedback and contributions!
Key Features
Incremental Parsing (Primary Focus)
Sipha's standout feature is its incremental parsing capability. When you edit code, Sipha can:
- Reuse unchanged subtrees from previous parses
- Reparse only affected regions instead of the entire file
- Maintain parse caches for efficient updates
- Dramatically improve performance in interactive editing scenarios
This makes Sipha perfect for building language servers, IDEs, and other tools that need to parse code as users type.
Additional Features
- Multiple parsing backends: Choose from LL(k), LR, and more (via feature flags)
- Immutable syntax trees: Green/red tree representation for efficient manipulation
- Error recovery: Configurable error recovery strategies for robust parsing
- Flexible grammar definition: Builder API for defining your grammar
- Unicode support: Full Unicode support for identifiers and text (optional)
- Rich diagnostics: Beautiful error messages with miette integration (optional)
- Tree traversal: Visitor patterns and query APIs for working with syntax trees
Installation
Add Sipha to your Cargo.toml:
[]
= "0.5.0"
Or with specific features:
[]
= { = "0.5.0", = ["diagnostics", "unicode", "backend-ll"] }
Quick Start
Let's build a simple arithmetic expression parser step by step. This example will help you understand the core concepts.
Step 1: Define Your Syntax Kinds
First, define the tokens and non-terminals your parser will use. Sipha uses a unified SyntaxKind trait for both terminals (tokens) and non-terminals (grammar rules):
use SyntaxKind;
Step 2: Build a Lexer
Create a lexer to tokenize your input text:
use ;
let lexer = new
.token
.token
.token
.token
.token
.token
.token
.token
.trivia
.build
.expect;
Step 3: Tokenize Input
let input = "42 + 10";
let tokens = lexer.tokenize
.expect;
Step 4: Define Non-Terminals and Build Grammar
use ;
// Build your grammar rules
let grammar = new
.entry_point
// Add your grammar rules here
.build
.expect;
Step 5: Parse!
use ;
use ParserBackend;
let config = default;
let mut parser = new
.expect;
let result = parser.parse;
For a complete working example, see examples/basic_arithmetic.rs.
Incremental Parsing
Incremental parsing is Sipha's core strength. It allows you to efficiently update your parse tree when code changes, rather than re-parsing everything from scratch.
Why Incremental Parsing?
In interactive applications like IDEs, users edit code frequently. Traditional parsers re-parse the entire file on every change, which can be slow for large files. Incremental parsing:
- Reuses unchanged nodes from previous parses
- Only re-parses affected regions
- Maintains parse caches for fast updates
- Scales to large codebases efficiently
How It Works
use ;
use ;
// Initial parse
let result1 = parser.parse_incremental;
// After an edit (e.g., user changed "42" to "100")
let edits = vec!;
// Incremental re-parse - only affected regions are re-parsed
let result2 = parser.parse_incremental;
The parser automatically:
- Identifies which parts of the tree are affected by the edit
- Reuses unchanged subtrees
- Only re-parses the minimal affected region
Current Status
Incremental parsing is fully implemented in version 0.5.0 with complete node reuse and cache management:
- Node reuse: Unchanged subtrees are automatically identified and reused from previous parses
- Cache population: Parse results are cached and can be reused for future parses
- Affected range computation: Only affected regions are re-parsed
- Smart cache invalidation: Cache entries are invalidated based on edit locations
- Cross-session caching: Persistent parse cache enables node reuse across multiple parse sessions
The parser automatically integrates reusable nodes from previous parses and queries the persistent cache during parsing, providing significant performance improvements for interactive editing scenarios.
GLR Parsing
Sipha includes a GLR (Generalized LR) parser backend for handling ambiguous grammars. GLR parsing extends LR parsing to handle non-deterministic grammars by maintaining multiple parser stacks and forking on conflicts.
When to Use GLR
Use the GLR backend when:
- Your grammar has inherent ambiguities (e.g., C++ template syntax)
- You need to handle multiple valid parse trees
- You want to disambiguate at parse time using precedence/associativity rules
Basic Usage
use ;
use ParserBackend;
// Create GLR parser
let config = default;
let parser = new
.expect;
// Parse with GLR - returns a parse forest for ambiguous results
let result = parser.parse;
// Handle ambiguity if present
if let Some = result.forest
Disambiguation
GLR parsers can produce parse forests when multiple valid parse trees exist. Sipha provides several disambiguation strategies:
- Precedence-based: Resolve conflicts using operator precedence
- Associativity-based: Resolve conflicts using operator associativity
- Custom strategies: Implement your own disambiguation logic
For more details, see the backend::glr module documentation.
Architecture Overview
Parsing Backends
Sipha supports multiple parsing algorithms via feature flags:
-
LL(k) Parser (
backend-ll): Top-down predictive parsing with configurable lookahead- Supports left-recursion elimination
- Configurable error recovery
- Incremental parsing support
-
LR Parser (
backend-lr): Bottom-up shift-reduce parsing- Efficient for many grammar types
- Good error recovery
-
GLR Parser (
backend-glr): Generalized LR parsing for ambiguous grammars- Handles non-deterministic and ambiguous grammars
- Parse forest representation for ambiguity tracking
- Configurable disambiguation strategies (precedence, associativity)
- Incremental parsing support
- Ideal for complex languages like C++ with inherent ambiguities
-
Planned: PEG, Packrat, Earley parsers
Syntax Trees
Sipha uses an immutable green/red tree representation:
- Green trees: Compact, shared representation stored in an arena
- Red trees: Convenient API for traversing and querying the tree
- Efficient memory usage: Shared subtrees reduce memory footprint
- Fast traversal: Optimized for common tree operations
Error Recovery
Sipha provides configurable error recovery strategies:
- Synchronization tokens: Skip to known recovery points
- Delimited recovery: Skip to matching delimiters
- Best-effort parsing: Continue parsing despite errors
Examples
The repository includes several examples to help you get started:
- Basic Arithmetic: A step-by-step arithmetic expression parser
- Simple Calculator: A more complete calculator with error handling
Run examples with:
API Overview
Core Modules
syntax: Syntax tree types (green/red trees, nodes, tokens)grammar: Grammar definition and validationlexer: Tokenization and lexingparser: Parser traits and interfacesbackend: Parser backend implementations (LL, LR, etc.)error: Error types and diagnosticsincremental: Incremental parsing support
Key Types
SyntaxKind: Trait for syntax node kindsToken: Trait for tokensNonTerminal: Trait for grammar non-terminalsGrammarBuilder: Build grammars declarativelyLexerBuilder: Build lexers with pattern matchingParserBackend: Unified interface for all parser backendsSyntaxNode: Red tree node for traversalGreenNode: Green tree node for storage
Performance
Incremental parsing provides significant performance benefits:
- Fast updates: Only re-parse changed regions
- Memory efficient: Shared tree representation
- Scalable: Handles large files efficiently
- IDE-ready: Designed for interactive editing scenarios
For batch parsing (non-interactive), Sipha is competitive with other Rust parsing libraries. The incremental parsing capabilities make it particularly well-suited for language servers and editors.
Comparison with Alternatives
| Feature | Sipha | pest | nom | lalrpop |
|---|---|---|---|---|
| Incremental parsing | β | β | β | β |
| Multiple backends | β | β | β | β |
| Syntax trees | β | β | β | β |
| Error recovery | β | β | β | β |
| Grammar DSL | β | β | β | β |
| Zero-copy | Partial | β | β | β |
When to use Sipha:
- Building language servers or IDEs
- Need incremental parsing for interactive editing
- Want flexibility in choosing parsing algorithms
- Need rich syntax tree manipulation
- Parsing ambiguous grammars (use GLR backend)
When to consider alternatives:
- Simple one-off parsers (pest or nom might be simpler)
- Maximum performance for batch parsing (nom might be faster)
- Prefer declarative grammar DSLs (pest or lalrpop)
Future Features
Sipha continues to evolve. Planned features for future releases include:
Short Term (Post-0.5.0)
- Enhanced error recovery: More sophisticated recovery strategies
- Better diagnostics: Improved error messages and suggestions
- Performance optimizations: Further speed improvements for large files
Medium Term
- Additional backends: PEG, Packrat, and Earley parsers
- Grammar visualization: Interactive tools for visualizing and debugging grammars
- Incremental lexing: Extend incremental capabilities to the lexer for even better performance
Long Term
- Language server framework: Higher-level abstractions for building language servers
- Parallel parsing: Parse multiple files in parallel
- Advanced grammar analysis: Real-time grammar optimization suggestions
Contributing
Contributions are welcome! Sipha 0.5.0 provides a solid foundation, and there are many opportunities to help:
- Bug reports: Found a bug? Please open an issue!
- Feature requests: Have an idea? We'd love to hear it!
- Code contributions: Pull requests are welcome
- Documentation: Help improve our docs and examples
- Testing: Help expand our test coverage
Development Setup
# Clone the repository
# Run tests
# Run examples
# Build documentation
License
This project is licensed under the MIT License - see the LICENSE file for details.
Documentation
- The Sipha Book: Comprehensive guide and documentation
- API Documentation: Full API reference on docs.rs
- Examples: Working examples in the repository
- Source Code: Browse the source on GitHub
Acknowledgments
Sipha is inspired by:
- rust-analyzer for incremental parsing ideas
- rowan for green/red tree design
- Various parsing libraries in the Rust ecosystem
Note: Sipha 0.5.0 provides a complete incremental parsing solution. The core API is stable, and we continue to add features and improvements based on user feedback.