sipha 0.5.0

Core parsing infrastructure for Sipha parser library
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
# Sipha

[![Crates.io](https://img.shields.io/crates/v/sipha.svg)](https://crates.io/crates/sipha)
[![docs.rs](https://docs.rs/sipha/badge.svg)](https://docs.rs/sipha)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A flexible, incremental parsing library for Rust with support for multiple parsing algorithms. Sipha is designed from the ground up to support **incremental parsing**β€”the ability to efficiently re-parse only the changed portions of your code, making it ideal for interactive applications like IDEs, editors, and language servers.

πŸ“š **[Read the Book](https://sipha-parser.github.io/sipha/)** - Comprehensive guide and documentation

## Status: 0.5.0 Release

Sipha 0.5.0 provides a stable foundation for incremental parsing with multiple backends. The core API is stable, though some advanced features may continue to evolve. We welcome feedback and contributions!

## Key Features

### Incremental Parsing (Primary Focus)

Sipha's standout feature is its **incremental parsing** capability. When you edit code, Sipha can:

- **Reuse unchanged subtrees** from previous parses
- **Reparse only affected regions** instead of the entire file
- **Maintain parse caches** for efficient updates
- **Dramatically improve performance** in interactive editing scenarios

This makes Sipha perfect for building language servers, IDEs, and other tools that need to parse code as users type.

### Additional Features

- **Multiple parsing backends**: Choose from LL(k), LR, and more (via feature flags)
- **Immutable syntax trees**: Green/red tree representation for efficient manipulation
- **Error recovery**: Configurable error recovery strategies for robust parsing
- **Flexible grammar definition**: Builder API for defining your grammar
- **Unicode support**: Full Unicode support for identifiers and text (optional)
- **Rich diagnostics**: Beautiful error messages with miette integration (optional)
- **Tree traversal**: Visitor patterns and query APIs for working with syntax trees

## Installation

Add Sipha to your `Cargo.toml`:

```toml
[dependencies]
sipha = "0.5.0"
```

Or with specific features:

```toml
[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics", "unicode", "backend-ll"] }
```

## Quick Start

Let's build a simple arithmetic expression parser step by step. This example will help you understand the core concepts.

### Step 1: Define Your Syntax Kinds

First, define the tokens and non-terminals your parser will use. Sipha uses a unified `SyntaxKind` trait for both terminals (tokens) and non-terminals (grammar rules):

```rust
use sipha::syntax::SyntaxKind;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
    // Terminals (produced by lexer)
    Number,
    Plus,
    Minus,
    Multiply,
    Divide,
    LParen,
    RParen,
    Whitespace,
    Eof,
    // Non-terminals (produced by parser)
    Expr,
    Term,
    Factor,
}

impl SyntaxKind for ArithSyntaxKind {
    fn is_terminal(self) -> bool {
        !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor)
    }
    
    fn is_trivia(self) -> bool {
        matches!(self, ArithSyntaxKind::Whitespace)
    }
}
```

### Step 2: Build a Lexer

Create a lexer to tokenize your input text:

```rust
use sipha::lexer::{LexerBuilder, Pattern, CharSet};

let lexer = LexerBuilder::new()
    .token(ArithSyntaxKind::Number, Pattern::Repeat {
        pattern: Box::new(Pattern::CharClass(CharSet::digits())),
        min: 1,
        max: None,
    })
    .token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
    .token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
    .token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
    .token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
    .token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
    .token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
    .token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
        pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
        min: 1,
        max: None,
    })
    .trivia(ArithSyntaxKind::Whitespace)
    .build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
    .expect("Failed to build lexer");
```

### Step 3: Tokenize Input

```rust
let input = "42 + 10";
let tokens = lexer.tokenize(input)
    .expect("Failed to tokenize input");
```

### Step 4: Define Non-Terminals and Build Grammar

```rust
use sipha::grammar::{GrammarBuilder, Token, NonTerminal, Expr};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal {
    Expr,
    Term,
    Factor,
}

impl NonTerminal for ArithNonTerminal {
    fn name(&self) -> &str {
        match self {
            ArithNonTerminal::Expr => "Expr",
            ArithNonTerminal::Term => "Term",
            ArithNonTerminal::Factor => "Factor",
        }
    }
}

// Build your grammar rules
let grammar = GrammarBuilder::new()
    .entry_point(ArithNonTerminal::Expr)
    // Add your grammar rules here
    .build()
    .expect("Failed to build grammar");
```

### Step 5: Parse!

```rust
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;

let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
    .expect("Failed to create parser");

let result = parser.parse(&tokens, ArithNonTerminal::Expr);
```

For a complete working example, see [`examples/basic_arithmetic.rs`](crates/sipha/examples/basic_arithmetic.rs).

## Incremental Parsing

Incremental parsing is Sipha's core strength. It allows you to efficiently update your parse tree when code changes, rather than re-parsing everything from scratch.

### Why Incremental Parsing?

In interactive applications like IDEs, users edit code frequently. Traditional parsers re-parse the entire file on every change, which can be slow for large files. Incremental parsing:

- **Reuses unchanged nodes** from previous parses
- **Only re-parses affected regions** 
- **Maintains parse caches** for fast updates
- **Scales to large codebases** efficiently

### How It Works

```rust
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::syntax::{TextRange, TextSize};

// Initial parse
let result1 = parser.parse_incremental(&tokens, None, &[], entry_point);

// After an edit (e.g., user changed "42" to "100")
let edits = vec![TextEdit {
    range: TextRange::new(
        TextSize::from(0),
        TextSize::from(2)
    ),
    new_text: "100".into(),
}];

// Incremental re-parse - only affected regions are re-parsed
let result2 = parser.parse_incremental(
    &new_tokens,
    Some(&result1.root),
    &edits,
    entry_point
);
```

The parser automatically:
1. Identifies which parts of the tree are affected by the edit
2. Reuses unchanged subtrees
3. Only re-parses the minimal affected region

### Current Status

Incremental parsing is fully implemented in version 0.5.0 with complete node reuse and cache management:

- [X] **Node reuse**: Unchanged subtrees are automatically identified and reused from previous parses
- [X] **Cache population**: Parse results are cached and can be reused for future parses
- [X] **Affected range computation**: Only affected regions are re-parsed
- [X] **Smart cache invalidation**: Cache entries are invalidated based on edit locations
- [X] **Cross-session caching**: Persistent parse cache enables node reuse across multiple parse sessions

The parser automatically integrates reusable nodes from previous parses and queries the persistent cache during parsing, providing significant performance improvements for interactive editing scenarios.

## GLR Parsing

Sipha includes a GLR (Generalized LR) parser backend for handling ambiguous grammars. GLR parsing extends LR parsing to handle non-deterministic grammars by maintaining multiple parser stacks and forking on conflicts.

### When to Use GLR

Use the GLR backend when:
- Your grammar has inherent ambiguities (e.g., C++ template syntax)
- You need to handle multiple valid parse trees
- You want to disambiguate at parse time using precedence/associativity rules

### Basic Usage

```rust
use sipha::backend::glr::{GlrParser, GlrConfig};
use sipha::backend::ParserBackend;

// Create GLR parser
let config = GlrConfig::default();
let parser = GlrParser::new(&grammar, config)
    .expect("Failed to create GLR parser");

// Parse with GLR - returns a parse forest for ambiguous results
let result = parser.parse(&tokens, entry_point);

// Handle ambiguity if present
if let Some(forest) = result.forest {
    // Multiple parse trees exist - disambiguate
    let disambiguated = forest.disambiguate(|alternatives| {
        // Custom disambiguation logic
        alternatives.first().cloned()
    });
}
```

### Disambiguation

GLR parsers can produce parse forests when multiple valid parse trees exist. Sipha provides several disambiguation strategies:

- **Precedence-based**: Resolve conflicts using operator precedence
- **Associativity-based**: Resolve conflicts using operator associativity
- **Custom strategies**: Implement your own disambiguation logic

For more details, see the [`backend::glr`](crates/sipha/src/backend/glr/mod.rs) module documentation.

## Architecture Overview

### Parsing Backends

Sipha supports multiple parsing algorithms via feature flags:

- **LL(k) Parser** (`backend-ll`): Top-down predictive parsing with configurable lookahead
  - Supports left-recursion elimination
  - Configurable error recovery
  - Incremental parsing support

- **LR Parser** (`backend-lr`): Bottom-up shift-reduce parsing
  - Efficient for many grammar types
  - Good error recovery

- **GLR Parser** (`backend-glr`): Generalized LR parsing for ambiguous grammars
  - Handles non-deterministic and ambiguous grammars
  - Parse forest representation for ambiguity tracking
  - Configurable disambiguation strategies (precedence, associativity)
  - Incremental parsing support
  - Ideal for complex languages like C++ with inherent ambiguities

- **Planned**: PEG, Packrat, Earley parsers

### Syntax Trees

Sipha uses an immutable **green/red tree** representation:

- **Green trees**: Compact, shared representation stored in an arena
- **Red trees**: Convenient API for traversing and querying the tree
- **Efficient memory usage**: Shared subtrees reduce memory footprint
- **Fast traversal**: Optimized for common tree operations

### Error Recovery

Sipha provides configurable error recovery strategies:

- **Synchronization tokens**: Skip to known recovery points
- **Delimited recovery**: Skip to matching delimiters
- **Best-effort parsing**: Continue parsing despite errors

## Examples

The repository includes several examples to help you get started:

- **[Basic Arithmetic]crates/sipha/examples/basic_arithmetic.rs**: A step-by-step arithmetic expression parser
- **[Simple Calculator]crates/sipha/examples/simple_calculator.rs**: A more complete calculator with error handling

Run examples with:

```bash
cargo run --example basic_arithmetic
cargo run --example simple_calculator
```

## API Overview

### Core Modules

- **`syntax`**: Syntax tree types (green/red trees, nodes, tokens)
- **`grammar`**: Grammar definition and validation
- **`lexer`**: Tokenization and lexing
- **`parser`**: Parser traits and interfaces
- **`backend`**: Parser backend implementations (LL, LR, etc.)
- **`error`**: Error types and diagnostics
- **`incremental`**: Incremental parsing support

### Key Types

- `SyntaxKind`: Trait for syntax node kinds
- `Token`: Trait for tokens
- `NonTerminal`: Trait for grammar non-terminals
- `GrammarBuilder`: Build grammars declaratively
- `LexerBuilder`: Build lexers with pattern matching
- `ParserBackend`: Unified interface for all parser backends
- `SyntaxNode`: Red tree node for traversal
- `GreenNode`: Green tree node for storage

## Performance

Incremental parsing provides significant performance benefits:

- **Fast updates**: Only re-parse changed regions
- **Memory efficient**: Shared tree representation
- **Scalable**: Handles large files efficiently
- **IDE-ready**: Designed for interactive editing scenarios

For batch parsing (non-interactive), Sipha is competitive with other Rust parsing libraries. The incremental parsing capabilities make it particularly well-suited for language servers and editors.

## Comparison with Alternatives

| Feature | Sipha | pest | nom | lalrpop |
|---------|-------|------|-----|--------|
| Incremental parsing | βœ… | ❌ | ❌ | ❌ |
| Multiple backends | βœ… | ❌ | ❌ | ❌ |
| Syntax trees | βœ… | βœ… | ❌ | βœ… |
| Error recovery | βœ… | βœ… | βœ… | βœ… |
| Grammar DSL | ❌ | βœ… | ❌ | βœ… |
| Zero-copy | Partial | βœ… | βœ… | ❌ |

**When to use Sipha:**
- Building language servers or IDEs
- Need incremental parsing for interactive editing
- Want flexibility in choosing parsing algorithms
- Need rich syntax tree manipulation
- Parsing ambiguous grammars (use GLR backend)

**When to consider alternatives:**
- Simple one-off parsers (pest or nom might be simpler)
- Maximum performance for batch parsing (nom might be faster)
- Prefer declarative grammar DSLs (pest or lalrpop)

## Future Features

Sipha continues to evolve. Planned features for future releases include:

### Short Term (Post-0.5.0)
- **Enhanced error recovery**: More sophisticated recovery strategies
- **Better diagnostics**: Improved error messages and suggestions
- **Performance optimizations**: Further speed improvements for large files

### Medium Term
- **Additional backends**: PEG, Packrat, and Earley parsers
- **Grammar visualization**: Interactive tools for visualizing and debugging grammars
- **Incremental lexing**: Extend incremental capabilities to the lexer for even better performance

### Long Term
- **Language server framework**: Higher-level abstractions for building language servers
- **Parallel parsing**: Parse multiple files in parallel
- **Advanced grammar analysis**: Real-time grammar optimization suggestions

## Contributing

Contributions are welcome! Sipha 0.5.0 provides a solid foundation, and there are many opportunities to help:

- **Bug reports**: Found a bug? Please open an issue!
- **Feature requests**: Have an idea? We'd love to hear it!
- **Code contributions**: Pull requests are welcome
- **Documentation**: Help improve our docs and examples
- **Testing**: Help expand our test coverage

### Development Setup

```bash
# Clone the repository
git clone https://github.com/yourusername/sipha.git
cd sipha

# Run tests
cargo test

# Run examples
cargo run --example basic_arithmetic

# Build documentation
cargo doc --open
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Documentation

- **[The Sipha Book]https://sipha-parser.github.io/sipha/**: Comprehensive guide and documentation
- **[API Documentation]https://docs.rs/sipha**: Full API reference on docs.rs
- **[Examples]crates/sipha/examples/**: Working examples in the repository
- **[Source Code]https://github.com/yourusername/sipha**: Browse the source on GitHub

## Acknowledgments

Sipha is inspired by:
- [rust-analyzer]https://github.com/rust-lang/rust-analyzer for incremental parsing ideas
- [rowan]https://github.com/rust-analyzer/rowan for green/red tree design
- Various parsing libraries in the Rust ecosystem

---

**Note**: Sipha 0.5.0 provides a complete incremental parsing solution. The core API is stable, and we continue to add features and improvements based on user feedback.