rustixml 0.3.1

Native iXML (Invisible XML) parser with left-recursion support - 76.9% spec conformance, works in Rust and WebAssembly
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
# rustixml Architecture

This document describes the architecture of rustixml v0.2.0, focusing on the **native recursive descent parser** implementation.

> **Historical Note**: rustixml originally used an Earley parser via the `earlgrey` crate (v0.1.0), but switched to a direct native implementation in v0.2.0 for better iXML semantic compatibility. See [`docs/CLAUDE_HISTORICAL.md`]docs/CLAUDE_HISTORICAL.md for the complete development history.

## Table of Contents

1. [Overview]#overview
2. [Core Components]#core-components
3. [Grammar Analysis & Optimization]#grammar-analysis--optimization
4. [Parse Flow]#parse-flow
5. [iXML Semantic Handling]#ixml-semantic-handling
6. [WebAssembly Support]#webassembly-support
7. [Testing]#testing

## Overview

### Architecture Principles

- **Direct Interpretation**: Parse iXML grammars directly without intermediate compilation
- **Zero-copy Where Possible**: Minimize string allocations during parsing
- **iXML-first Design**: Semantic operations (hiding, insertion, attributes) handled natively
- **Simple & Maintainable**: Recursive descent is easy to understand and debug

### Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                      User Input                             │
│           (iXML Grammar String + Input Text)                │
└──────────────────┬──────────────────────────────────────────┘
       ┌───────────────────────────┐
       │   Grammar Parser          │
       │   (grammar_parser.rs)     │
       │                           │
       │  Lexer → Parser → AST     │
       └──────────┬────────────────┘
                  │ GrammarAst
       ┌───────────────────────────┐
       │   Grammar Analysis        │
       │   (grammar_analysis.rs)   │
       │                           │
       │  • Recursion detection    │
       │  • Ambiguity detection    │
       │  • Complexity scoring     │
       │  • Grammar normalization  │
       └──────────┬────────────────┘
                  │ GrammarAst + Analysis
       ┌───────────────────────────┐
       │   Native Parser           │
       │   (native_parser.rs)      │
       │                           │
       │  • Recursive descent      │
       │  • iXML semantics         │
       │  • Mark handling          │
       │  • Ambiguity marking      │
       └──────────┬────────────────┘
                  │ XmlNode tree
       ┌───────────────────────────┐
       │   XML Serialization       │
       │   (xml_node.rs)           │
       │                           │
       │  XmlNode → XML String     │
       └──────────┬────────────────┘
       ┌───────────────────────────┐
       │      XML Output           │
       │  (with ixml:state if      │
       │   grammar is ambiguous)   │
       └───────────────────────────┘
```

## Core Components

### 1. Grammar Parser (`src/grammar_parser.rs`)

**Responsibility**: Parse iXML grammar notation into an AST.

**Key Types**:
```rust
pub struct Grammar {
    pub rules: Vec<Rule>,
}

pub struct Rule {
    pub name: String,
    pub mark: Mark,           // @, ^, - or none
    pub alternatives: Vec<Alternative>,
}

pub struct Alternative {
    pub sequence: Vec<Factor>,
}

pub enum Factor {
    Nonterminal { name: String, mark: Mark },
    Terminal(String),
    CharClass(CharClass),
    Repetition { base: Box<Factor>, op: RepOp, sep: Option<Sequence> },
    Group(Alternatives),
    Optional(Sequence),
}

pub enum Mark {
    None,        // element with children
    Hidden,      // - (omit from output)
    Attribute,   // @ (make attribute)
    Promoted,    // ^ (insert symbol)
}
```

**Parser Algorithm**: Recursive descent
- Token stream from `Lexer`
- Lookahead(1) decisions
- Builds `GrammarAst` directly

### 2. Native Parser (`src/native_parser.rs`)

**Responsibility**: Parse input text according to a grammar AST, producing XML nodes.

**Key Structures**:
```rust
pub struct NativeParser {
    grammar: Grammar,
}

pub struct ParseContext {
    recursion_depth: usize,
    max_depth: usize,
    // Rule call tracking for left-recursion detection
}

pub struct ParseResult {
    node: Option<XmlNode>,
    consumed: usize,
}
```

**Core Algorithm**: Recursive descent with backtracking

```rust
impl NativeParser {
    // Entry point
    pub fn parse(&self, input: &str) -> Result<String, String> {
        let stream = InputStream::new(input);
        let ctx = ParseContext::new();
        
        // Find start rule (first rule in grammar)
        let start_rule = &self.grammar.rules[0];
        
        // Parse starting from start rule
        let result = self.parse_rule(&stream, start_rule, &mut ctx)?;
        
        // Convert to XML string
        Ok(result.node.to_xml())
    }
    
    // Recursive descent functions
    fn parse_rule(&self, stream, rule, ctx) -> Result<ParseResult>
    fn parse_alternative(&self, stream, alt, ctx) -> Result<ParseResult>
    fn parse_factor(&self, stream, factor, ctx) -> Result<ParseResult>
    fn parse_nonterminal(&self, stream, name, ctx) -> Result<ParseResult>
    fn parse_terminal(&self, stream, literal, ctx) -> Result<ParseResult>
    fn parse_charclass(&self, stream, charclass, ctx) -> Result<ParseResult>
    fn parse_repetition(&self, stream, base, op, sep, ctx) -> Result<ParseResult>
}
```

**Backtracking**: When an alternative fails, restore input position and try next alternative.

**Performance**: O(n) for deterministic grammars, O(n²) for ambiguous (explores alternatives).

### 3. AST & XML Nodes (`src/ast.rs`)

**Responsibility**: Represent parsed results as XML tree.

```rust
pub enum XmlNode {
    Element {
        name: String,
        attributes: Vec<(String, String)>,
        children: Vec<XmlNode>,
    },
    Text(String),
    Attribute {
        name: String,
        value: String,
    },
}

impl XmlNode {
    pub fn to_xml(&self) -> String {
        // Serialize to XML string
    }
    
    pub fn text_content(&self) -> String {
        // Extract text for @ marks
    }
}
```

### 4. Character Classes (`src/charclass.rs`)

**Responsibility**: Handle iXML character class syntax `[...]`.

**Features**:
- Character ranges: `[a-z]`, `[A-Z]`, `[0-9]`
- Unicode general categories: `[L]` (letters), `[N]` (numbers), etc.
- Hex codes: `[#20-#7E]` (ASCII printable)
- Exclusions: `~[","]` (everything except comma)
- Subtraction: `[L - Lu]` (letters except uppercase)

**Implementation**:
```rust
pub struct RangeSet {
    ranges: Vec<(char, char)>,  // Sorted, non-overlapping
}

impl RangeSet {
    pub fn contains(&self, ch: char) -> bool {
        // Binary search through ranges
    }
    
    pub fn from_unicode_category(cat: &str) -> Self {
        // Use unicode-general-category crate
    }
}
```

### 5. Grammar Analysis (`src/grammar_analysis.rs`)

**Responsibility**: Analyze grammar structure to detect problematic patterns and enable optimizations.

**Key Features**:
- **Recursion Detection**: Identifies directly and indirectly recursive rules
- **Ambiguity Detection**: Detects grammars that can produce multiple parse trees
- **Complexity Scoring**: Calculates grammar complexity for performance warnings
- **Rule Categorization**: Identifies hidden, promoted, and attribute rules

**Key Structures**:
```rust
pub struct GrammarAnalysis {
    pub recursive_rules: HashSet<String>,
    pub left_recursive_rules: HashSet<String>,
    pub hidden_rules: HashSet<String>,
    pub promoted_rules: HashSet<String>,
    pub attribute_rules: HashSet<String>,
    pub complexity_scores: HashMap<String, usize>,
    pub is_potentially_ambiguous: bool,
}

impl GrammarAnalysis {
    pub fn analyze(grammar: &IxmlGrammar) -> Self {
        // Perform static analysis on grammar structure
    }

    pub fn report(&self) -> String {
        // Generate human-readable analysis report
    }
}
```

**Ambiguity Detection Patterns**:

The analyzer detects common ambiguity patterns using conservative heuristics:

1. **Multiple Nullable Alternatives**: Rules where multiple alternatives can match empty input
   ```ixml
   a: "a"* ; "b"*.  // Both alternatives match empty string
   ```

2. **Shared Nullable Nonterminals**: Alternatives starting with the same nullable rule
   ```ixml
   a: spaces, "a" ; spaces, "b".
   spaces: " "*.
   ```

**Fixpoint Nullable Detection**:

Uses iterative fixpoint algorithm to compute which rules can match empty input:

```rust
fn compute_nullable_set(rule_map: &HashMap<String, &Rule>) -> HashSet<String> {
    let mut nullable = HashSet::new();
    let mut changed = true;

    // Iterate until no more changes (fixpoint)
    while changed {
        changed = false;
        for (name, rule) in rule_map {
            if !nullable.contains(name) && is_rule_nullable(rule, &nullable) {
                nullable.insert(name.clone());
                changed = true;
            }
        }
    }

    nullable
}
```

**Integration with Parser**:

When a grammar is flagged as potentially ambiguous, the parser automatically:
1. Adds `xmlns:ixml="http://invisiblexml.org/NS"` namespace
2. Adds `ixml:state='ambiguous'` attribute to root element
3. Continues normal parsing (picks one parse tree)

### 6. WebAssembly Bindings (`src/wasm.rs`)

**Responsibility**: Export Rust functions to JavaScript/WebAssembly.

```rust
#[wasm_bindgen]
pub fn parse_ixml(grammar: &str, input: &str) -> JsValue {
    let result = match parse_ixml_grammar(grammar) {
        Ok(ast) => {
            let parser = NativeParser::new(ast);
            match parser.parse(input) {
                Ok(xml) => ParseResult::success(xml),
                Err(e) => ParseResult::error(e),
            }
        }
        Err(e) => ParseResult::error(e),
    };
    
    serde_wasm_bindgen::to_value(&result).unwrap()
}

#[wasm_bindgen]
pub struct IxmlParser {
    parser: NativeParser,
}

#[wasm_bindgen]
impl IxmlParser {
    #[wasm_bindgen(constructor)]
    pub fn new(grammar: &str) -> Result<IxmlParser, JsValue> {
        // Create reusable parser
    }
    
    pub fn parse(&self, input: &str) -> JsValue {
        // Parse with pre-compiled grammar
    }
}
```

## Grammar Analysis & Optimization

This section describes the static analysis performed on grammars before parsing begins.

### Analysis Pipeline

```mermaid
graph TB
    A[Grammar AST] --> B[Build Rule Map]
    B --> C[Detect Recursion]
    B --> D[Compute Nullable Set]
    B --> E[Calculate Complexity]
    D --> F[Detect Ambiguity]
    C --> G[Categorize Rules]
    E --> G
    F --> H[GrammarAnalysis Result]
    G --> H
    H --> I[Parser with Analysis]

    style A fill:#e1f5ff,stroke:#333,color:#000
    style H fill:#ffe1e1,stroke:#333,color:#000
    style I fill:#e1ffe1,stroke:#333,color:#000
    style B fill:#fff9e1,stroke:#333,color:#000
    style C fill:#fff9e1,stroke:#333,color:#000
    style D fill:#fff9e1,stroke:#333,color:#000
    style E fill:#fff9e1,stroke:#333,color:#000
    style F fill:#ffe1f5,stroke:#333,color:#000
    style G fill:#fff9e1,stroke:#333,color:#000
```

### Ambiguity Detection Flow

When the parser is created with a grammar, analysis happens automatically:

```mermaid
graph LR
    A[Parse Grammar Text] --> B[Grammar AST]
    B --> C[GrammarAnalysis.analyze]
    C --> D{Has Multiple<br/>Nullable Alts?}
    C --> E{Has Shared<br/>Nullable Nonterms?}
    D -->|Yes| F[Set is_potentially_ambiguous = true]
    E -->|Yes| F
    D -->|No| G[is_potentially_ambiguous = false]
    E -->|No| G
    F --> H[NativeParser with Analysis]
    G --> H
    H --> I[Parse Input]
    I --> J{Grammar<br/>Ambiguous?}
    J -->|Yes| K[Add ixml:state attribute]
    J -->|No| L[Return XmlNode as-is]
    K --> M[Serialize to XML]
    L --> M

    style A fill:#e1f5ff,stroke:#333,color:#000
    style B fill:#e1f5ff,stroke:#333,color:#000
    style C fill:#fff9e1,stroke:#333,color:#000
    style F fill:#ffe1e1,stroke:#333,color:#000
    style G fill:#e1ffe1,stroke:#333,color:#000
    style H fill:#fff9e1,stroke:#333,color:#000
    style K fill:#ffe1f5,stroke:#333,color:#000
    style M fill:#e1ffe1,stroke:#333,color:#000
```

### Nullable Fixpoint Algorithm

The nullable detection uses a fixpoint iteration algorithm:

```mermaid
graph TD
    A[Start: nullable = empty set] --> B[Loop: changed = true]
    B --> C{For each rule}
    C -->|Next rule| D{Already nullable?}
    D -->|Yes| C
    D -->|No| E{All alts check nullable?}
    E --> F{Any alt is nullable?}
    F -->|Yes| G[Add rule to nullable set]
    F -->|No| C
    G --> H[Set changed = true]
    H --> C
    C -->|Done| I{changed == true?}
    I -->|Yes| B
    I -->|No| J[Return nullable set]

    style A fill:#e1f5ff,stroke:#333,color:#000
    style B fill:#fff9e1,stroke:#333,color:#000
    style G fill:#ffe1f5,stroke:#333,color:#000
    style J fill:#e1ffe1,stroke:#333,color:#000
```

**Alternative Nullable Check**:
- All factors in sequence must be nullable
- Factor is nullable if:
  - It's a nonterminal in the nullable set
  - It's a repetition with `*` or `**` operator (zero or more)
  - It's an optional `?` group
  - It's a group where at least one alternative is nullable

### Static vs Runtime Analysis

rustixml uses **static analysis** to detect ambiguity:

| Approach | When | What | Pros | Cons |
|----------|------|------|------|------|
| **Static** (rustixml) | Before parsing | Analyze grammar structure | Fast, predictable | Conservative (may miss some, may have false positives) |
| **Runtime** | During parsing | Track all parse trees | Precise, complete | Slow, memory-intensive |

**Design Decision**: Static analysis enables:
- O(1) ambiguity check at parse time
- Deterministic parser behavior (always picks one tree)
- Clear user warnings about grammar issues
- Foundation for future optimizations (memoization, transformation)

### Grammar Normalization

Grammar normalization transforms grammars to enable better analysis and left-recursion handling:

**Implemented transformations**:
1. **Hidden/Promoted rule inlining**: Inline `-` and `^` marked rules for analysis
2. **Seed-growing left-recursion**: Handle left-recursive grammars using iterative seed-growing algorithm
3. **Nullable analysis**: Fixpoint iteration to compute nullable rules for ambiguity detection

**Implementation**:
- Normalization framework in `src/normalize.rs`
- Used for grammar analysis and ambiguity detection
- Seed-growing algorithm documented in `docs/SEED_GROWING_IMPLEMENTATION.md`

**Future enhancements**:
- Nullable factoring: Extract common nullable prefixes
- Character class partitioning: Split overlapping ranges for faster matching

## Parse Flow

### Step-by-Step Example

**Grammar**:
```ixml
greeting: "Hello, ", name, "!".
name: letter+.
letter: ["A"-"Z"; "a"-"z"].
```

**Input**: `"Hello, World!"`

**Parse Steps**:

1. **Start**: Call `parse_rule` on `greeting` rule
   
2. **Parse Alternative**: `"Hello, ", name, "!"`
   - **Parse Terminal** `"Hello, "`: Match succeeds, consume 7 chars
   - **Parse Nonterminal** `name`:
     - Call `parse_rule` on `name` rule
     - **Parse Factor** `letter+`: Repetition (one or more)
       - **Parse CharClass** `["A"-"Z"; "a"-"z"]`: Match 'W', consume 1
       - Repeat for 'o', 'r', 'l', 'd' → 5 chars total
     - Return `<name>World</name>` node, consumed 5
   - **Parse Terminal** `"!"`: Match succeeds, consume 1 char
   
3. **Build Result**:
   - Combine: Text("Hello, ") + Element(name, "World") + Text("!")
   - Wrap in `<greeting>` element
   - Return `<greeting>Hello, <name>World</name>!</greeting>`

**Total consumed**: 13 characters (entire input)

### Backtracking Example

**Grammar**:
```ixml
number: hex | decimal.
hex: "0x", digit+.
decimal: digit+.
digit: ["0"-"9"].
```

**Input**: `"123"`

**Parse Steps**:

1. Try first alternative (`hex`):
   - Try to match `"0x"`**Fails** at position 0
   - **Backtrack**: Restore position to 0
   
2. Try second alternative (`decimal`):
   - Match `digit+` → Succeeds, consume all 3 chars
   - Return `<decimal>123</decimal>`

## iXML Semantic Handling

### Marks (Operators)

iXML has 4 mark types that control XML generation:

| Mark | Symbol | Meaning | Example |
|------|--------|---------|---------|
| **None** | (default) | Wrap in element | `name: letter+.``<name>abc</name>` |
| **Hidden** | `-` | Omit from output | `-ws: " "+.` → (invisible) |
| **Attribute** | `@` | Make attribute | `@id: digit+.``id="123"` |
| **Promoted** | `^` | Insert symbol | `^comma: ",".``<comma/>` |

### Implementation

**Rule-level Marks** (applied to entire rule):
```rust
fn apply_rule_mark(&self, result: ParseResult, rule: &Rule) -> ParseResult {
    match rule.mark {
        Mark::None => {
            // Wrap in element with rule name
            Element { name: rule.name, children: result.children }
        }
        Mark::Hidden => {
            // Pass through children without wrapping
            result.children
        }
        Mark::Attribute => {
            // Should not happen at rule level (grammar error)
            error!("Rule-level @ not allowed")
        }
        Mark::Promoted => {
            // Force wrapping even if factor says otherwise
            Element { name: rule.name, children: result.children }
        }
    }
}
```

**Factor-level Marks** (applied to individual factors):
```rust
fn apply_factor_mark(&self, result: ParseResult, factor: &Factor) -> ParseResult {
    match factor.mark {
        Mark::Hidden => {
            // Unwrap element, pass through children
            result.children
        }
        Mark::Attribute => {
            // Convert to attribute node
            Attribute { name: factor.name, value: result.text_content() }
        }
        Mark::Promoted => {
            // Insert empty element with name
            Element { name: factor.name, attributes: [], children: [] }
        }
        Mark::None => result,
    }
}
```

### Repetition Operators

| Operator | Syntax | Meaning | Example |
|----------|--------|---------|---------|
| **Zero or more** | `*` | Optional repetition | `digit*` |
| **One or more** | `+` | Required repetition | `letter+` |
| **Zero or more with sep** | `**` | List with separator | `number++","` |
| **One or more with sep** | `++` | Non-empty list | `item++";"` |

**Implementation** (simplified):
```rust
fn parse_repetition_plus(&mut self, base: &Factor) -> ParseResult {
    let mut children = vec![];
    let mut consumed = 0;
    
    // Must match at least once
    let first = self.parse_factor(base)?;
    children.push(first.node);
    consumed += first.consumed;
    
    // Keep matching while possible
    loop {
        let pos = self.stream.position();
        match self.parse_factor(base) {
            Ok(result) => {
                children.push(result.node);
                consumed += result.consumed;
            }
            Err(_) => {
                // No more matches, restore position and break
                self.stream.set_position(pos);
                break;
            }
        }
    }
    
    Ok(ParseResult { node: merge(children), consumed })
}
```

## WebAssembly Support

### Build Process

1. **Compile to WASM**:
   ```bash
   wasm-pack build --target web --out-dir pkg
   ```

2. **Output Files**:
   - `pkg/rustixml.js` - JavaScript glue code
   - `pkg/rustixml_bg.wasm` - WebAssembly binary (156KB, 50KB gzipped)
   - `pkg/rustixml.d.ts` - TypeScript definitions

### Browser Usage

```html
<script type="module">
import init, { parse_ixml, IxmlParser } from './pkg/rustixml.js';

await init();

// One-shot parsing
const result = parse_ixml(grammar, input);

// Reusable parser
const parser = new IxmlParser(grammar);
const result1 = parser.parse(input1);
const result2 = parser.parse(input2);
</script>
```

### WASMZ Pattern

rustixml implements the **WASMZ pattern** (WebAssembly + htmz + wasm:// routing):

- **Template-Returning Functions**: WASM functions return HTML strings, not JSON
- **No Backend Required**: All processing in browser, no server roundtrips
- **~10x Performance**: Native WASM speed vs JavaScript

See [`www/wasmz.html`](www/wasmz.html) and [`www/WASMZ-PATTERN.md`](www/WASMZ-PATTERN.md) for details.

## Testing

### Test Suite Structure

```
tests/
  └── integration_test.rs    - Integration tests

ixml_tests/
  ├── correct/               - Correctness tests (49 tests, 83.7% pass)
  ├── ambiguous/             - Ambiguity detection (13 tests, 15.4% pass)
  ├── error/                 - Error handling (3 tests, 66.7% pass)
  └── ...                    - Other categories

src/bin/
  └── native_conformance_runner.rs  - Test runner
```

### Running Tests

```bash
# Unit tests
cargo test

# Integration tests
cargo test --test integration_test

# Conformance tests
cargo run --bin conformance_test

# Specific test category
cargo run --bin conformance_test -- correct/
```

### Test Results (v0.2.0)

- **Overall**: 49/65 tests passing (75.4%)
- **Correctness**: Strong coverage of core functionality ✅
- **Ambiguous**: Static detection implemented, some tests require runtime tracking
- **Error**: Good error handling coverage

**Notable Features**:
- Grammar analysis enabled with recursion and ambiguity detection
- Fixpoint nullable detection handles complex mutual recursion
- Static ambiguity detection adds `ixml:state='ambiguous'` automatically
- No regressions from analysis integration

See [`KNOWN_ISSUES.md`](KNOWN_ISSUES.md) for details on failing tests.

## Performance Characteristics

### Time Complexity

- **Deterministic grammars**: O(n) where n = input length
- **Ambiguous grammars**: O(n²) or worse (explores all alternatives)
- **Pathological cases**: O(n³) with deep backtracking

### Space Complexity

- **Parse tree**: O(n) for tree nodes
- **Call stack**: O(d) where d = grammar depth
- **No memoization**: No caching of parse results (could add in v0.3)

### Typical Performance

- Simple grammars: < 1ms for small inputs
- Medium grammars: 1-10ms for typical inputs
- Complex grammars: 10-100ms (still acceptable)

## Future Improvements

See [`docs/STRATEGY_OPTIONS.md`](docs/STRATEGY_OPTIONS.md) for detailed analysis.

**Completed** (v0.2.0):
- ✅ Grammar analysis framework
- ✅ Recursion detection (direct and indirect)
- ✅ Static ambiguity detection with automatic marking
- ✅ Fixpoint nullable detection
- ✅ Complexity scoring
- ✅ Grammar normalization (Pemberton's approach)
- ✅ Seed-growing left-recursion support
- ✅ Hidden/promoted rule inlining for analysis

**Short term** (v0.3):
- Enable character class partitioning
- Add basic memoization (packrat parsing)
- Improve ambiguity detection patterns (reduce false negatives)

**Medium term** (v0.4):
- Nullable factoring optimization
- Runtime ambiguity tracking (for 100% conformance)
- Advanced left-recursion optimizations

**Long term** (v1.0):
- Consider LALR+GLR for 100% conformance
- Advanced optimizations using analysis
- Full iXML 1.0 spec support

## References

- [iXML Specification]https://invisiblexml.org/1.0/
- [KNOWN_ISSUES.md]KNOWN_ISSUES.md - Current limitations
- [STRATEGY_OPTIONS.md]docs/STRATEGY_OPTIONS.md - Improvement strategies
- [CLAUDE_HISTORICAL.md]docs/CLAUDE_HISTORICAL.md - Complete development history
- [GRAMMAR_ANALYSIS_STATUS.md]docs/GRAMMAR_ANALYSIS_STATUS.md - Grammar analysis implementation details
- [SEED_GROWING_IMPLEMENTATION.md]docs/SEED_GROWING_IMPLEMENTATION.md - Seed-growing left-recursion algorithm
- [NORMALIZATION_LESSONS.md]docs/NORMALIZATION_LESSONS.md - Lessons learned from normalization attempts

---

*For detailed implementation notes and historical context, see the docs/ directory.*