═══════════════════════════════════════════════════════════════════════════
                    rustixml STRATEGY DECISION SUMMARY
═══════════════════════════════════════════════════════════════════════════

CURRENT STATUS: v0.2.0 - 83.7% conformance (41/49 correctness tests)

═══════════════════════════════════════════════════════════════════════════
OPTION 1: LALR+GLR (markup-blitz approach)
═══════════════════════════════════════════════════════════════════════════

What it means:
  • Precomputed tables: Grammar → State machine BEFORE parsing
  • LALR = Fast deterministic parsing (O(1) decisions)
  • GLR = Handles ambiguity by exploring parallel parse stacks
  
Analogy: Like compiling the grammar to machine code vs interpreting it
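
To make the "precomputed tables" idea concrete, here is a minimal sketch of a
table-driven LR recognizer for the toy grammar S: "(" S ")" | "x". The table
is written by hand for illustration; a real LALR generator would derive it
from the grammar. Only the deterministic half is shown (no GLR stack
splitting), and the names Action, action, goto_s and parse are assumptions,
not part of rustixml or markup-blitz.

```rust
/// One entry of the hand-written parse table.
enum Action {
    Shift(usize),              // push this state and consume one token
    Reduce { rhs_len: usize }, // pop `rhs_len` states, then follow GOTO on S
    Accept,
    Error,
}

/// ACTION table for  S: "(" S ")" | "x".  `None` stands for end of input.
fn action(state: usize, tok: Option<char>) -> Action {
    match (state, tok) {
        (0, Some('(')) | (2, Some('(')) => Action::Shift(2),
        (0, Some('x')) | (2, Some('x')) => Action::Shift(3),
        (1, None) => Action::Accept,
        (3, Some(')')) | (3, None) => Action::Reduce { rhs_len: 1 }, // S: "x"
        (4, Some(')')) => Action::Shift(5),
        (5, Some(')')) | (5, None) => Action::Reduce { rhs_len: 3 }, // S: "(" S ")"
        _ => Action::Error,
    }
}

/// GOTO table: the state entered after reducing to S on top of `state`.
fn goto_s(state: usize) -> Option<usize> {
    match state {
        0 => Some(1),
        2 => Some(4),
        _ => None,
    }
}

/// Each step is a single table lookup; the grammar is never consulted at run time.
fn parse(input: &str) -> bool {
    let toks: Vec<char> = input.chars().collect();
    let mut pos = 0;
    let mut stack = vec![0usize];
    loop {
        let state = *stack.last().unwrap();
        match action(state, toks.get(pos).copied()) {
            Action::Shift(next) => {
                stack.push(next);
                pos += 1;
            }
            Action::Reduce { rhs_len } => {
                stack.truncate(stack.len() - rhs_len);
                match goto_s(*stack.last().unwrap()) {
                    Some(next) => stack.push(next),
                    None => return false,
                }
            }
            Action::Accept => return true,
            Action::Error => return false,
        }
    }
}
```

A GLR parser would use the same kind of table but keep several stacks alive
whenever one table cell holds more than one action.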

Pros:                          Cons:
  ✓ Best performance           ✗ 6-12 month rewrite
  ✓ Natural ambiguity handling ✗ Complex implementation
  ✓ 98-100% conformance        ✗ No good Rust libraries
  ✓ Proven (markup-blitz)      ✗ Abstraction mismatch with iXML

═══════════════════════════════════════════════════════════════════════════
OPTION 2: Enhanced Native Parser (RECOMMENDED)
═══════════════════════════════════════════════════════════════════════════

Keep recursive descent, add optimizations:

PRE-PROCESSING (before parsing):
  1. Character class partitioning    [PARTIALLY DONE]
  2. Left-recursion transformation   [TODO]
  3. Nonterminal inlining            [TODO]
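
As an illustration of step 2, the sketch below applies the textbook rewrite
for direct left recursion: A: A x | y becomes A: y A_tail and
A_tail: x A_tail | (empty). The Sym, Alt and Rule types and the "_tail"
naming are assumptions made for this example, not rustixml's internal
representation. Indirect recursion and degenerate rules such as A: A are not
handled, and a real iXML implementation would also have to restore the
original tree shape after parsing.

```rust
/// Illustrative grammar model (hypothetical, not rustixml's own types).
#[derive(Debug, Clone, PartialEq)]
enum Sym {
    T(char),   // terminal character
    N(String), // reference to a nonterminal
}

/// One alternative: a sequence of symbols. An empty Vec means "matches nothing".
type Alt = Vec<Sym>;

#[derive(Debug, Clone, PartialEq)]
struct Rule {
    name: String,
    alts: Vec<Alt>,
}

/// Rewrites direct left recursion in `rule`.
/// Returns the rule unchanged if no alternative starts with the rule itself,
/// otherwise returns the rewritten rule followed by its new tail rule.
fn eliminate_direct_left_recursion(rule: &Rule) -> Vec<Rule> {
    let is_left_rec = |alt: &Alt| matches!(alt.first(), Some(Sym::N(n)) if *n == rule.name);
    let (recursive, base): (Vec<Alt>, Vec<Alt>) =
        rule.alts.iter().cloned().partition(|a| is_left_rec(a));
    if recursive.is_empty() {
        return vec![rule.clone()];
    }

    let tail_name = format!("{}_tail", rule.name);
    let tail = Sym::N(tail_name.clone());

    // A: y A_tail   (every non-recursive alternative gets the tail appended)
    let new_base: Vec<Alt> = base
        .into_iter()
        .map(|mut alt| {
            alt.push(tail.clone());
            alt
        })
        .collect();

    // A_tail: x A_tail | (empty)   (drop the leading A, append the tail)
    let mut tail_alts: Vec<Alt> = recursive
        .into_iter()
        .map(|alt| {
            let mut rest: Alt = alt[1..].to_vec();
            rest.push(tail.clone());
            rest
        })
        .collect();
    tail_alts.push(Vec::new());

    vec![
        Rule { name: rule.name.clone(), alts: new_base },
        Rule { name: tail_name, alts: tail_alts },
    ]
}
```

For example, expr: expr "+" term | term would become expr: term expr_tail
and expr_tail: "+" term expr_tail | (empty).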

RUNTIME (during parsing):
  4. Memoization (packrat)           [TODO]
  5. Ambiguity detection             [TODO]
  6. Better error recovery           [PARTIAL]
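
Step 4 can be sketched as a memo table keyed by (rule, input position), so
that a rule is evaluated at most once per position. The sketch below is a
simplified, PEG-style recognizer with ordered choice and a single result per
entry; a full iXML parser would need to record every possible end position to
preserve ambiguity. Expr, Packrat, apply_rule and rule_calls are hypothetical
names, and the grammar is assumed to be free of left recursion.

```rust
use std::collections::HashMap;

/// Hypothetical mini-grammar; rules are referenced by index.
enum Expr {
    Char(char),
    Seq(Vec<Expr>),
    Choice(Vec<Expr>), // ordered: the first matching alternative wins
    Ref(usize),
}

struct Packrat<'a> {
    rules: &'a [Expr],
    input: Vec<char>,
    /// (rule index, start position) -> end position, or None if the rule failed there.
    memo: HashMap<(usize, usize), Option<usize>>,
    /// Number of rule evaluations that were NOT served from the memo table.
    rule_calls: usize,
}

impl<'a> Packrat<'a> {
    fn new(rules: &'a [Expr], input: &str) -> Self {
        Packrat {
            rules,
            input: input.chars().collect(),
            memo: HashMap::new(),
            rule_calls: 0,
        }
    }

    /// Applies rule `rule` at `pos`, consulting the memo table first.
    fn apply_rule(&mut self, rule: usize, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        self.rule_calls += 1;
        let rules = self.rules; // copy the shared slice so `self` stays free to borrow mutably
        let result = self.eval(&rules[rule], pos);
        self.memo.insert((rule, pos), result);
        result
    }

    /// Evaluates one expression at `pos`; returns the end position on success.
    fn eval(&mut self, expr: &Expr, pos: usize) -> Option<usize> {
        match expr {
            Expr::Char(c) => (self.input.get(pos) == Some(c)).then(|| pos + 1),
            Expr::Seq(items) => {
                let mut p = pos;
                for item in items {
                    p = self.eval(item, p)?;
                }
                Some(p)
            }
            Expr::Choice(alts) => {
                for alt in alts {
                    if let Some(end) = self.eval(alt, pos) {
                        return Some(end);
                    }
                }
                None
            }
            Expr::Ref(r) => self.apply_rule(*r, pos),
        }
    }
}
```

With rule 0 defined as A "x" | A "y" and rule 1 (A) as "a" "a", parsing "aay"
evaluates A only once: the second alternative reuses the memoized result for
A at position 0 instead of rescanning the input.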

POST-PROCESSING (after parsing):
  7. Tree normalization              [PARTIAL]

Timeline:
  v0.3.0 (1 month):  87-90% conformance
  v0.4.0 (3 months): 92-95% conformance
  v0.5.0 (4 months): 95%+ conformance

Pros:                          Cons:
  ✓ Incremental                ✗ May not reach 100%
  ✓ Lower risk                 ✗ Complexity accumulates
  ✓ Maintainable               ✗ Performance ceiling
  ✓ 4-month timeline

═══════════════════════════════════════════════════════════════════════════
OPTION 3: Hybrid Model
═══════════════════════════════════════════════════════════════════════════

Use different parsers for different grammar types:
  • Simple grammars → Fast recursive descent
  • Recursive grammars → Memoized parsing
  • Ambiguous grammars → Full path exploration
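
One way the dispatch could work is to classify the grammar before parsing.
The sketch below only separates recursive from non-recursive grammars by
looking for a cycle in the nonterminal reference graph. Ambiguity is left
out on purpose, because ambiguity of a context-free grammar is undecidable in
general and would have to be detected at parse time. Strategy, is_recursive
and choose_strategy are hypothetical names, not rustixml's API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Strategy {
    PlainDescent, // no recursion: plain recursive descent is enough
    Memoized,     // recursion present: use the memoizing parser
}

/// `deps` maps each nonterminal to the nonterminals its right-hand side references.
/// Returns true if any nonterminal can reach itself (directly or indirectly).
fn is_recursive<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    // DFS colouring: 1 = on the current path, 2 = fully explored.
    fn visit<'a>(
        n: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        state: &mut HashMap<&'a str, u8>,
    ) -> bool {
        match state.get(n).copied() {
            Some(1) => return true,  // back edge: cycle found
            Some(2) => return false, // already explored, no cycle through here
            _ => {}
        }
        state.insert(n, 1);
        if let Some(children) = deps.get(n) {
            for &child in children {
                if visit(child, deps, state) {
                    return true;
                }
            }
        }
        state.insert(n, 2);
        false
    }

    let mut state = HashMap::new();
    deps.keys().any(|&n| visit(n, deps, &mut state))
}

/// Picks a parsing strategy from the shape of the grammar alone.
fn choose_strategy<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Strategy {
    if is_recursive(deps) {
        Strategy::Memoized
    } else {
        Strategy::PlainDescent
    }
}
```

A grammar where expr references itself would be routed to the memoized
parser, while a flat grammar such as doc: word with no cycles would stay on
the plain recursive descent path.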

Pros: Best of both worlds    Cons: Complex, 5 months, 95-97% target

═══════════════════════════════════════════════════════════════════════════
OPTION 4: Profiling-Driven Only
═══════════════════════════════════════════════════════════════════════════

Measure hotspots, optimize what matters:
  cargo flamegraph --bin conformance_test
  
Pros: Evidence-based          Cons: Improves speed, not correctness; 1 month

═══════════════════════════════════════════════════════════════════════════
RECOMMENDATION: Option 2 + 4 (Enhanced + Profiling)
═══════════════════════════════════════════════════════════════════════════

Why?
  1. Proven architecture (recursive descent works)
  2. Incremental improvement (ship v0.3, v0.4, v0.5)
  3. Lower risk than complete rewrite
  4. 90-95% is excellent for most users
  5. Simpler code = easier contributions

When to reconsider LALR+GLR?
  • If 100% conformance becomes critical
  • If a mature Rust LALR+GLR library emerges
  • If building a production parser framework
  • If performance becomes a bottleneck (currently fine)

═══════════════════════════════════════════════════════════════════════════
DECISION: Ship v0.2.0 now, improve incrementally
═══════════════════════════════════════════════════════════════════════════

At 83.7% conformance, the current parser is:
  ✓ Great for a v0.2.0 release
  ✓ Production-ready for common use cases
  ✓ Honest about limitations (KNOWN_ISSUES.md)
  ✓ Backed by a clear, documented improvement path

Perfect is the enemy of good. Ship now, improve later.

═══════════════════════════════════════════════════════════════════════════