sysml-v2-parser 0.23.0

SysML v2 textual notation parser for Rust
Documentation
# Error Recovery in the SysML Parser

This document describes the current recovery architecture after the language-server-oriented recovery improvements.

## Current Model

The parser now uses a **hybrid recovery strategy**:

1. **Grammar-level recovery inside body parsers**
2. **Top-level recovery in `parse_with_diagnostics()`**
3. **Explicit AST error nodes for recovered regions**

This is a meaningful step toward language-server use because malformed regions are no longer only "skipped"; they can now remain visible in the AST.

## Core Behavior

### 1. Grammar-level recovery

Several body parsers recover locally instead of immediately aborting their enclosing construct:

- package bodies
- part definition bodies
- part usage bodies
- requirement bodies
- action definition bodies
- action usage bodies
- state bodies
- use case bodies

These parsers now:

1. try to parse the next known body element
2. if parsing fails at a plausible body-element starter, recover locally
3. skip the malformed statement or block
4. resynchronize to the next likely body element or closing `}`
5. insert an AST error node covering the skipped region

Relevant files:

- [`src/parser/package.rs`]C:\Git\sysml-v2-parser\src\parser\package.rs
- [`src/parser/part.rs`]C:\Git\sysml-v2-parser\src\parser\part.rs
- [`src/parser/requirement.rs`]C:\Git\sysml-v2-parser\src\parser\requirement.rs
- [`src/parser/action.rs`]C:\Git\sysml-v2-parser\src\parser\action.rs
- [`src/parser/state.rs`]C:\Git\sysml-v2-parser\src\parser\state.rs
- [`src/parser/usecase.rs`]C:\Git\sysml-v2-parser\src\parser\usecase.rs

### 2. Shared sync-point helpers

Recovery now relies on shared helpers in [`src/parser/lex.rs`](C:\Git\sysml-v2-parser\src\parser\lex.rs):

- `skip_statement_or_block()`
- `skip_to_next_body_element_or_end()`
- `recover_body_element()`
- shared starter tables such as:
  - `PACKAGE_BODY_STARTERS`
  - `PART_BODY_STARTERS`
  - `REQUIREMENT_BODY_STARTERS`
  - `ACTION_BODY_STARTERS`
  - `STATE_BODY_STARTERS`
  - `USE_CASE_BODY_STARTERS`

This makes recovery behavior more consistent across grammar scopes.

### 3. AST error nodes

Recovered regions are represented with `ParseErrorNode` in [`src/ast.rs`](C:\Git\sysml-v2-parser\src\ast.rs).

Error-node variants currently exist in:

- `PackageBodyElement`
- `PartDefBodyElement`
- `PartUsageBodyElement`
- `RequirementDefBodyElement`
- `ActionDefBodyElement`
- `ActionUsageBodyElement`
- `StateDefBodyElement`
- `UseCaseDefBodyElement`

Each `ParseErrorNode` currently stores:

- `message`
- `code`
- `expected`
- `found`
- `suggestion`

### 4. Diagnostics from the AST

`parse_with_diagnostics()` in [`src/parser/mod.rs`](C:\Git\sysml-v2-parser\src\parser\mod.rs) now does two things:

1. collects top-level parse failures using the outer recovery loop
2. traverses the resulting AST and turns recovery nodes into `ParseError` diagnostics

This is important because many syntax problems are now captured locally inside the grammar rather than surfacing only as top-level failures.

## Top-level Recovery

Top-level recovery still exists for cases where parsing cannot even build the surrounding root/package structure.

The outer loop in `parse_with_diagnostics()`:

1. parses one root element at a time
2. on error, records a diagnostic conditionally
3. skips to the next sync point
4. continues parsing

This is still useful, but it is no longer the only or primary recovery mechanism for nested content.

## Current Strengths

- recovery is closer to the grammar than before
- malformed nested content can remain visible in the AST
- diagnostics can now originate from recovered body regions
- shared sync helpers reduce module-specific recovery drift
- recovery loops explicitly guard against zero progress

## Current Limitations

### 1. No central diagnostic state threaded through parsing

Diagnostics are still collected after parsing or at top level, not directly emitted from every parser via shared state.

That means the parser is more resilient, but not yet fully built around a "parsing never fails" architecture.

### 2. Error nodes are not yet everywhere

The most important body scopes are covered, but not every grammar scope or nested construct has explicit error nodes yet.

### 3. Some diagnostics remain generic

Recovery-node diagnostics now have more specific codes than before, but many parser errors still use generic `nom`-derived messages such as:

- `expected keyword or token`
- `parse error`

### 4. Recovery still depends on starter tables

Starter tables are much better than snippet heuristics, but they are still manually curated. They must evolve with grammar coverage.

## Invariants

The parser should preserve these recovery invariants:

- every recovery step must either consume input or stop
- malformed nested content should not poison later siblings unnecessarily
- recovered regions should be visible in the AST where practical
- `parse_with_diagnostics()` should produce partial AST + diagnostics together
- invalid surveillance fixtures such as `test {}` must still be reported as real errors
- parser entrypoints (`parse`, `parse_with_diagnostics`) must never panic on user input; malformed input must be surfaced as `Err` and/or diagnostics

## Tests

Recovery behavior is currently exercised by:

- surveillance invalid-fixture tests in [`tests/validation/surveillance_drone.rs`]C:\Git\sysml-v2-parser\tests\validation\surveillance_drone.rs
- parser recovery tests in [`tests/parser_tests.rs`]C:\Git\sysml-v2-parser\tests\parser_tests.rs
- the full validation suite

The newer parser tests explicitly verify:

- sibling recovery after malformed members
- AST error node insertion
- local diagnostics generated from recovery nodes
- panic-safety over malformed corpora and generated arbitrary text

Enforcement points:

- compile-time lint policy in [`src/lib.rs`]C:\Git\sysml-v2-parser\src\lib.rs denies `unwrap`, `expect`, and `panic!` in non-test code paths
- panic-safety integration tests in [`tests/parser_panic_safety.rs`]C:\Git\sysml-v2-parser\tests\parser_panic_safety.rs

## Recommended Next Steps

### Short term

- add AST error nodes to more grammar scopes as needed
- make construct-specific diagnostics more precise
- remove or narrow remaining generic top-level heuristics where local recovery already covers the case

### Medium term

- move toward parser-state-based diagnostic accumulation instead of post-pass extraction
- make error codes and suggestions more systematic
- consider dedicated error-node variants for especially important grammar categories

### Long term

- evaluate a more explicit resilient-parser architecture, such as `expect()`-style combinators or richer error trees

## Summary

The parser is now substantially closer to being language-server-usable than the earlier top-level-only recovery model:

- local recovery exists in important grammar scopes
- recovered regions can survive as AST nodes
- diagnostics can be derived from those recovered regions

It is not yet a fully resilient parser architecture, but it now has the right structural pieces to keep improving in that direction.