antlr-rust-runtime 0.1.2

Clean-room Rust runtime and target support for ANTLR v4 generated parsers
# ANTLR4 Runtime for Rust

`antlr-rust-runtime` is a pure Rust runtime and metadata generator for ANTLR v4
lexers and parsers. It is a clean-room implementation written from scratch from
the public ANTLR runtime contract; it does not vendor or fork an older Rust
ANTLR runtime.

## First Steps

### 1. Install ANTLR4

Follow the ANTLR getting-started guide and install the ANTLR tool jar. The
runtime tests currently validate against ANTLR `4.13.2`.

### 2. Install the Rust ANTLR runtime tools

Each ANTLR target language needs a runtime package used by generated parsers.
For Rust projects, add the runtime crate:

```toml
[dependencies]
antlr-rust-runtime = "0.1"
```

The package is published as `antlr-rust-runtime`, but its library crate is
imported as `antlr4_runtime`:

```rust
use antlr4_runtime::{CommonTokenStream, InputStream};
```

Install the companion generator binary:

```bash
cargo install antlr-rust-runtime
```

This installs `antlr4-rust-gen`, which turns ANTLR `.interp` metadata into Rust
lexer and parser modules.

### 3. Generate your parser

The current release uses a metadata-first generation path:

1. run the official ANTLR tool to produce `.interp` files,
2. run `antlr4-rust-gen` to emit Rust modules,
3. compile those modules against `antlr4_runtime`.

For a split lexer/parser grammar:

```bash
antlr4 MyGrammarLexer.g4 MyGrammarParser.g4

antlr4-rust-gen \
  --lexer MyGrammarLexer.interp \
  --parser MyGrammarParser.interp \
  --out-dir src/generated
```

A skeleton ANTLR `RustTarget` with its StringTemplate templates is checked in
under `tool/`; it will be expanded around the same runtime contracts.

### Alternative: Generate metadata with antlr-ng

[`antlr-ng`](https://www.antlr-ng.org/introduction.html) is a TypeScript/npm
parser generator based on ANTLR 4.13.2. It does not currently ship a Rust
target, but it can produce the same `.interp` metadata that `antlr4-rust-gen`
uses.

Install it with npm, or run it directly through `npx` as shown here:

```bash
npx antlr-ng -Dlanguage=Java -o build/antlr --exact-output-dir true JSON.g4
```

The `-Dlanguage=Java` option selects one of antlr-ng's bundled code-generation
targets. It is used here only so that the tool emits grammar artifacts,
including `JSONLexer.interp` and `JSON.interp`. The generated Java files can be
ignored; the Rust code still comes from `antlr4-rust-gen`:

```bash
antlr4-rust-gen \
  --lexer build/antlr/JSONLexer.interp \
  --parser build/antlr/JSON.interp \
  --out-dir src/generated
```

For local tooling, antlr-ng requires Node.js 20 or newer. See the
[antlr-ng getting-started guide](https://www.antlr-ng.org/getting-started.html)
for CLI installation and option details.

## Complete Example

Suppose you are using the JSON grammar from `antlr/grammars-v4/json`.

Fetch or copy `JSON.g4`, then generate ANTLR metadata:

```bash
antlr4 JSON.g4
```

Generate Rust modules:

```bash
antlr4-rust-gen \
  --lexer JSONLexer.interp \
  --parser JSON.interp \
  --out-dir src/generated
```

Declare the generated modules in your crate:

```rust
mod generated {
    #![allow(dead_code)]

    pub mod json;
    pub mod json_lexer;
}
```

Call the generated lexer and parser:

```rust
use antlr4_runtime::{CommonTokenStream, InputStream};
use generated::json::Json;
use generated::json_lexer::JsonLexer;

fn main() -> Result<(), antlr4_runtime::AntlrError> {
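    // Wrap the source text in an `InputStream` and hand it to the generated lexer.
    let lexer = JsonLexer::new(InputStream::new(r#"{"a":1}"#));
    // Buffer the lexer's tokens for the parser.
    let tokens = CommonTokenStream::new(lexer);
    let mut parser = Json::new(tokens);
    // Invoke the `json` rule; any error is propagated through `?`.
    let tree = parser.json()?;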

    println!("{}", tree.text());
    Ok(())
}
```

## Technical Notes

- Pure Rust runtime implementation.
- Written from scratch as a clean-room implementation.
- Supports ANTLR serialized ATN deserialization.
- Supports lexer and parser execution through generated Rust wrappers.
- Supports real split lexer/parser grammars, including Kotlin smoke builds.
- Passes every upstream ANTLR runtime-testsuite descriptor discovered by the
  harness: `357 passed, 0 failed, 0 skipped, 357 run`.
- Licensed under BSD-3-Clause for compatibility with ANTLR's runtime licensing
  pattern and downstream open-source applications.

The runtime contains:

- `IntStream` and `CharStream`
- UTF-8 input as Unicode scalar values
- `Token`, `CommonToken`, token factories, and `TokenSource`
- buffered, channel-aware `CommonTokenStream`
- `Vocabulary`
- recognizer metadata and error listener plumbing
- parse tree node types, rule contexts, terminal nodes, error nodes, and walkers
- ANTLR v4 serialized ATN deserialization
- lexer ATN recognition with longest-match/rule-priority behavior and lexer
  actions
- parser ATN rule recognition with backtracking over token stream indices
- `antlr4-rust-gen`, a Rust generator that consumes ANTLR `.interp` metadata and
  emits Rust modules
- `antlr4-runtime-testsuite`, a harness for running upstream ANTLR
  runtime-test descriptors through the Rust metadata path
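
The longest-match/rule-priority behavior listed above can be illustrated with a
toy example. The sketch below is not how the runtime is implemented (the
runtime performs lexer recognition over the deserialized ATN); it uses
hand-written matcher functions and hypothetical names (`Rule`, `toy_rules`,
`next_token`) purely to show the selection rule: the longest match wins, and on
a tie the rule defined first wins.

```rust
/// A toy lexer rule: a token name plus a function that returns how many
/// bytes of the input prefix the rule matches (0 means no match).
struct Rule {
    name: &'static str,
    matcher: fn(&str) -> usize,
}

/// Matches the literal keyword `if`.
fn match_keyword_if(input: &str) -> usize {
    if input.starts_with("if") { 2 } else { 0 }
}

/// Matches a run of ASCII letters (one byte per character).
fn match_identifier(input: &str) -> usize {
    input.chars().take_while(|c| c.is_ascii_alphabetic()).count()
}

/// Rules in definition order; earlier rules have higher priority.
fn toy_rules() -> Vec<Rule> {
    vec![
        Rule { name: "IF", matcher: match_keyword_if },
        Rule { name: "ID", matcher: match_identifier },
    ]
}

/// Pick the rule with the longest match at the start of `input`.
/// The comparison is strict, so an earlier rule keeps the win on a tie.
fn next_token<'a>(rules: &[Rule], input: &'a str) -> Option<(&'static str, &'a str)> {
    let mut best: Option<(usize, usize)> = None; // (rule index, match length)
    for (index, rule) in rules.iter().enumerate() {
        let len = (rule.matcher)(input);
        if len > 0 && best.map_or(true, |(_, best_len)| len > best_len) {
            best = Some((index, len));
        }
    }
    best.map(|(index, len)| (rules[index].name, &input[..len]))
}
```

With these rules, the input `if` is tokenized as `IF` (both rules match two
characters, and `IF` is listed first), while `iffy` becomes a single `ID`
because the identifier rule matches more input.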

See [docs/kotlin-build.md](docs/kotlin-build.md) for the Kotlin smoke workflow.
See [docs/runtime-testsuite.md](docs/runtime-testsuite.md) for the upstream
runtime-testsuite harness.

## Runtime Testsuite

On the maintainer checkout, where the ANTLR jar and upstream runtime-testsuite
live under `/tmp/antlr-cleanroom`, run the full sweep with:

```bash
cargo run --quiet --bin antlr4-runtime-testsuite
```

Run a single descriptor, passing the ANTLR jar and descriptor locations
explicitly:

```bash
cargo run --bin antlr4-runtime-testsuite -- \
  --antlr-jar path/to/antlr-4.13.2-complete.jar \
  --descriptors path/to/antlr4/runtime-testsuite \
  --case LexerExec/KeywordID
```

## Useful Information

- ANTLR: <https://www.antlr.org/>
- ANTLR documentation: <https://github.com/antlr/antlr4/blob/dev/doc/index.md>
- Grammars v4: <https://github.com/antlr/grammars-v4>