# minipg - Mini Parser Generator
<p align="center">
<img src="minipg.png" alt="minipg mascot" width="200"/>
</p>
<p align="center">
<a href="https://github.com/yourusername/minipg/actions"><img src="https://github.com/yourusername/minipg/workflows/CI/badge.svg" alt="CI"></a>
<a href="https://crates.io/crates/minipg"><img src="https://img.shields.io/crates/v/minipg.svg" alt="Crates.io"></a>
<a href="https://docs.rs/minipg"><img src="https://docs.rs/minipg/badge.svg" alt="Documentation"></a>
<a href="LICENSE-APACHE"><img src="https://img.shields.io/badge/license-Apache--2.0-blue.svg" alt="License"></a>
<a href="https://www.rust-lang.org"><img src="https://img.shields.io/badge/rust-1.85%2B-orange.svg" alt="Rust Version"></a>
</p>
A fast, Rust-native ANTLR4-compatible parser generator focused on the Rust ecosystem. Generate standalone parsers for Rust and Python from ANTLR4 grammars with no runtime dependencies.
## ✨ Features
### 🚀 Performance
- **Fast code generation** - Sub-millisecond for typical grammars
- **Linear O(n) scaling** with grammar complexity
- **Low memory usage** - <100 KB for code generation
- **No runtime dependencies** - Generates standalone parsers
### 🌍 Language Support (3 Core Languages)
- **Rust** - Primary target with optimized DFA generation and inline attributes ✅
- **Python** - Type hints and dataclasses (Python 3.10+) ✅
- **JavaScript** - Modern ES6+ with error recovery ✅
### 🎯 ANTLR4 Compatible
- **Character Classes** - Full Unicode support with escapes (`\u0000-\uFFFF`) ✅
- **Non-Greedy Quantifiers** - `.*?`, `.+?`, `.??` ✅
- **Lexer Commands** - `-> skip`, `-> channel(NAME)`, `-> mode(NAME)` ✅
- **Lexer Modes & Channels** - Mode stack management and channel routing ✅
- **Labels** - Element labels (`id=ID`) and list labels (`ids+=ID`) ✅
- **Named Actions** - `@header`, `@members` ✅
- **Actions & Predicates** - Embedded actions and semantic predicates ✅
- **Fragments** - Reusable lexer components ✅
- **Parameterized Rules** - Arguments, returns, and local variables ✅
- **Grammar Imports** - `import X;` syntax ✅
- **Grammar Options** - `options {...}` blocks ✅
### 🔧 Core Features
- **Standalone Code Generation** - No runtime dependencies
- **Semantic Analysis** - Undefined rules, duplicates, left recursion detection
- **Rich Diagnostics** - Detailed error messages with location information
- **Visitor Pattern** - Flexible AST traversal
- **CLI Tool** - Easy-to-use command-line interface
- **Comprehensive Testing** - 150+ tests with 100% pass rate
## Architecture
minipg is organized as a single crate with modular structure:
- **core**: Core types, traits, and error handling
- **ast**: Abstract Syntax Tree definitions and visitor patterns
- **parser**: Grammar file parser (lexer + parser)
- **analysis**: Semantic analysis and validation
- **codegen**: Code generation for target languages (Rust, Python, JavaScript)
- **CLI**: Command-line interface with binary
See [ARCHITECTURE.md](ARCHITECTURE.md) for detailed design documentation.
## Installation
### From crates.io
```bash
cargo install minipg
```
### From Source
```bash
git clone https://github.com/yingkitw/minipg
cd minipg
cargo install --path .
```
## Usage
### Generate a Parser
```bash
# Generate Rust parser
minipg generate grammar.g4 -o output/ -l rust
# Generate Python parser
minipg generate grammar.g4 -o output/ -l python
# Generate JavaScript parser
minipg generate grammar.g4 -o output/ -l javascript
```
### Validate a Grammar
```bash
minipg validate grammar.g4
```
### Show Grammar Information
```bash
minipg info grammar.g4
```
## Grammar Syntax
minipg supports ANTLR4-compatible syntax with advanced features:
```antlr4
grammar Calculator;
// Parser rules
NUMBER: [0-9]+;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
// Non-greedy quantifiers for comments
BLOCK_COMMENT: '/*' .*? '*/' -> skip;
LINE_COMMENT: '//' .*? '\n' -> skip;
// Unicode escapes in character classes
WS: [ \t\r\n]+ -> skip;
```
## Comparisons
### minipg vs ANTLR4 vs Pest
| **Language** | Rust | Java | Rust |
| **Runtime Dependency** | None (standalone) | Requires runtime library | Requires runtime library |
| **Grammar Syntax** | ANTLR4 compatible | ANTLR4 (native) | PEG |
| **Target Languages** | Rust, Python, JavaScript | Java, Python, JS, C#, C++, Go, Swift | Rust only |
| **Code Generation** | Standalone parsers | Runtime-based parsers | Macro-based |
| **Generation Speed** | Sub-millisecond | Seconds | Compile-time |
| **Memory Usage** | <100 KB | Higher (JVM overhead) | Low (Rust native) |
| **AST Patterns** | Visitor/listener | Visitor/listener | Manual tree walking |
| **Error Recovery** | Built-in | Built-in | Stops at first error |
| **Test Coverage** | 150+ tests, 100% pass | Comprehensive | Good |
| **Standalone Output** | ✅ Yes | ❌ Requires runtime | ❌ Requires runtime |
| **ANTLR4 Grammars** | ✅ Compatible | ✅ Native | ❌ Different syntax |
**Key Advantages of minipg**:
- ⚡ **Fast code generation** - Sub-millisecond for typical grammars
- 🚀 **No runtime dependencies** - Generates standalone parsers
- 🦀 **Modern Rust** - Safety guarantees and modern tooling
- 📦 **Smaller footprint** - <100 KB memory usage
- 🔧 **Easy integration** - No Java runtime required
- ✅ **ANTLR4 compatible** - Works with existing ANTLR4 grammars
- ✅ **Focused scope** - Rust and Python targets, well-tested
**Choose minipg if you need**:
- ANTLR4 grammar compatibility
- Standalone parsers with no runtime dependencies
- Rust or Python parser generation
- Fast code generation
- Modern Rust implementation
**Choose ANTLR4 if you need**:
- Mature, battle-tested tooling
- Extensive documentation and community
- Java ecosystem integration
- Runtime-based parsing with advanced features
**Choose Pest if you need**:
- Rust-only parsing
- PEG parsing semantics
- Compile-time grammar validation
- Tight Rust macro integration
- Zero-cost abstractions at compile time
See [docs/archive/COMPARISON_WITH_ANTLR4RUST.md](docs/archive/COMPARISON_WITH_ANTLR4RUST.md) and [docs/archive/COMPARISON_WITH_PEST.md](docs/archive/COMPARISON_WITH_PEST.md) for detailed comparisons.
## Documentation
- **[User Guide](docs/USER_GUIDE.md)** - Complete guide to using minipg
- **[Grammar Syntax Reference](docs/GRAMMAR_SYNTAX.md)** - Detailed syntax specification
- **[API Documentation](docs/API.md)** - API reference for library usage
- **[Architecture](ARCHITECTURE.md)** - Design and architecture overview
- **[ANTLR4 Compatibility](docs/ANTLR4_COMPATIBILITY.md)** - Full ANTLR4 grammar support
- **[Multi-Language Plan](docs/MULTI_LANGUAGE_PLAN.md)** - Target language support roadmap
- **[Runtime Decision](docs/RUNTIME_DECISION.md)** - Why standalone generation
- **[Comparison with ANTLR4](docs/archive/COMPARISON_WITH_ANTLR4RUST.md)** - Performance and feature comparison
- **[Comparison with Pest](docs/archive/COMPARISON_WITH_PEST.md)** - Rust parser generator comparison
- **[Examples](examples/)** - 16 example grammars (beginner to advanced)
- Simple: calculator, JSON
- Intermediate: Expression, Config, YAML
- Advanced: GraphQL, Query, CSS, Markdown, Protocol, SQL, Java, Python
- **[Examples Guide](docs/EXAMPLES_GUIDE.md)** - Comprehensive examples documentation
- **[Archive](docs/archive/)** - Historical session reports and release notes
## Development
### Building
```bash
cargo build
```
### Running Tests
```bash
cargo test --all
```
**✅ All Tests Passing!**
minipg has comprehensive test coverage with **186+ tests** passing at 100% success rate:
- **106 unit tests** - Core functionality and parsing
- **19 integration tests** - Full pipeline (parse → analyze → generate)
- **21 analysis tests** - Semantic analysis, ambiguity detection, reachability
- **21 codegen tests** - Multi-language code generation
- **19 compatibility tests** - ANTLR4 feature compatibility
- **13 feature tests** - Advanced grammar features
- **9 example tests** - Real-world grammar examples
**Grammar Test Suite**: minipg can successfully parse and generate code from a wide variety of ANTLR4 grammars, including:
- ✅ All example grammars in the repository
- ✅ Real-world grammars from the grammars-v4 repository
- ✅ Complex grammars with advanced features (modes, channels, actions)
- ✅ Multi-language code generation validation
All tests pass successfully, demonstrating robust grammar parsing and code generation capabilities.
### Running with Logging
```bash
RUST_LOG=info cargo run -- generate grammar.g4
```
## Project Status
- **Current Version**: 0.2.0 (Simplified & Focused)
- **Status**: Core Features Complete ✅
- **Test Suite**: **150+ tests** with **100% pass rate**
- ✅ Grammar parsing tests
- ✅ Code generation tests (Rust, Python, JavaScript)
- ✅ Integration tests
- ✅ ANTLR4 compatibility tests
- **Target Languages**: **3 core languages** (Rust, Python, JavaScript)
- **Grammar Support**:
- ✅ CompleteJSON.g4
- ✅ SQL.g4
- ✅ 15+ example grammars
- ✅ ANTLR4 grammars-v4 compatible
- **ANTLR4 Compatibility**: High - supports most common features
- **Core Features**:
- ✅ Rule arguments: `rule[Type name]`
- ✅ Return values: `returns [Type name]`
- ✅ Local variables: `locals [Type name]`
- ✅ List labels (`ids+=ID`)
- ✅ Named actions (`@header`, `@members`)
- ✅ Lexer modes and channels
- ✅ Grammar composition and imports
See [TODO.md](TODO.md) for current tasks and [docs/archive/ROADMAP.md](docs/archive/ROADMAP.md) for the complete roadmap.
## License
Apache-2.0