# php-parser
A production-grade, fault-tolerant, zero-copy PHP parser written in Rust.
## Features
- **Zero-Copy AST**: Uses `bumpalo` arena allocation for high performance and zero heap allocations for AST nodes.
- **Fault-Tolerant**: Designed to never panic. It produces error nodes and recovers from syntax errors, making it suitable for IDEs and language servers.
- **PHP 8.x Support**: Targets compliance with modern PHP grammar.
- **Safe**: Handles mixed encodings and invalid UTF-8 gracefully by operating on byte slices (`&[u8]`).
## Install as library
```bash
cargo add php-parser
cargo add bumpalo
```
If you want to use the latest version from the GitHub repository, add this to your Cargo.toml:
```toml
[dependencies]
php-parser = { git = "https://github.com/wudi/php-parser" }
bumpalo = "3.19.0"
```
## Usage
Here is a basic example of how to parse a PHP script:
```rust
use bumpalo::Bump;
use php_parser::lexer::Lexer;
use php_parser::parser::Parser;
fn main() {
// The source code to parse (as bytes)
let source = b"<?php echo 'Hello, World!';";
// Create an arena for AST allocation
let arena = Bump::new();
// Initialize the lexer and parser
let lexer = Lexer::new(source);
let mut parser = Parser::new(lexer, &arena);
let program = parser.parse_program();
println!("{:#?}", program);
// php_parser::span::with_session_globals(source, || {
// println!("{:#?}", program);
// });
}
```
## S-Expression Output
You can output the AST in S-expression format for easier visualization:
```rust
use bumpalo::Bump;
use php_parser::lexer::Lexer;
use php_parser::parser::Parser;
use php_parser::ast::sexpr::SExprFormatter;
fn main() {
let code = "<?php class Foo extends Bar implements Baz { public int $p = 1; function m($a) { return $a; } }";
let arena = Bump::new();
let lexer = Lexer::new(code.as_bytes());
let mut parser = Parser::new(lexer, &arena);
let program = parser.parse_program();
let mut formatter = SExprFormatter::new(code.as_bytes());
formatter.visit_program(&program);
let output = formatter.finish();
println!("{}", output);
}
```
Gives the output:
```
(program
(class "Foo" (extends Bar) (implements Baz)
(members
(property public int = (integer 1))
(method "m" (params ())
(body
(return (variable "")))))))
```
## Performance
Test file `run-tests.php` from [php-src](https://github.com/php/php-src/blob/801e587faa0efd2fba633413681c68c83d6f2188/run-tests.php) with 140KB size, here are the benchmark results:
```bash
➜ php-parser git:(master) ✗ cargo run --release --bin bench_file -- run-tests.php
Finished `release` profile [optimized] target(s) in 0.05s
Running `target/release/bench_file run-tests.php`
Benchmarking: run-tests.php
File size: 139.63 KB
Warming up...
Running 200 iterations...
Profile written to profile.pb
Flamegraph written to flamegraph.svg
Total time: 134.267333ms
Average time: 671.336µs
Throughput: 203.11 MB/s
```
Table comparing with [nikic/PHP-Parser](https://github.com/nikic/PHP-Parser) v5.6.2
| nikic/PHP-Parser | PHP | 33 |
| tree-sitter-php | C | 15 |
| php-parser | Rust | 0.67 |
> Machine specs: Apple M1 Pro, 32GB RAM
## Development
### Running Tests
Run the full test suite:
```bash
cargo test
```
### Snapshot Tests
This project uses `insta` for snapshot testing. If you make changes to the parser that affect the AST output, you may need to review and accept the new snapshots:
```bash
cargo test
cargo insta review
```
### Corpus Testing
To verify stability against real-world codebases (like WordPress or Laravel), use the corpus test runner:
```bash
cargo run --release --bin corpus_test -- /path/to/php/project
```
## Architecture
- **Lexer**: Operates on `&[u8]` and handles PHP's complex lexical modes (Scripting, DoubleQuote, Heredoc).
- **Parser**: A combination of Recursive Descent and Pratt parsing for expressions.
- **AST**: All nodes are allocated in a `Bump` arena. Strings are stored as references to the original source (`&'src [u8]`) or arena-allocated slices.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.