php-parser 0.1.3

A fast PHP parser written in Rust
Documentation
# php-parser

A production-grade, fault-tolerant, zero-copy PHP parser written in Rust.

## Features

- **Zero-Copy AST**: Uses `bumpalo` arena allocation for high performance and zero heap allocations for AST nodes.
- **Fault-Tolerant**: Designed to never panic. It produces error nodes and recovers from syntax errors, making it suitable for IDEs and language servers.
- **PHP 8.x Support**: Targets compliance with modern PHP grammar.
- **Safe**: Handles mixed encodings and invalid UTF-8 gracefully by operating on byte slices (`&[u8]`).

## Install as library

```bash
cargo add php-parser
cargo add bumpalo
```

If you want to use the latest version from the GitHub repository, add this to your Cargo.toml:

```toml
[dependencies]
php-parser = { git = "https://github.com/wudi/php-parser" }
bumpalo = "3.19.0"
```

## Usage

Here is a basic example of how to parse a PHP script:

```rust
use bumpalo::Bump;
use php_parser::lexer::Lexer;
use php_parser::parser::Parser;

fn main() {
    // The source code to parse (as bytes)
    let source = b"<?php echo 'Hello, World!';";
    
    // Create an arena for AST allocation
    let arena = Bump::new();

    // Initialize the lexer and parser
    let lexer = Lexer::new(source);
    let mut parser = Parser::new(lexer, &arena);

    let program = parser.parse_program();
    println!("{:#?}", program);

    // php_parser::span::with_session_globals(source, || {
    //     println!("{:#?}", program);
    // });
}
```

## S-Expression Output

You can output the AST in S-expression format for easier visualization:

```rust
use bumpalo::Bump;
use php_parser::lexer::Lexer;
use php_parser::parser::Parser;
use php_parser::ast::sexpr::SExprFormatter;
fn main() {
    let code = "<?php class Foo extends Bar implements Baz { public int $p = 1; function m($a) { return $a; } }";
    let arena = Bump::new();
    let lexer = Lexer::new(code.as_bytes());
    let mut parser = Parser::new(lexer, &arena);
    let program = parser.parse_program();

    let mut formatter = SExprFormatter::new(code.as_bytes());
    formatter.visit_program(&program);
    let output = formatter.finish();
    println!("{}", output);
}
``` 
Gives the output:

```
(program
  (class "Foo" (extends Bar) (implements Baz)
    (members
      (property public int  = (integer 1))
      (method "m" (params ())
        (body
          (return (variable "")))))))

```

## Performance

Test file `run-tests.php` from [php-src](https://github.com/php/php-src/blob/801e587faa0efd2fba633413681c68c83d6f2188/run-tests.php) with 140KB size, here are the benchmark results:

```bash
➜  php-parser git:(master) ✗ cargo run --release --bin bench_file -- run-tests.php
    Finished `release` profile [optimized] target(s) in 0.05s
     Running `target/release/bench_file run-tests.php`
Benchmarking: run-tests.php
File size: 139.63 KB
Warming up...
Running 200 iterations...
Profile written to profile.pb
Flamegraph written to flamegraph.svg
Total time: 134.267333ms
Average time: 671.336µs
Throughput: 203.11 MB/s
```

Table comparing with [nikic/PHP-Parser](https://github.com/nikic/PHP-Parser) v5.6.2

| Parser            | Language       | Time (ms)        |
|-------------------|----------------|------------------|
| nikic/PHP-Parser  | PHP            | 33               |
| tree-sitter-php   | C              | 15               |
| php-parser        | Rust           | 0.67             |

 > Machine specs: Apple M1 Pro, 32GB RAM

## Development

### Running Tests

Run the full test suite:

```bash
cargo test
```

### Snapshot Tests

This project uses `insta` for snapshot testing. If you make changes to the parser that affect the AST output, you may need to review and accept the new snapshots:

```bash
cargo test
cargo insta review
```

### Corpus Testing

To verify stability against real-world codebases (like WordPress or Laravel), use the corpus test runner:

```bash
cargo run --release --bin corpus_test -- /path/to/php/project
```

## Architecture

- **Lexer**: Operates on `&[u8]` and handles PHP's complex lexical modes (Scripting, DoubleQuote, Heredoc).
- **Parser**: A combination of Recursive Descent and Pratt parsing for expressions.
- **AST**: All nodes are allocated in a `Bump` arena. Strings are stored as references to the original source (`&'src [u8]`) or arena-allocated slices.


## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.