tree-sitter-perl-c
Conventional tree-sitter Perl grammar binding (C FFI), maintained for compatibility and comparison against the native v3 parser.
Overview
This crate compiles a vendored snapshot of the upstream tree-sitter-perl C
grammar (parser.c + scanner.c) via the cc crate and exposes a
tree_sitter::Language so Perl source code can be parsed with the official
tree-sitter runtime.
The crate is self-contained: the C sources live under c-src/ and are shipped
in the published package. There is no bindgen or libclang dependency — the
single symbol we need (tree_sitter_perl) is declared by hand in src/lib.rs.
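The hand-declaration approach can be illustrated with a minimal sketch. It does not reproduce the crate's actual `src/lib.rs`; because `tree_sitter_perl` is only available once the vendored grammar is compiled and linked, the example declares `strlen` from the C standard library instead (an assumption chosen purely so the snippet links on its own). The wrapper name `c_strlen` is hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// One C symbol declared by hand, with no bindgen or libclang involved.
// `strlen` stands in for the grammar entry point; Rust's std already links
// the C runtime that provides it.
// Note: on the 2024 edition this block must be written `unsafe extern "C"`.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper so callers never touch the raw FFI call directly.
fn c_strlen(text: &CStr) -> usize {
    // SAFETY: `CStr` guarantees a valid, NUL-terminated buffer.
    unsafe { strlen(text.as_ptr()) }
}

fn main() {
    let text = CStr::from_bytes_with_nul(b"perl\0").expect("literal ends with NUL");
    println!("{}", c_strlen(text));
}
```

The design point is the same as in this crate: when only one symbol crosses the FFI boundary, a single `extern` declaration plus a safe wrapper is usually less machinery than generating bindings.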
This crate vs. tree-sitter-perl-rs
| | tree-sitter-perl-c (this crate) | tree-sitter-perl-rs |
|---|---|---|
| Backend | Upstream C grammar (FFI) | Facade over native v3 Rust parser |
| Best for | Compatibility testing, non-Rust tooling, baseline benchmarking | New Rust projects, embedded use, no C toolchain |
| Build dep | C compiler required | Pure Rust |
| Grammar source | Upstream tree-sitter-perl | Native v3 recursive-descent |
Choose tree-sitter-perl-c when you need:
- Compatibility testing — compare parse output against the C reference grammar
- Non-Rust tree-sitter tooling — the C grammar snapshot can be consumed by language bindings in Python, Node.js, etc.
- Baseline benchmarking — measure parse throughput of the C grammar vs. the native v3 parser
Choose tree-sitter-perl-rs for new Rust projects.
Quick Start
Add to Cargo.toml:
[dependencies]
tree-sitter-perl-c = "0.12"
Parse Perl source:
use tree_sitter_perl_c::parse_perl_code;
For repeated parsing, reuse a configured parser with the helper APIs:
use tree_sitter_perl_c::try_create_parser;
let mut parser = try_create_parser().unwrap();
for snippet in &
Typed parse helpers expose a small error surface:
use tree_sitter_perl_c::try_parse_perl_file;
match try_parse_perl_file
Public API
| Function | Description |
|---|---|
| language() | Returns the tree-sitter Language for Perl |
| try_create_parser() | Creates a tree_sitter::Parser (returns Result) |
| create_parser() | Creates a parser, silently ignoring language-set errors |
| parse_perl_bytes(code) | Parses raw bytes (including non-UTF-8 Perl source) |
| parse_perl_bytes_with_parser(parser, code) | Parses raw bytes with a caller-provided configured parser |
| parse_perl_code(code) | Parses a &str into a tree_sitter::Tree |
| parse_perl_code_with_parser(parser, code) | Parses a &str with a caller-provided configured parser |
| parse_perl_file(path) | Reads and parses a file (non-UTF-8 safe) |
| try_parse_perl_bytes(code) | Typed parse API (ParsePerlError) for byte slices |
| try_parse_perl_code(code) | Typed parse API (ParsePerlError) for &str |
| try_parse_perl_file(path) | Typed parse API (ParsePerlError) for file paths |
| ParsePerlError | Distinguishes setup, parse-none, and IO failures |
| get_scanner_config() | Returns "c-scanner" |
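To make the three failure classes concrete, here is a rough sketch of how an error type with that split could be shaped. It is not the crate's actual `ParsePerlError` definition: the type name `SketchParseError`, its variant names, and the helper `read_source` are all hypothetical, and only the IO path is exercised.

```rust
use std::fmt;
use std::fs;
use std::io;

// Hypothetical stand-in for an error type that separates setup,
// parse-none, and IO failures. Variant names are assumptions.
#[allow(dead_code)]
#[derive(Debug)]
enum SketchParseError {
    /// The parser could not be configured with the language.
    Setup(String),
    /// The parser ran but returned no tree.
    ParseNone,
    /// The source file could not be read.
    Io(io::Error),
}

impl fmt::Display for SketchParseError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            SketchParseError::Setup(msg) => write!(f, "parser setup failed: {}", msg),
            SketchParseError::ParseNone => write!(f, "parser returned no tree"),
            SketchParseError::Io(err) => write!(f, "could not read source: {}", err),
        }
    }
}

impl From<io::Error> for SketchParseError {
    fn from(err: io::Error) -> Self {
        SketchParseError::Io(err)
    }
}

/// Reads raw bytes (no UTF-8 validation), mapping IO failures via `?`.
fn read_source(path: &str) -> Result<Vec<u8>, SketchParseError> {
    Ok(fs::read(path)?)
}

fn main() {
    match read_source("missing-file.pl") {
        Ok(bytes) => println!("read {} bytes", bytes.len()),
        Err(err) => println!("{}", err),
    }
}
```

Keeping the three cases as separate variants lets a caller retry or report IO problems differently from a misconfigured parser, which is the kind of distinction the table describes.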
Binaries
- parse_c: parse a Perl file using the byte-oriented API (non-UTF-8 safe), then:
  - exits 0 when the parse tree has no error nodes
  - exits 1 when reading/parsing fails or the tree contains syntax errors
  - supports triage flags:
    - --root-kind to print the root node kind
    - --has-error to print true/false for parse errors
    - --sexp to print the full tree-sitter s-expression
- bench_parser_c: benchmark parse throughput and emit stable key=value output (requires --features test-utils)
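The exit-code contract of parse_c lends itself to scripting. The sketch below shows one way a Rust caller might interpret it; it assumes that `parse_c` is on `PATH` and accepts the file path as a positional argument, and `script.pl` and `describe_exit` are placeholder names that do not come from this crate.

```rust
use std::process::Command;

/// Maps a process exit code onto the outcomes documented for parse_c.
fn describe_exit(code: Option<i32>) -> &'static str {
    match code {
        Some(0) => "clean parse: no error nodes",
        Some(1) => "read/parse failure or syntax errors in the tree",
        Some(_) => "unexpected exit code",
        None => "terminated without an exit code",
    }
}

fn main() {
    // Assumption: the binary is installed and takes the path positionally.
    match Command::new("parse_c").arg("script.pl").status() {
        Ok(status) => println!("{}", describe_exit(status.code())),
        Err(err) => eprintln!("could not launch parse_c: {}", err),
    }
}
```

Because exit code 1 covers both unreadable files and syntax errors, a caller that needs to tell them apart would have to look at the tool's printed output as well.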
bench_parser_c modes
bench_parser_c supports both one-shot and parser-reuse flows:
- --mode cold (default): create a fresh parser for every iteration
- --mode warm: reuse one parser across all iterations
- --iterations N / -n N: run N parse iterations
- --input str|bytes (default str): parse through the UTF-8 string or raw-byte path
- --cold / --warm: shorthand for --mode cold|warm
Example:
Output is intentionally stable for run-to-run diffing:
mode=warm
input=bytes
iterations=200
total_us=12345
avg_us=61
has_error=false
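Since the report above is plain key=value lines, it is straightforward to consume from other tooling. The following is a minimal sketch of such a consumer using only the standard library; `parse_report` and `recompute_avg_us` are hypothetical helper names, and the integer division is an assumption that happens to reproduce the sample's avg_us, not a statement about how bench_parser_c computes it.

```rust
use std::collections::BTreeMap;

/// Splits each `key=value` line into a map entry; lines without `=` are skipped.
fn parse_report(output: &str) -> BTreeMap<String, String> {
    output
        .lines()
        .filter_map(|line| line.trim().split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Recomputes the per-iteration average from total_us and iterations.
/// Returns None if either field is missing, unparsable, or iterations is zero.
fn recompute_avg_us(report: &BTreeMap<String, String>) -> Option<u64> {
    let total: u64 = report.get("total_us")?.parse().ok()?;
    let iterations: u64 = report.get("iterations")?.parse().ok()?;
    if iterations == 0 {
        return None;
    }
    Some(total / iterations)
}

fn main() {
    let sample = "mode=warm\ninput=bytes\niterations=200\ntotal_us=12345\navg_us=61\nhas_error=false\n";
    let report = parse_report(sample);
    println!("mode = {:?}", report.get("mode"));
    println!("recomputed avg_us = {:?}", recompute_avg_us(&report));
}
```

A `BTreeMap` keeps keys sorted, so two parsed reports print in the same order and can be compared field by field.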
Examples:
# Basic parse check (succeeds only when there are no parse errors)
# Triage output for debugging parser behavior
Snapshot provenance and refresh
Snapshot provenance and the refresh workflow are tracked in
UPSTREAM_SNAPSHOT.md.
That document records:
- upstream repository/reference
- generator version used for parser.c
- file fingerprints for auditability
- the exact local refresh + validation checklist
Vendored files vs local wrapper code
Vendored from upstream snapshot (c-src/):
- parser.c, scanner.c
- bsearch.h, tsp_unicode.h
- tree_sitter/{parser.h,array.h,alloc.h}
Maintained locally in this crate:
- src/lib.rs (Rust FFI wrapper + helpers)
- build.rs (C compilation/link wiring)
- tests/ and src/bin/ (integration and sanity tooling)
- crate docs (README.md, ROADMAP.md, UPSTREAM_SNAPSHOT.md)
Build Requirements
Only a C compiler is required. No libclang or other FFI-generator toolchain
is needed.
# Debian/Ubuntu
# macOS
# Windows: MSVC (via Visual Studio) or MinGW-w64 both work
Links
- tree-sitter-perl upstream grammar
- tree-sitter-perl-rs: sibling crate, facade over the native v3 Rust parser
- perl-parser: the native v3 recursive-descent Perl parser
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.