§perl-parser — Production-grade Perl parser and Language Server Protocol engine
A comprehensive Perl parser built on recursive descent principles, providing robust AST generation, LSP feature providers, workspace indexing, and test-driven development support.
§Key Features
- Tree-sitter Compatible: AST with kinds, fields, and position tracking compatible with tree-sitter grammar
- Comprehensive Parsing: ~100% edge case coverage for Perl 5.8-5.40 syntax
- LSP Integration: Full Language Server Protocol feature set (~82% coverage in v0.8.6)
- TDD Workflow: Intelligent test generation with return value analysis
- Incremental Parsing: Efficient re-parsing for real-time editing
- Error Recovery: Graceful handling of malformed input with detailed diagnostics
- Workspace Navigation: Cross-file symbol resolution and reference tracking
§Quick Start
§Basic Parsing
use perl_parser::Parser;
let code = r#"sub hello { print "Hello, world!\n"; }"#;
let mut parser = Parser::new(code);
match parser.parse() {
    Ok(ast) => {
        println!("AST: {}", ast.to_sexp());
        println!("Parsed {} nodes", ast.count_nodes());
    }
    Err(e) => eprintln!("Parse error: {}", e),
}
§Test-Driven Development
Generate tests automatically from parsed code:
use perl_parser::Parser;
use perl_parser::tdd::test_generator::{TestGenerator, TestFramework};
let code = r#"sub add { my ($a, $b) = @_; return $a + $b; }"#;
let mut parser = Parser::new(code);
let ast = parser.parse()?;
let generator = TestGenerator::new(TestFramework::TestMore);
let tests = generator.generate_tests(&ast, code);
// Returns test cases with intelligent assertions
assert!(!tests.is_empty());
§LSP Integration
Use as a library for LSP features (see perl-lsp for the standalone server):
use perl_parser::Parser;
use perl_parser::analysis::semantic::SemanticAnalyzer;
let code = "my $x = 42;";
let mut parser = Parser::new(code);
let ast = parser.parse()?;
// Semantic analysis for hover, completion, etc.
let model = SemanticAnalyzer::analyze(&ast);
§Architecture
The parser is organized into distinct layers for maintainability and testability:
§Core Engine (engine)
- parser: Recursive descent parser with operator precedence
- ast: Abstract Syntax Tree definitions and node types
- error: Error classification, recovery strategies, and diagnostics
- position: UTF-16 position mapping for LSP protocol compliance
- quote_parser: Specialized parser for quote-like operators
- heredoc_collector: FIFO heredoc collection with indent stripping
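The "FIFO heredoc collection with indent stripping" idea can be illustrated with a small standalone sketch. It does not use this crate's `heredoc_collector` API; `PendingHeredoc`, `collect_heredocs`, and `finish_body` are hypothetical names, and the sketch assumes ASCII indentation. Heredocs announced on one line are queued, and their bodies are then read from the following lines in declaration order.

```rust
use std::collections::VecDeque;

/// A heredoc announced on a line whose body has not been read yet.
/// (Hypothetical type for illustration, not the crate's own.)
struct PendingHeredoc {
    terminator: String,
    /// True for the indented `<<~TAG` form.
    strip_indent: bool,
}

/// Reads one body per pending heredoc, first declared first, from `lines`.
fn collect_heredocs<'a>(
    pending: &mut VecDeque<PendingHeredoc>,
    lines: &mut impl Iterator<Item = &'a str>,
) -> Vec<String> {
    let mut bodies = Vec::new();
    while let Some(doc) = pending.pop_front() {
        let mut body: Vec<&str> = Vec::new();
        for line in lines.by_ref() {
            // The `<<~` form allows the terminator itself to be indented.
            let candidate = if doc.strip_indent { line.trim_start() } else { line };
            if candidate == doc.terminator {
                break;
            }
            body.push(line);
        }
        bodies.push(finish_body(&body, doc.strip_indent));
    }
    bodies
}

/// Joins body lines, removing the smallest common indentation when requested.
fn finish_body(body: &[&str], strip_indent: bool) -> String {
    let indent = if strip_indent {
        body.iter()
            .filter(|l| !l.trim().is_empty())
            .map(|l| l.len() - l.trim_start().len())
            .min()
            .unwrap_or(0)
    } else {
        0
    };
    body.iter()
        .map(|l| format!("{}\n", l.get(indent..).unwrap_or("")))
        .collect()
}
```

A queue (rather than a stack) matters because Perl reads the bodies of `print <<A, <<~B;` in the order the tags appear on the line.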
§IDE Integration (LSP Provider Modules)
- completion: Context-aware completion providers
- diagnostics: Diagnostics generation and formatting
- references: Reference search providers
- rename: Rename providers with validation
- semantic_tokens: Semantic token generation
- type_definition: Type definition providers
- workspace_symbols: Workspace symbol search
§Analysis (analysis)
- scope_analyzer: Variable and subroutine scoping resolution
- type_inference: Perl type inference engine
- semantic: Semantic model with hover information
- symbol: Symbol table and reference tracking
- dead_code_detector: Unused code detection
§Workspace (workspace)
- workspace_index: Cross-file symbol indexing
- workspace_rename: Multi-file refactoring
- document_store: Document state management
§Refactoring (refactor)
- refactoring: Unified refactoring engine
- modernize: Code modernization utilities
- import_optimizer: Import statement analysis and optimization
§Test Support (tdd)
- test_generator: Intelligent test case generation
- test_runner: Test execution and validation
- tdd_workflow (test-only): TDD cycle management and coverage tracking
§LSP Feature Support
This crate provides the engine for LSP features. The public standalone server is in perllsp, backed by the perl-lsp-rs implementation crate.
§Implemented Features
- Completion: Context-aware code completion with type inference
- Hover: Documentation and type information on hover
- Definition: Go-to-definition with cross-file support
- References: Find all references with workspace indexing
- Rename: Symbol renaming with conflict detection
- Diagnostics: Syntax errors and semantic warnings
- Formatting: Code formatting via perltidy integration
- Folding: Code folding for blocks and regions
- Semantic Tokens: Fine-grained syntax highlighting
- Call Hierarchy: Function call navigation
- Type Hierarchy: Class inheritance navigation
See docs/reference/LSP_CAPABILITY_POLICY.md for the complete capability matrix.
§Incremental Parsing
Enable efficient re-parsing for real-time editing:
use perl_parser::{IncrementalState, apply_edits, Edit};
let mut state = IncrementalState::new("my $x = 1;");
let ast = state.parse()?;
// Apply an edit
let edit = Edit {
    start_byte: 3,
    old_end_byte: 5,
    new_end_byte: 5,
    text: "$y".to_string(),
};
apply_edits(&mut state, vec![edit]);
// Incremental re-parse reuses unchanged nodes
let new_ast = state.parse()?;
§Error Recovery
The parser uses intelligent error recovery to continue parsing after errors:
use perl_parser::Parser;
let code = "sub broken { if ("; // Incomplete code
let mut parser = Parser::new(code);
// Parser recovers and builds partial AST
let result = parser.parse();
assert!(result.is_ok());
// Check recorded errors
let errors = parser.errors();
assert!(!errors.is_empty());
§Workspace Indexing
Build cross-file indexes for workspace-wide navigation:
use perl_parser::workspace_index::WorkspaceIndex;
let mut index = WorkspaceIndex::new();
index.index_file("lib/Foo.pm", "package Foo; sub bar { }");
index.index_file("lib/Baz.pm", "use Foo; Foo::bar();");
// Find all references to Foo::bar
let refs = index.find_references("Foo::bar");
§Testing with perl-corpus
The parser is tested against the comprehensive perl-corpus test suite:
# Run parser tests with full corpus coverage
cargo test -p perl-parser
# Run specific test category
cargo test -p perl-parser --test regex_tests
# Validate documentation examples
cargo test --doc
§Command-Line Tools
Build and install the LSP server binary:
# Build LSP server
cargo build -p perllsp --release
# Install globally
cargo install --path crates/perllsp
# Run LSP server
perllsp --stdio
# Check server health
perllsp --health
§Integration Examples
§VSCode Extension
Configure the LSP server in VSCode settings:
{
  "perl.lsp.path": "/path/to/perllsp",
  "perl.lsp.args": ["--stdio"]
}
§Neovim Integration
require'lspconfig'.perl.setup{
  cmd = { "/path/to/perllsp", "--stdio" },
}
§Performance Characteristics
- Single-pass parsing: O(n) complexity for well-formed input
- UTF-16 mapping: Fast bidirectional offset conversion for LSP
- Incremental updates: Reuses unchanged AST nodes for efficiency
- Memory efficiency: Streaming token processing with bounded lookahead
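To make the incremental-update idea concrete, here is a minimal standalone sketch of what a byte-range edit does to the source text and to offsets that follow it. It mirrors the field names of the `Edit` value in the Incremental Parsing example, but `ByteEdit`, `apply_byte_edit`, and `shift_offset` are hypothetical names and this is not the crate's implementation.

```rust
/// Hypothetical stand-in with the same field names as the `Edit` example.
struct ByteEdit {
    start_byte: usize,
    old_end_byte: usize,
    new_end_byte: usize,
    text: String,
}

/// Applies one edit to `source` after checking that its ranges are consistent.
fn apply_byte_edit(source: &mut String, edit: &ByteEdit) -> Result<(), String> {
    if edit.start_byte > edit.old_end_byte || edit.old_end_byte > source.len() {
        return Err("edit range is out of bounds".to_string());
    }
    if edit.new_end_byte != edit.start_byte + edit.text.len() {
        return Err("new_end_byte does not match the replacement length".to_string());
    }
    if !source.is_char_boundary(edit.start_byte) || !source.is_char_boundary(edit.old_end_byte) {
        return Err("edit splits a UTF-8 character".to_string());
    }
    source.replace_range(edit.start_byte..edit.old_end_byte, &edit.text);
    Ok(())
}

/// Maps a byte offset in the old text to the new text.
/// Returns `None` for offsets strictly inside the replaced range, which is
/// where an incremental parser would have to re-parse rather than reuse.
fn shift_offset(offset: usize, edit: &ByteEdit) -> Option<usize> {
    if offset <= edit.start_byte {
        Some(offset)
    } else if offset >= edit.old_end_byte {
        Some(offset - edit.old_end_byte + edit.new_end_byte)
    } else {
        None
    }
}
```

Nodes that lie entirely before the edit keep their offsets, and nodes entirely after it only need shifting, which is the basic reason unchanged nodes can be reused.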
§Compatibility
- Perl Versions: 5.8 through 5.40 (covers 99% of CPAN)
- LSP Protocol: LSP 3.17 specification
- Tree-sitter: Compatible AST format and position tracking
- UTF-16: Full Unicode support with correct LSP position mapping
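LSP positions count UTF-16 code units, while Rust strings are indexed by UTF-8 bytes, so every position has to be converted in both directions. The following is a minimal single-line sketch of that conversion using only the standard library; `byte_to_utf16_col` and `utf16_col_to_byte` are hypothetical helper names, not functions exported by this crate.

```rust
/// Converts a byte offset within `line` to a UTF-16 column.
/// Returns `None` if the offset is out of range or splits a character.
fn byte_to_utf16_col(line: &str, byte_offset: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_offset) {
        return None;
    }
    Some(line[..byte_offset].encode_utf16().count())
}

/// Converts a UTF-16 column within `line` back to a byte offset.
/// Returns `None` if the column is past the end or inside a surrogate pair.
fn utf16_col_to_byte(line: &str, utf16_col: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_idx);
        }
        if units > utf16_col {
            return None;
        }
        units += ch.len_utf16();
    }
    if units == utf16_col {
        Some(line.len())
    } else {
        None
    }
}
```

For the line `a😀b`, the emoji occupies four bytes but two UTF-16 units, so byte offset 5 (the `b`) corresponds to column 3, and column 2 has no valid byte offset.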
§Related Crates
- perllsp: Public Cargo entry point for the standalone LSP server
- perl-lsp-rs: Standalone LSP server runtime implementation (moved from this crate)
- perl-lexer: Context-aware Perl tokenizer
- perl-corpus: Comprehensive test corpus and generators
- perl-dap: Debug Adapter Protocol implementation
§Documentation
- API Docs: See module documentation below
- LSP Guide: docs/reference/LSP_IMPLEMENTATION_GUIDE.md
- Capability Policy: docs/reference/LSP_CAPABILITY_POLICY.md
- Commands: docs/reference/COMMANDS_REFERENCE.md
- Current Status: docs/project/CURRENT_STATUS.md
§Architecture
The parser follows a recursive descent design with operator precedence handling, maintaining a clean separation from the lexing phase. This modular approach enables:
- Independent testing of parsing logic
- Easy integration with different lexer implementations
- Clear error boundaries between lexing and parsing phases
- Optimal performance through single-pass parsing
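As an illustration of recursive descent with operator precedence handling, here is a minimal precedence-climbing sketch over a toy arithmetic grammar. It is not this crate's parser: it handles only integers, parentheses, and four left-associative operators, and every name in it is hypothetical. It prints an s-expression in the spirit of the `to_sexp()` output used in the examples.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Precedence of a binary operator; higher binds tighter.
fn precedence(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

fn skip_spaces(input: &mut Peekable<Chars<'_>>) {
    while input.next_if(|c| c.is_whitespace()).is_some() {}
}

/// Parses an integer literal or a parenthesized expression.
fn parse_primary(input: &mut Peekable<Chars<'_>>) -> Result<String, String> {
    skip_spaces(input);
    match input.peek().copied() {
        Some('(') => {
            input.next();
            let inner = parse_expr(input, 1)?;
            skip_spaces(input);
            match input.next() {
                Some(')') => Ok(inner),
                other => Err(format!("expected ')', found {:?}", other)),
            }
        }
        Some(c) if c.is_ascii_digit() => {
            let mut digits = String::new();
            while let Some(d) = input.next_if(|c| c.is_ascii_digit()) {
                digits.push(d);
            }
            Ok(digits)
        }
        other => Err(format!("expected a number or '(', found {:?}", other)),
    }
}

/// Precedence climbing: consumes operators whose precedence is at least `min_prec`.
fn parse_expr(input: &mut Peekable<Chars<'_>>, min_prec: u8) -> Result<String, String> {
    let mut lhs = parse_primary(input)?;
    loop {
        skip_spaces(input);
        let op = match input.peek().copied() {
            Some(c) => c,
            None => break,
        };
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        input.next();
        // Left-associative: the right operand may only contain tighter operators.
        let rhs = parse_expr(input, prec + 1)?;
        lhs = format!("({} {} {})", op, lhs, rhs);
    }
    Ok(lhs)
}

/// Parses a whole expression and renders it as an s-expression.
fn to_sexp(source: &str) -> Result<String, String> {
    let mut input = source.chars().peekable();
    let tree = parse_expr(&mut input, 1)?;
    skip_spaces(&mut input);
    match input.next() {
        None => Ok(tree),
        Some(c) => Err(format!("unexpected trailing character {:?}", c)),
    }
}
```

With this sketch, `to_sexp("1 + 2 * 3")` yields `(+ 1 (* 2 3))` and `to_sexp("8 - 3 - 2")` yields `(- (- 8 3) 2)`. The sketch reads characters directly for brevity; the design described above keeps lexing separate from parsing.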
§Example
use perl_parser::Parser;
let code = "my $x = 42;";
let mut parser = Parser::new(code);
match parser.parse() {
    Ok(ast) => println!("AST: {}", ast.to_sexp()),
    Err(e) => eprintln!("Parse error: {}", e),
}
Re-exports§
pub use tooling::performance;
pub use tooling::perl_critic;
pub use tooling::perltidy;
pub use engine::ast_v2;
pub use engine::edit;
pub use engine::heredoc_collector;
pub use engine::pragma_tracker;
pub use engine::quote_parser;
pub use builtins::builtin_signatures_phf;
pub use perl_dead_code as dead_code_detector;
Modules§
- analysis
- Semantic analysis, scope resolution, and type inference. Compatibility re-export of semantic analysis modules.
- ast
- Abstract Syntax Tree (AST) definitions for Perl parsing. Parser engine components and supporting utilities. Abstract Syntax Tree (AST) definitions for Perl parsing. AST facade for the core parser engine.
- builtin_signatures
- Builtin function signature lookup tables. Builtin function signatures and metadata. Comprehensive built-in function signatures for Perl scripting.
- builtins
- Perl builtin function signatures and metadata. Re-exported builtin signature tables from perl-parser-core.
- code_actions
- LSP code actions for automated refactoring and fixes.
- completion
- LSP completion for code suggestions.
- declaration
- Variable and subroutine declaration analysis. Semantic analysis, symbol extraction, and type inference. Go-to-declaration support and parent map construction. Declaration Provider for LSP
- diagnostics
- LSP diagnostics for error reporting.
- document_links
- LSP document links provider for file and URL navigation.
- document_store
- In-memory document storage for open editor buffers. Workspace indexing and refactoring orchestration. Document store for managing in-memory text content
- engine
- Parser engine components and supporting utilities. Re-exported parser engine modules from perl-parser-core.
- error
- Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Error types and recovery strategies for parser failures. Error types and recovery helpers for the parser engine.
- error_classifier
- Error classification and recovery strategies for parse failures. Error classification and diagnostic generation for parsed Perl code. Error classification and diagnostic generation for Perl parsing workflows
- error_recovery
- Error recovery strategies for resilient parsing. Error recovery strategies and traits for the Perl parser. Error recovery for the Perl parser
- implementation_provider
- LSP implementation provider.
- import_optimizer
- Import statement analysis and optimization. Refactoring and modernization helpers. Import optimization for Perl modules
- index
- File and symbol indexing for workspace-wide navigation. Semantic analysis, symbol extraction, and type inference. Lightweight workspace symbol index. Cross-file workspace indexing for Perl symbols
- inlay_hints
- LSP inlay hints for inline type and parameter information.
- inlay_hints_provider
- LSP inlay hints provider implementation.
- line_index
- Line-to-byte offset index for fast position lookups. Line indexing and position mapping utilities.
- modernize
- Code modernization utilities for Perl best practices. Refactoring and modernization helpers. Legacy Perl modernization helpers.
- modernize_refactored
- Enhanced code modernization with refactoring capabilities. Refactoring and modernization helpers. Refactored modernization engine with structured pattern definitions.
- parser
- Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Core parser implementation for Perl source. Recursive descent Perl parser.
- parser_context
- Parser context with error recovery support. Parser engine components and supporting utilities. Parser context with error recovery support. Parser context with error recovery support
- position
- Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Position tracking types and UTF-16 mapping utilities. Enhanced position tracking for incremental parsing
- refactor
- Code refactoring, modernization, and import optimization. Compatibility re-export of refactoring modules.
- refactoring
- Unified refactoring engine for comprehensive code transformations. Refactoring and modernization helpers. Unified refactoring engine for Perl code transformations
- references
- LSP references provider for symbol usage analysis.
- rename
- LSP rename for symbol renaming.
- scope_analyzer
- Scope analysis for variable and subroutine resolution. Semantic analysis, symbol extraction, and type inference. Scope analysis for variable and subroutine resolution. Scope analysis and variable tracking for Perl parsing workflows
- semantic
- Semantic model with hover information and token classification. Semantic analysis, symbol extraction, and type inference. Semantic analyzer and token classification. Semantic analysis for IDE features.
- semantic_tokens
- LSP semantic tokens provider for syntax highlighting.
- semantic_tokens_provider
- LSP semantic tokens provider implementation.
- symbol
- Symbol table, extraction, and reference tracking. Semantic analysis, symbol extraction, and type inference. Symbol extraction and symbol table construction. Symbol extraction and symbol table for IDE features
- tdd
- Test-driven development support and test generation. Compatibility re-export of TDD support modules.
- tdd_basic
- Basic TDD utilities and test helpers. Test-driven development helpers and generators. Basic TDD workflow support for LSP
- test_generator
- Intelligent test case generation from parsed Perl code. Test-driven development helpers and generators. Test generator for TDD workflow support
- test_runner
- Test execution and TDD support functionality. Test-driven development helpers and generators. Test execution and TDD support functionality.
- token_stream
- Token stream with position-aware iteration. Token stream and trivia utilities for the parser. Token stream adapters used during the Parse stage for LSP workflows. Token stream facade for the core parser engine.
- token_wrapper
- Lightweight token wrapper for AST integration. Token stream and trivia utilities for the parser. Token wrapper with enhanced position tracking
- tokens
- Token stream, trivia, and token wrapper utilities. Re-exported token stream utilities from perl-parser-core.
- tooling
- External tooling integration (perltidy, perlcritic, performance). Compatibility re-export of tooling integrations.
- trivia
- Trivia (whitespace and comments) representation. Token stream and trivia utilities for the parser. Trivia (comments and whitespace) handling for the Perl parser
- trivia_parser
- Parser that preserves trivia tokens for formatting. Token stream and trivia utilities for the parser. Trivia-preserving parser implementation
- type_definition
- LSP type definition provider.
- type_hierarchy
- LSP type hierarchy provider for inheritance navigation.
- type_inference
- Type inference engine for Perl variable analysis. Semantic analysis, symbol extraction, and type inference. Type inference engine for Perl variable analysis.
- util
- Parser utilities and helpers. Utility functions for the Perl parser
- workspace
- Workspace indexing, document store, and cross-file operations. Compatibility re-export of workspace indexing modules.
- workspace_index
- Cross-file symbol index for workspace-wide navigation. Workspace indexing and refactoring orchestration. Workspace-wide symbol index for fast cross-file lookups in Perl LSP.
- workspace_refactor
- Multi-file refactoring operations across a workspace. Workspace-wide refactoring operations for Perl codebases
- workspace_rename
- Cross-file symbol renaming with conflict detection. Workspace indexing and refactoring orchestration. LSP feature module (deprecated)
- workspace_symbols
- LSP workspace symbols provider.
Structs§
- DuplicateImport
- Import analysis, optimization, and unused import detection. A module that is imported multiple times
- EnhancedCodeActionsProvider
- Enhanced code actions provider with workspace-aware refactoring. Enhanced code actions provider with additional refactorings
- HoverInfo
- Semantic analysis types for hover, tokens, and code understanding. Hover information for symbols displayed in LSP hover requests.
- ImportAnalysis
- Import analysis, optimization, and unused import detection. Result of import analysis containing all detected issues and suggestions
- ImportEntry
- Import analysis, optimization, and unused import detection. A single import statement discovered during analysis
- ImportOptimizer
- Import analysis, optimization, and unused import detection. Import optimizer for analyzing and optimizing Perl import statements
- MissingImport
- Import analysis, optimization, and unused import detection. A symbol that is used but not imported
- Node
- AST node, node kind enum, and source location types. Core AST node representing any Perl language construct within parsing workflows.
- NodeWithTrivia
- Trivia (whitespace/comments) attached to AST nodes. A node with attached trivia
- OrganizationSuggestion
- Import analysis, optimization, and unused import detection. A suggestion for improving import organization
- Parser
- Recursive descent Perl parser with error recovery and AST generation. Parser state for a single Perl source input.
- PositionMapper
- Line ending detection and UTF-16 position mapping for LSP compliance. Centralized position mapper using rope for efficiency.
- PragmaState
- Pragma state tracking for use strict, use warnings, etc. Pragma state at a given point in the code
- PragmaTracker
- Pragma state tracking for use strict, use warnings, etc. Tracks pragma state throughout a Perl file
- RefactoringConfig
- Refactoring engine types: configuration, operations, and results. Configuration for refactoring operations
- RefactoringEngine
- Refactoring engine types: configuration, operations, and results. Unified refactoring engine that coordinates all refactoring operations
- RefactoringOperation
- Refactoring engine types: configuration, operations, and results. Record of a refactoring operation for rollback support
- RefactoringResult
- Refactoring engine types: configuration, operations, and results. Result of a refactoring operation
- ScopeAnalyzer
- Scope analysis issue types and analyzer.
- ScopeIssue
- Scope analysis issue types and analyzer.
- SemanticAnalyzer
- Semantic analysis types for hover, tokens, and code understanding. Semantic analyzer providing comprehensive IDE features for Perl code.
- SemanticModel
- Semantic analysis types for hover, tokens, and code understanding. A stable, query-oriented view of semantic information over a parsed file.
- SemanticToken
- Semantic analysis types for hover, tokens, and code understanding. A semantic token with type and modifiers for LSP syntax highlighting.
- Symbol
- Symbol extraction, table, and reference types for navigation. A symbol definition in Perl code with comprehensive metadata for Index/Navigate workflows.
- SymbolExtractor
- Symbol extraction, table, and reference types for navigation. Extract symbols from an AST for Parse/Index workflows.
- SymbolReference
- Symbol extraction, table, and reference types for navigation. A reference to a symbol with usage context for Navigate/Analyze workflows.
- SymbolTable
- Symbol extraction, table, and reference types for navigation. Comprehensive symbol table for Perl code analysis and LSP features in Index/Analyze stages.
- Token
- Token types and token stream for lexer output. Token produced by the lexer and consumed by the parser.
- TokenStream
- Token types and token stream for lexer output. Token stream that wraps perl-lexer
- TriviaPreservingParser
- Trivia-preserving parser and formatting utilities. Parser that preserves trivia
- TriviaToken
- Trivia (whitespace/comments) attached to AST nodes. A trivia token with position information
- TypeBasedCompletion
- Type inference types: Perl types, constraints, and inference engine. Type-based code completion suggestions
- TypeConstraint
- Type inference types: Perl types, constraints, and inference engine. Type constraint for type checking
- TypeEnvironment
- Type inference types: Perl types, constraints, and inference engine. Type environment for tracking variable types
- TypeInferenceEngine
- Type inference types: Perl types, constraints, and inference engine. Main type inference engine
- TypeLocation
- Type inference types: Perl types, constraints, and inference engine. Location information for type errors
- UnusedImport
- Import analysis, optimization, and unused import detection. An import statement containing unused symbols
Enums§
- IssueKind
- Scope analysis issue types and analyzer.
- LineEnding
- Line ending detection and UTF-16 position mapping for LSP compliance. Line ending style detected in a document
- ModernizationPattern
- Refactoring engine types: configuration, operations, and results. Modernization patterns for legacy code
- NodeKind
- AST node, node kind enum, and source location types. Comprehensive enumeration of all Perl language constructs supported by the parser.
- ParseError
- Parse error and result types for parser output. Comprehensive error types that can occur during Perl parsing workflows
- PerlType
- Type inference types: Perl types, constraints, and inference engine. Represents a Perl type
- RefactoringScope
- Refactoring engine types: configuration, operations, and results. Scope of refactoring operations
- RefactoringType
- Refactoring engine types: configuration, operations, and results. Types of refactoring operations supported by the engine
- ScalarType
- Type inference types: Perl types, constraints, and inference engine. Represents specific scalar types in Perl
- SemanticTokenModifier
- Semantic analysis types for hover, tokens, and code understanding. Semantic token modifiers for Analyze/Complete stage highlighting.
- SemanticTokenType
- Semantic analysis types for hover, tokens, and code understanding. Semantic token types for syntax highlighting in the Parse/Complete workflow.
- SuggestionPriority
- Import analysis, optimization, and unused import detection. Priority level for organization suggestions
- SymbolKind
- Symbol extraction, table, and reference types for navigation. Unified Perl symbol classification for LSP tooling.
- TokenKind
- Token types and token stream for lexer output. Token classification for Perl parsing.
- Trivia
- Trivia (whitespace/comments) attached to AST nodes. Trivia represents non-semantic tokens like comments and whitespace
Functions§
- format_with_trivia
- Trivia-preserving parser and formatting utilities. Format an AST with trivia back to source code
Type Aliases§
- ParseResult
- Parse error and result types for parser output. Result type for parser operations in the Perl parsing workflow pipeline
- SourceLocation
- AST node, node kind enum, and source location types. Type alias for backward compatibility with SourceLocation.