§perl-parser — Production-grade Perl parser and Language Server Protocol engine
A comprehensive Perl parser built on recursive descent principles, providing robust AST generation, LSP feature providers, workspace indexing, and test-driven development support.
§Key Features
- Tree-sitter Compatible: AST with kinds, fields, and position tracking compatible with tree-sitter grammar
- Comprehensive Parsing: ~100% edge case coverage for Perl 5.8-5.40 syntax
- LSP Integration: Full Language Server Protocol feature set (100% compliance, LSP 3.18)
- TDD Workflow: Intelligent test generation with return value analysis
- Incremental Parsing: Efficient re-parsing for real-time editing
- Error Recovery: Graceful handling of malformed input with detailed diagnostics
- Workspace Navigation: Cross-file symbol resolution and reference tracking
§Quick Start
§Basic Parsing
use perl_parser::Parser;
let code = r#"sub hello { print "Hello, world!\n"; }"#;
let mut parser = Parser::new(code);
match parser.parse() {
    Ok(ast) => {
        println!("AST: {}", ast.to_sexp());
        println!("Parsed {} nodes", ast.count_nodes());
    }
    Err(e) => eprintln!("Parse error: {}", e),
}
§Test-Driven Development
Generate tests automatically from parsed code:
use perl_parser::Parser;
use perl_parser::tdd::test_generator::{TestGenerator, TestFramework};
let code = r#"sub add { my ($a, $b) = @_; return $a + $b; }"#;
let mut parser = Parser::new(code);
let ast = parser.parse()?;
let generator = TestGenerator::new(TestFramework::TestMore);
let tests = generator.generate_tests(&ast, code);
// Returns test cases with intelligent assertions
assert!(!tests.is_empty());
§LSP Integration
Use as a library for LSP features (see perl-lsp for the standalone server):
use perl_parser::Parser;
use perl_parser::analysis::semantic::SemanticAnalyzer;
let code = "my $x = 42;";
let mut parser = Parser::new(code);
let ast = parser.parse()?;
// Semantic analysis for hover, completion, etc.
let model = SemanticAnalyzer::analyze(&ast);
§Architecture
The parser is organized into distinct layers for maintainability and testability:
§Core Engine (engine)
- parser: Recursive descent parser with operator precedence
- ast: Abstract Syntax Tree definitions and node types
- error: Error classification, recovery strategies, and diagnostics
- position: UTF-16 position mapping for LSP protocol compliance
- quote_parser: Specialized parser for quote-like operators
- heredoc_collector: FIFO heredoc collection with indent stripping
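As an illustration of what "FIFO heredoc collection with indent stripping" can mean, here is a minimal standalone sketch. It is not the crate's heredoc_collector implementation: the PendingHeredoc type and collect_heredocs function are hypothetical names, and the sketch assumes that the indented `<<~` form removes exactly the whitespace found in front of the terminator line.

```rust
use std::collections::VecDeque;

/// A heredoc declared on a source line, waiting for its body (hypothetical type).
struct PendingHeredoc {
    terminator: String,
    /// True for the indented `<<~` form.
    indented: bool,
}

/// Collects heredoc bodies from `lines` in declaration (FIFO) order.
/// Returns the collected bodies and the number of lines consumed.
/// An unterminated heredoc simply consumes the remaining lines.
fn collect_heredocs(
    mut pending: VecDeque<PendingHeredoc>,
    lines: &[&str],
) -> (Vec<String>, usize) {
    let mut bodies = Vec::new();
    let mut pos = 0;
    while let Some(doc) = pending.pop_front() {
        let mut body: Vec<&str> = Vec::new();
        let mut indent = "";
        while pos < lines.len() {
            let line = lines[pos];
            pos += 1;
            // For `<<~`, the terminator may be preceded by whitespace.
            let candidate = if doc.indented { line.trim_start() } else { line };
            if candidate == doc.terminator {
                // Assumption: the terminator's own indentation is what gets stripped.
                indent = &line[..line.len() - candidate.len()];
                break;
            }
            body.push(line);
        }
        let mut text = String::new();
        for line in body {
            text.push_str(line.strip_prefix(indent).unwrap_or(line));
            text.push('\n');
        }
        bodies.push(text);
    }
    (bodies, pos)
}
```

The queue matters when one source line declares several heredocs: their bodies follow in the order of declaration, so the first declared terminator is the first one searched for.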
§IDE Integration (LSP Provider Crates)
LSP provider modules were removed from perl-parser as part of #4414 (microcrate
collapse, PR #0). Import directly from the provider crates:
- perl_lsp_completion: context-aware completion providers
- perl_lsp_diagnostics: diagnostics generation and formatting
- perl_lsp_navigation: references, document links, type definitions, workspace symbols
- perl_lsp_rename: rename providers with validation
- perl_lsp_semantic_tokens: semantic token generation
- perl_lsp_inlay_hints: inlay hint providers
- perl_lsp_code_actions: code action providers
§Analysis (analysis)
- scope_analyzer: Variable and subroutine scoping resolution
- type_inference: Perl type inference engine
- semantic: Semantic model with hover information
- symbol: Symbol table and reference tracking
- dead_code_detector: Unused code detection
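To make the idea of scope resolution concrete, the following is a small standalone sketch of a lexical scope stack that reports shadowing and unused variables. It does not use or mirror the crate's ScopeAnalyzer API; the ScopeStack type and its methods are hypothetical and only illustrate the general technique.

```rust
use std::collections::HashMap;

/// A stack of lexical scopes; each maps a variable name to a "was used" flag.
/// (Hypothetical illustration type, not part of perl-parser.)
struct ScopeStack {
    scopes: Vec<HashMap<String, bool>>,
}

impl ScopeStack {
    fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    /// Opens a nested scope, e.g. when entering a block or subroutine body.
    fn enter(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Declares a variable in the innermost scope.
    /// Returns true if the name was already visible (shadowing or redeclaration).
    fn declare(&mut self, name: &str) -> bool {
        if self.scopes.is_empty() {
            self.enter();
        }
        let shadows = self.scopes.iter().any(|s| s.contains_key(name));
        if let Some(innermost) = self.scopes.last_mut() {
            innermost.insert(name.to_string(), false);
        }
        shadows
    }

    /// Marks the nearest visible declaration as used.
    /// Returns false if the name is not declared in any enclosing scope.
    fn use_var(&mut self, name: &str) -> bool {
        for scope in self.scopes.iter_mut().rev() {
            if let Some(used) = scope.get_mut(name) {
                *used = true;
                return true;
            }
        }
        false
    }

    /// Closes the innermost scope and returns its unused variables, sorted by name.
    fn leave(&mut self) -> Vec<String> {
        let scope = self.scopes.pop().unwrap_or_default();
        let mut unused: Vec<String> = scope
            .into_iter()
            .filter(|(_, used)| !*used)
            .map(|(name, _)| name)
            .collect();
        unused.sort();
        unused
    }
}
```

A caller would walk the AST, calling enter and leave around blocks, declare for each `my` declaration, and use_var for each variable reference.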
§Workspace (workspace)
- workspace_index: Cross-file symbol indexing
- workspace_rename: Multi-file refactoring
- document_store: Document state management
§Refactoring (refactor)
- refactoring: Unified refactoring engine
- modernize: Code modernization utilities
- import_optimizer: Import statement analysis and optimization
§Test Support (tdd)
- test_generator: Intelligent test case generation
- test_runner: Test execution and validation
- tdd_workflow (test-only): TDD cycle management and coverage tracking
§LSP Feature Support
This crate provides the engine for LSP features. The public standalone server is in
perllsp, backed by the perl-lsp-rs implementation crate.
§Implemented Features
- Completion: Context-aware code completion with type inference
- Hover: Documentation and type information on hover
- Definition: Go-to-definition with cross-file support
- References: Find all references with workspace indexing
- Rename: Symbol renaming with conflict detection
- Diagnostics: Syntax errors and semantic warnings
- Formatting: Code formatting via perltidy integration
- Folding: Code folding for blocks and regions
- Semantic Tokens: Fine-grained syntax highlighting
- Call Hierarchy: Function call navigation
- Type Hierarchy: Class inheritance navigation
See docs/reference/LSP_CAPABILITY_POLICY.md for the complete capability matrix.
§Incremental Parsing
Enable efficient re-parsing for real-time editing:
use perl_parser::{IncrementalState, apply_edits, Edit};
let mut state = IncrementalState::new("my $x = 1;");
let ast = state.parse()?;
// Apply an edit
let edit = Edit {
    start_byte: 3,
    old_end_byte: 5,
    new_end_byte: 5,
    text: "$y".to_string(),
};
apply_edits(&mut state, vec![edit]);
// Incremental re-parse reuses unchanged nodes
let new_ast = state.parse()?;
§Error Recovery
The parser uses intelligent error recovery to continue parsing after errors:
use perl_parser::Parser;
let code = "sub broken { if ("; // Incomplete code
let mut parser = Parser::new(code);
// Parser recovers and builds partial AST
let result = parser.parse();
assert!(result.is_ok());
// Check recorded errors
let errors = parser.errors();
assert!(!errors.is_empty());
§Workspace Indexing
Build cross-file indexes for workspace-wide navigation:
use perl_parser::workspace_index::WorkspaceIndex;
let mut index = WorkspaceIndex::new();
index.index_file("lib/Foo.pm", "package Foo; sub bar { }");
index.index_file("lib/Baz.pm", "use Foo; Foo::bar();");
// Find all references to Foo::bar
let refs = index.find_references("Foo::bar");
§Testing with perl-corpus
The parser is tested against the comprehensive perl-corpus test suite:
# Run parser tests with full corpus coverage
cargo test -p perl-parser
# Run specific test category
cargo test -p perl-parser --test regex_tests
# Validate documentation examples
cargo test --doc
§Command-Line Tools
Build and install the LSP server binary:
# Build LSP server
cargo build -p perllsp --release
# Install globally
cargo install --path crates/perllsp
# Run LSP server
perllsp --stdio
# Check server health
perllsp --health
§Integration Examples
§VSCode Extension
Configure the LSP server in VSCode settings:
{
  "perl.lsp.path": "/path/to/perllsp",
  "perl.lsp.args": ["--stdio"]
}
§Neovim Integration
require'lspconfig'.perl.setup{
  cmd = { "/path/to/perllsp", "--stdio" },
}
§Performance Characteristics
- Single-pass parsing: O(n) complexity for well-formed input
- UTF-16 mapping: Fast bidirectional offset conversion for LSP
- Incremental updates: Reuses unchanged AST nodes for efficiency
- Memory efficiency: Streaming token processing with bounded lookahead
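The UTF-16 mapping mentioned above exists because LSP positions are, by default, expressed in UTF-16 code units while Rust strings are indexed by UTF-8 bytes. The sketch below shows the conversion in both directions for a single line using only the standard library. It is a plain linear scan with hypothetical function names, not the crate's PositionMapper.

```rust
/// Converts a byte offset within `line` to a UTF-16 code-unit column.
/// Returns None if the offset is out of range or not on a char boundary.
fn byte_to_utf16_col(line: &str, byte_offset: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_offset) {
        return None;
    }
    Some(line[..byte_offset].chars().map(char::len_utf16).sum::<usize>())
}

/// Converts a UTF-16 code-unit column back to a byte offset within `line`.
/// Returns None if the column is past the end of the line or falls
/// between the two halves of a surrogate pair.
fn utf16_col_to_byte(line: &str, utf16_col: usize) -> Option<usize> {
    let mut units = 0;
    for (byte_idx, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_idx);
        }
        units += ch.len_utf16();
        if units > utf16_col {
            return None; // column points inside a surrogate pair
        }
    }
    // The column may legitimately point just past the last character.
    (units == utf16_col).then_some(line.len())
}
```

For the line "a😀b", byte offset 5 (the start of "b") corresponds to UTF-16 column 3, because the emoji occupies four UTF-8 bytes but two UTF-16 code units.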
§Compatibility
- Perl Versions: 5.8 through 5.40 (covers 99% of CPAN)
- LSP Protocol: LSP 3.18 specification
- Tree-sitter: Compatible AST format and position tracking
- UTF-16: Full Unicode support with correct LSP position mapping
§Related Crates
- perllsp: Public Cargo entry point for the standalone LSP server
- perl-lsp-rs: Standalone LSP server runtime implementation (moved from this crate)
- perl-lexer: Context-aware Perl tokenizer
- perl-corpus: Comprehensive test corpus and generators
- perl-dap: Debug Adapter Protocol implementation
§Documentation
- API Docs: See module documentation below
- LSP Guide: docs/reference/LSP_IMPLEMENTATION_GUIDE.md
- Capability Policy: docs/reference/LSP_CAPABILITY_POLICY.md
- Commands: docs/reference/COMMANDS_REFERENCE.md
- Current Status: docs/project/CURRENT_STATUS.md
§Architecture
The parser follows a recursive descent design with operator precedence handling, maintaining a clean separation from the lexing phase. This modular approach enables:
- Independent testing of parsing logic
- Easy integration with different lexer implementations
- Clear error boundaries between lexing and parsing phases
- Optimal performance through single-pass parsing
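The combination of recursive descent, operator precedence handling, and a separate lexing phase can be sketched on a toy arithmetic grammar. The code below is not the crate's parser: the Tok, lex, ExprParser, and parse_to_sexp names are hypothetical, it handles only integers, parentheses, and four left-associative operators, and it uses precedence climbing as one common way to implement operator precedence.

```rust
/// Tokens produced by the lexing phase (toy grammar, hypothetical types).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

/// Lexing phase: turns source text into tokens, skipping unrecognized characters.
fn lex(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n = 0i64;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = n * 10 + d as i64;
                chars.next();
            }
            toks.push(Tok::Num(n));
        } else {
            chars.next();
            match c {
                '(' => toks.push(Tok::LParen),
                ')' => toks.push(Tok::RParen),
                '+' | '-' | '*' | '/' => toks.push(Tok::Op(c)),
                _ => {} // whitespace and anything else is ignored in this sketch
            }
        }
    }
    toks
}

/// Higher numbers bind tighter.
fn binding_power(op: char) -> u8 {
    match op {
        '+' | '-' => 1,
        _ => 2,
    }
}

/// Parsing phase: consumes tokens only, never raw source text.
struct ExprParser {
    toks: Vec<Tok>,
    pos: usize,
}

impl ExprParser {
    /// Precedence climbing: parse operators whose power is at least `min_bp`.
    fn parse_expr(&mut self, min_bp: u8) -> Option<String> {
        let mut lhs = self.parse_primary()?;
        while let Some(Tok::Op(op)) = self.toks.get(self.pos).cloned() {
            let bp = binding_power(op);
            if bp < min_bp {
                break;
            }
            self.pos += 1;
            // `bp + 1` makes operators of equal power left-associative.
            let rhs = self.parse_expr(bp + 1)?;
            lhs = format!("({op} {lhs} {rhs})");
        }
        Some(lhs)
    }

    /// Recursive descent for numbers and parenthesized sub-expressions.
    fn parse_primary(&mut self) -> Option<String> {
        match self.toks.get(self.pos).cloned()? {
            Tok::Num(n) => {
                self.pos += 1;
                Some(n.to_string())
            }
            Tok::LParen => {
                self.pos += 1;
                let inner = self.parse_expr(0)?;
                if self.toks.get(self.pos) == Some(&Tok::RParen) {
                    self.pos += 1;
                    Some(inner)
                } else {
                    None
                }
            }
            _ => None,
        }
    }
}

/// Parses `src` in a single pass and renders the tree as an S-expression.
fn parse_to_sexp(src: &str) -> Option<String> {
    let mut parser = ExprParser { toks: lex(src), pos: 0 };
    let expr = parser.parse_expr(0)?;
    (parser.pos == parser.toks.len()).then_some(expr)
}
```

Because ExprParser only sees a Vec of tokens, the lexer can be tested or replaced independently, which is the separation the list above describes.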
§Example
use perl_parser::Parser;
let code = "my $x = 42;";
let mut parser = Parser::new(code);
match parser.parse() {
    Ok(ast) => println!("AST: {}", ast.to_sexp()),
    Err(e) => eprintln!("Parse error: {}", e),
}
Re-exports§
pub use dead_code as dead_code_detector;
pub use refactor::import_optimizer;
pub use refactor::modernize;
pub use refactor::modernize_refactored;
pub use refactor::refactoring;
pub use incremental::incremental_advanced_reuse;
pub use incremental::incremental_checkpoint;
pub use incremental::incremental_document;
pub use incremental::incremental_edit;
pub use incremental::incremental_handler_v2;
pub use incremental::incremental_integration;
pub use incremental::incremental_simple;
pub use incremental::incremental_v2;
pub use workspace::workspace_refactor;
pub use incremental_checkpoint::CheckpointedIncrementalParser;
pub use incremental_checkpoint::SimpleEdit;
pub use incremental::Edit;
pub use incremental::IncrementalState;
pub use incremental::apply_edits;
pub use import_optimizer::DuplicateImport;
pub use import_optimizer::ImportAnalysis;
pub use import_optimizer::ImportEntry;
pub use import_optimizer::ImportOptimizer;
pub use import_optimizer::MissingImport;
pub use import_optimizer::OrganizationSuggestion;
pub use import_optimizer::SuggestionPriority;
pub use import_optimizer::UnusedImport;
pub use refactoring::ModernizationPattern;
pub use refactoring::RefactoringConfig;
pub use refactoring::RefactoringEngine;
pub use refactoring::RefactoringOperation;
pub use refactoring::RefactoringResult;
pub use refactoring::RefactoringScope;
pub use refactoring::RefactoringType;
pub use engine::ast_v2;
pub use engine::pragma_tracker;
Modules§
- analysis: Semantic analysis, scope resolution, and type inference. Compatibility re-export of semantic analysis modules.
- ast: Abstract Syntax Tree (AST) definitions for Perl parsing. Parser engine components and supporting utilities. AST facade for the core parser engine.
- ast_utils: AST range and insertion helpers for Perl LSP features (previously perl-ast-utils). AST utilities for Perl LSP microcrates.
- builtin_signatures: Builtin function signature lookup tables. Builtin function signatures and metadata. Comprehensive built-in function signatures for Perl scripting.
- builtin_signatures_phf: Perfect hash function (PHF) based builtin signature lookup. Builtin function signatures and metadata. Consolidated built-in function signatures for Perl using perfect hash
- builtins: Perl builtin function signatures and metadata. Re-exported builtin signature tables from perl-parser-core.
- dead_code: Dead code detection for Perl workspaces (absorbed from perl-dead-code). Dead code detection for Perl codebases (stub implementation).
- declaration: Variable and subroutine declaration analysis. Semantic analysis, symbol extraction, and type inference. Go-to-declaration support and parent map construction. Declaration Provider for LSP.
- document_store: In-memory document storage for open editor buffers. Workspace indexing and refactoring orchestration. Open-document storage used to overlay in-editor content over on-disk files. Document store for managing in-memory text content.
- edit: Edit tracking for incremental parsing. Parser engine components and supporting utilities. Edit tracking for incremental parsing (previously perl-edit).
- engine: Parser engine components and supporting utilities. Re-exported parser engine modules from perl-parser-core.
- error: Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Error types and recovery strategies for parser failures. Error types and recovery helpers for the parser engine.
- error_classifier: Error classification and recovery strategies for parse failures. Error classification and diagnostic generation.
- error_recovery: Error recovery strategies for resilient parsing. Error recovery strategies and traits for the Perl parser.
- heredoc_anti_patterns: Anti-pattern detection for problematic Perl heredoc patterns (previously perl-heredoc-anti-patterns). Anti-pattern detection for heredoc edge cases.
- heredoc_collector: Heredoc content collector with FIFO ordering and indent stripping. Parser engine components and supporting utilities. Heredoc collector and processor (previously perl-heredoc). Heredoc collector and processor for Perl.
- incremental: Incremental parsing for efficient re-parsing during editing.
- index: File and symbol indexing for workspace-wide navigation. Semantic analysis, symbol extraction, and type inference. Lightweight workspace symbol index. Cross-file workspace indexing for Perl symbols.
- line_index: Line-to-byte offset index for fast position lookups. Line indexing and position mapping utilities.
- parser: Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Core parser implementation for Perl source. Recursive descent Perl parser.
- parser_context: Parser context with error recovery support. Parser engine components and supporting utilities.
- path_normalize: Secure workspace-relative path normalization (previously perl-path-normalize; from perl-parser-core). Secure workspace-relative path normalization (previously perl-path-normalize). Secure workspace-relative path normalization.
- path_security: Workspace-bound path validation and traversal prevention (previously perl-path-security; from perl-parser-core). Workspace-bound path validation and traversal prevention (previously perl-path-security). Workspace-bound path validation and traversal prevention.
- percentile: Nearest-rank percentile helpers for integer latency samples (previously perl-percentile; from perl-parser-core). Percentile helpers for integer metric samples (previously perl-percentile).
- position: Legacy module aliases for moved engine components. Parser engine components and supporting utilities. Position tracking types and UTF-16 mapping utilities. Enhanced position tracking for incremental parsing.
- qualified_name: Perl qualified-name parsing, splitting, and validation helpers (previously perl-qualified-name; from perl-parser-core). Perl qualified-name parsing, splitting, and validation helpers (previously perl-qualified-name). Focused helpers for Perl qualified-name parsing and validation (previously perl-qualified-name).
- quote_parser: Parser for Perl quote and quote-like operators. Parser engine components and supporting utilities. Quote operator parsing helpers (previously perl-quote). Uniform quote operator parsing for the Perl parser.
- refactor: Code refactoring, modernization, and import optimization. Refactoring and modernization helpers.
- scope_analyzer: Scope analysis for variable and subroutine resolution. Semantic analysis, symbol extraction, and type inference. Scope analysis and variable tracking for Perl parsing workflows.
- semantic: Semantic model with hover information and token classification. Semantic analysis, symbol extraction, and type inference. Semantic analyzer and token classification. Semantic analysis for IDE features.
- source_file: Shared Perl source-file classification helpers (previously perl-source-file; from perl-parser-core). Perl source-file classification helpers (previously perl-source-file). Shared Perl source-file classification helpers.
- symbol: Symbol table, extraction, and reference tracking. Semantic analysis, symbol extraction, and type inference. Symbol extraction and symbol table construction. Symbol extraction and symbol table for IDE features.
- tdd: Test-driven development support and test generation. Compatibility re-export of TDD support modules.
- tdd_basic: Basic TDD utilities and test helpers. Test-driven development helpers and generators. Basic TDD workflow support for LSP.
- test_generator: Intelligent test case generation from parsed Perl code. Test-driven development helpers and generators. Test generator for TDD workflow support.
- test_runner: Test execution and TDD support functionality. Test-driven development helpers and generators.
- text_line: Text-line cursor and boundary helpers (previously perl-text-line; from perl-parser-core). Text-line cursor and boundary helpers (previously perl-text-line). Text-line cursor helpers.
- token_stream: Token stream with position-aware iteration. Token stream and trivia utilities for the parser. Buffered token stream over the raw lexer (with trivia skipping). Token stream adapter between perl-lexer output and the parser.
- token_wrapper: Lightweight token wrapper for AST integration. Token stream and trivia utilities for the parser. Token wrapper with enhanced position tracking.
- tokens: Token stream, trivia, and token wrapper utilities. Re-exported token stream utilities from perl-parser-core.
- trivia: Trivia (whitespace and comments) representation. Token stream and trivia utilities for the parser. Trivia tokens (whitespace/comments/POD) used for formatting and diagnostics. Trivia (comments and whitespace) handling for the Perl parser.
- trivia_parser: Parser that preserves trivia tokens for formatting. Token stream and trivia utilities for the parser. Trivia-preserving parser helpers for formatting context. Trivia-preserving parser implementation.
- type_inference: Type inference engine for Perl variable analysis. Semantic analysis, symbol extraction, and type inference.
- util: Parser utilities and helpers. Tokenization utilities shared by parser-facing entry points.
- workspace: Workspace indexing, document store, and cross-file operations. Compatibility re-export of workspace indexing modules.
- workspace_index: Cross-file symbol index for workspace-wide navigation. Workspace indexing and refactoring orchestration. Core workspace-wide symbol index and lookup/query API. Workspace-wide symbol index for fast cross-file lookups in Perl LSP.
- workspace_rename: Cross-file symbol renaming with conflict detection. Workspace indexing and refactoring orchestration. Cross-file rename planning and edit-generation helpers. LSP feature module (deprecated).
Structs§
- HoverInfo: Semantic analysis types for hover, tokens, and code understanding. Hover information for symbols displayed in LSP hover requests.
- Node: AST node, node kind enum, and source location types. Core AST node representing any Perl language construct within parsing workflows.
- NodeWithTrivia: Trivia (whitespace/comments) attached to AST nodes. A node with attached trivia.
- Parser: Recursive descent Perl parser with error recovery and AST generation. Parser state for a single Perl source input.
- PositionMapper: Line ending detection and UTF-16 position mapping for LSP compliance. Centralized position mapper using rope for efficiency.
- PragmaState: Pragma state tracking for use strict, use warnings, etc. Pragma state at a given point in the code.
- PragmaTracker: Pragma state tracking for use strict, use warnings, etc. Tracks pragma state throughout a Perl file.
- RecoverySalvageProfile: Parse error and result types for parser output. Per-file recovery/salvage summary.
- ScopeAnalyzer: Scope analysis issue types and analyzer. Analyzes an AST for scope-related issues such as unused variables and shadowing.
- ScopeIssue: Scope analysis issue types and analyzer. A single scope-analysis finding with location and human-readable description.
- SemanticAnalyzer: Semantic analysis types for hover, tokens, and code understanding. Semantic analyzer providing comprehensive IDE features for Perl code.
- SemanticModel: Semantic analysis types for hover, tokens, and code understanding. A stable, query-oriented view of semantic information over a parsed file.
- SemanticToken: Semantic analysis types for hover, tokens, and code understanding. A semantic token with type and modifiers for LSP syntax highlighting.
- Symbol: Symbol extraction, table, and reference types for navigation. A symbol definition in Perl code with comprehensive metadata for Index/Navigate workflows.
- SymbolExtractor: Symbol extraction, table, and reference types for navigation. Extract symbols from an AST for Parse/Index workflows.
- SymbolReference: Symbol extraction, table, and reference types for navigation. A reference to a symbol with usage context for Navigate/Analyze workflows.
- SymbolTable: Symbol extraction, table, and reference types for navigation. Comprehensive symbol table for Perl code analysis and LSP features in Index/Analyze stages.
- Token: Token types and token stream for lexer output. Token produced by the lexer and consumed by the parser.
- TokenStream: Token types and token stream for lexer output. Token stream that wraps perl-lexer or a pre-lexed token buffer.
- TriviaPreservingParser: Trivia-preserving parser and formatting utilities. Parser that preserves trivia.
- TriviaToken: Trivia (whitespace/comments) attached to AST nodes. A trivia token with position information.
- TypeBasedCompletion: Type inference types: Perl types, constraints, and inference engine. Type-based code completion suggestions.
- TypeConstraint: Type inference types: Perl types, constraints, and inference engine. Type constraint for type checking.
- TypeEnvironment: Type inference types: Perl types, constraints, and inference engine. Type environment for tracking variable types.
- TypeInferenceEngine: Type inference types: Perl types, constraints, and inference engine. Main type inference engine.
- TypeLocation: Type inference types: Perl types, constraints, and inference engine. Location information for type errors.
Enums§
- IssueKind: Scope analysis issue types and analyzer. Category of scope-related issue detected during analysis.
- LineEnding: Line ending detection and UTF-16 position mapping for LSP compliance. Line ending style detected in a document.
- NodeKind: AST node, node kind enum, and source location types. Comprehensive enumeration of all Perl language constructs supported by the parser.
- ParseError: Parse error and result types for parser output. Comprehensive error types that can occur during Perl parsing workflows.
- PerlType: Type inference types: Perl types, constraints, and inference engine. Represents a Perl type.
- RecoverySalvageClass: Parse error and result types for parser output. Closeout classification for a parsed file.
- ScalarType: Type inference types: Perl types, constraints, and inference engine. Represents specific scalar types in Perl.
- SemanticTokenModifier: Semantic analysis types for hover, tokens, and code understanding. Semantic token modifiers for Analyze/Complete stage highlighting.
- SemanticTokenType: Semantic analysis types for hover, tokens, and code understanding. Semantic token types for syntax highlighting in the Parse/Complete workflow.
- SymbolKind: Symbol extraction, table, and reference types for navigation. Unified Perl symbol classification for LSP tooling.
- TokenKind: Token types and token stream for lexer output. Token classification for Perl parsing.
- Trivia: Trivia (whitespace/comments) attached to AST nodes. Trivia represents non-semantic tokens like comments and whitespace.
Functions§
- format_with_trivia: Trivia-preserving parser and formatting utilities. Format an AST with trivia back to source code.
Type Aliases§
- ParseResult: Parse error and result types for parser output. Result type for parser operations in the Perl parsing workflow pipeline.
- SourceLocation: AST node, node kind enum, and source location types. Type alias for backward compatibility with SourceLocation.