Crate perl_parser

§perl-parser — Production-grade Perl parser and Language Server Protocol engine

A comprehensive Perl parser built on recursive descent principles, providing robust AST generation, LSP feature providers, workspace indexing, and test-driven development support.

§Key Features

  • Tree-sitter Compatible: AST with kinds, fields, and position tracking compatible with tree-sitter grammar
  • Comprehensive Parsing: ~100% edge case coverage for Perl 5.8-5.40 syntax
  • LSP Integration: Full Language Server Protocol feature set (100% compliance, LSP 3.18)
  • TDD Workflow: Intelligent test generation with return value analysis
  • Incremental Parsing: Efficient re-parsing for real-time editing
  • Error Recovery: Graceful handling of malformed input with detailed diagnostics
  • Workspace Navigation: Cross-file symbol resolution and reference tracking

§Quick Start

§Basic Parsing

use perl_parser::Parser;

let code = r#"sub hello { print "Hello, world!\n"; }"#;
let mut parser = Parser::new(code);

match parser.parse() {
    Ok(ast) => {
        println!("AST: {}", ast.to_sexp());
        println!("Parsed {} nodes", ast.count_nodes());
    }
    Err(e) => eprintln!("Parse error: {}", e),
}
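To make the two calls in the example concrete, here is a minimal sketch of what an S-expression dump and a node count do on a tree. `SketchNode` is a hypothetical stand-in, not the crate's `Node` type, and the exact text that `to_sexp()` produces in perl_parser may differ.

```rust
// Hypothetical miniature tree; the real perl_parser node type carries
// many more kinds, fields, and source locations.
#[derive(Debug)]
struct SketchNode {
    kind: &'static str,
    children: Vec<SketchNode>,
}

impl SketchNode {
    fn leaf(kind: &'static str) -> Self {
        SketchNode { kind, children: Vec::new() }
    }

    fn branch(kind: &'static str, children: Vec<SketchNode>) -> Self {
        SketchNode { kind, children }
    }

    /// Renders the tree as a parenthesized S-expression.
    fn to_sexp(&self) -> String {
        if self.children.is_empty() {
            format!("({})", self.kind)
        } else {
            let inner: Vec<String> = self.children.iter().map(|c| c.to_sexp()).collect();
            format!("({} {})", self.kind, inner.join(" "))
        }
    }

    /// Counts this node plus all of its descendants.
    fn count_nodes(&self) -> usize {
        1 + self.children.iter().map(|c| c.count_nodes()).sum::<usize>()
    }
}
```

Both methods are plain recursive walks, so their cost is linear in the number of nodes.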

§Test-Driven Development

Generate tests automatically from parsed code:

use perl_parser::Parser;
use perl_parser::tdd::test_generator::{TestGenerator, TestFramework};

let code = r#"sub add { my ($a, $b) = @_; return $a + $b; }"#;
let mut parser = Parser::new(code);
let ast = parser.parse()?;

let generator = TestGenerator::new(TestFramework::TestMore);
let tests = generator.generate_tests(&ast, code);

// Returns test cases with intelligent assertions
assert!(!tests.is_empty());

§LSP Integration

Use as a library for LSP features (see perllsp for the standalone server):

use perl_parser::Parser;
use perl_parser::analysis::semantic::SemanticAnalyzer;

let code = "my $x = 42;";
let mut parser = Parser::new(code);
let ast = parser.parse()?;

// Semantic analysis for hover, completion, etc.
let model = SemanticAnalyzer::analyze(&ast);

§Architecture

The parser is organized into distinct layers for maintainability and testability:

§Core Engine (engine)

  • parser: Recursive descent parser with operator precedence
  • ast: Abstract Syntax Tree definitions and node types
  • error: Error classification, recovery strategies, and diagnostics
  • position: UTF-16 position mapping for LSP protocol compliance
  • quote_parser: Specialized parser for quote-like operators
  • heredoc_collector: FIFO heredoc collection with indent stripping
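The FIFO heredoc collection mentioned in the list can be sketched as follows. This is an illustration under stated assumptions, not the crate's `heredoc_collector`: `PendingHeredoc` and `collect_heredocs` are hypothetical names, and the indent rule follows Perl's `<<~` form, where the closing delimiter's indentation is what gets removed from each body line.

```rust
use std::collections::VecDeque;

/// A heredoc announced on a source line and still waiting for its body
/// (hypothetical type for this sketch).
struct PendingHeredoc {
    terminator: String,
    strip_indent: bool, // true for the `<<~TAG` form
}

fn is_indent(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Reads one body per pending heredoc from `lines`, in declaration order.
/// Returns the bodies and the number of lines consumed.
fn collect_heredocs(
    mut pending: VecDeque<PendingHeredoc>,
    lines: &[&str],
) -> (Vec<String>, usize) {
    let mut bodies = Vec::new();
    let mut pos = 0;
    // FIFO: the heredoc declared first receives the first body.
    while let Some(doc) = pending.pop_front() {
        let mut body: Vec<&str> = Vec::new();
        let mut indent = "";
        while pos < lines.len() {
            let line = lines[pos];
            pos += 1;
            let trimmed = line.trim_start_matches(is_indent);
            if doc.strip_indent && trimmed == doc.terminator {
                // The terminator's own indentation decides how much to strip.
                indent = &line[..line.len() - trimmed.len()];
                break;
            }
            if !doc.strip_indent && line == doc.terminator {
                break;
            }
            body.push(line);
        }
        let text: String = body
            .iter()
            .map(|&l| format!("{}\n", l.strip_prefix(indent).unwrap_or(l)))
            .collect();
        bodies.push(text);
    }
    (bodies, pos)
}
```

A line that does not start with the terminator's indentation is kept unchanged here; a stricter implementation would likely report it as an error instead.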

§IDE Integration (LSP Provider Crates)

LSP provider modules were removed from perl-parser as part of #4414 (microcrate collapse, PR #0). Import directly from the provider crates:

  • perl_lsp_completion — context-aware completion providers
  • perl_lsp_diagnostics — diagnostics generation and formatting
  • perl_lsp_navigation — references, document links, type definitions, workspace symbols
  • perl_lsp_rename — rename providers with validation
  • perl_lsp_semantic_tokens — semantic token generation
  • perl_lsp_inlay_hints — inlay hint providers
  • perl_lsp_code_actions — code action providers

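§Analysis (analysis)

Semantic analysis, scope resolution, and type inference.

§Workspace (workspace)

Workspace indexing, document store, and cross-file operations.

§Refactoring (refactor)

Code refactoring, modernization, and import optimization.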

§Test Support (tdd)

  • test_generator: Intelligent test case generation
  • test_runner: Test execution and validation
  • tdd_workflow (test-only): TDD cycle management and coverage tracking

§LSP Feature Support

This crate provides the engine for LSP features. The public standalone server is in perllsp, backed by the perl-lsp-rs implementation crate.

§Implemented Features

  • Completion: Context-aware code completion with type inference
  • Hover: Documentation and type information on hover
  • Definition: Go-to-definition with cross-file support
  • References: Find all references with workspace indexing
  • Rename: Symbol renaming with conflict detection
  • Diagnostics: Syntax errors and semantic warnings
  • Formatting: Code formatting via perltidy integration
  • Folding: Code folding for blocks and regions
  • Semantic Tokens: Fine-grained syntax highlighting
  • Call Hierarchy: Function call navigation
  • Type Hierarchy: Class inheritance navigation

See docs/reference/LSP_CAPABILITY_POLICY.md for the complete capability matrix.

§Incremental Parsing

Enable efficient re-parsing for real-time editing:

use perl_parser::{IncrementalState, apply_edits, Edit};

let mut state = IncrementalState::new("my $x = 1;");
let ast = state.parse()?;

// Apply an edit
let edit = Edit {
    start_byte: 3,
    old_end_byte: 5,
    new_end_byte: 5,
    text: "$y".to_string(),
};
apply_edits(&mut state, vec![edit]);

// Incremental re-parse reuses unchanged nodes
let new_ast = state.parse()?;
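The byte arithmetic behind such an edit can be sketched without the crate. The field names below mirror the `Edit` literal above, but `SketchEdit`, `apply_edit`, and `shift_offset` are hypothetical helpers for illustration; how perl_parser applies edits and reuses nodes internally is not shown here.

```rust
/// Hypothetical mirror of the edit fields used in the example above.
struct SketchEdit {
    start_byte: usize,
    old_end_byte: usize,
    new_end_byte: usize,
    text: String,
}

/// Replaces `start_byte..old_end_byte` with `text`.
/// Returns None when the edit is inconsistent with the source.
fn apply_edit(source: &str, edit: &SketchEdit) -> Option<String> {
    if edit.start_byte > edit.old_end_byte || edit.old_end_byte > source.len() {
        return None;
    }
    if !source.is_char_boundary(edit.start_byte) || !source.is_char_boundary(edit.old_end_byte) {
        return None;
    }
    // The new end must be the start plus the length of the inserted text.
    if edit.new_end_byte != edit.start_byte + edit.text.len() {
        return None;
    }
    let mut out = String::with_capacity(source.len() + edit.text.len());
    out.push_str(&source[..edit.start_byte]);
    out.push_str(&edit.text);
    out.push_str(&source[edit.old_end_byte..]);
    Some(out)
}

/// Maps a byte offset in the old text to the new text.
/// Offsets strictly inside the replaced range have no counterpart.
fn shift_offset(offset: usize, edit: &SketchEdit) -> Option<usize> {
    if offset <= edit.start_byte {
        Some(offset)
    } else if offset >= edit.old_end_byte {
        Some(offset - edit.old_end_byte + edit.new_end_byte)
    } else {
        None
    }
}
```

Offset shifting is what makes reuse plausible: a node that lies entirely before or after the edited range keeps its shape, and only its position needs adjusting.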

§Error Recovery

The parser uses intelligent error recovery to continue parsing after errors:

use perl_parser::Parser;

let code = "sub broken { if (";  // Incomplete code
let mut parser = Parser::new(code);

// Parser recovers and builds partial AST
let result = parser.parse();
assert!(result.is_ok());

// Check recorded errors
let errors = parser.errors();
assert!(!errors.is_empty());
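One common recovery technique for input like the above is to record an error and keep going, then close any constructs still open at end of input. The sketch below applies that idea to delimiters only. It is not the crate's recovery strategy: `SketchError` and `recover_delimiters` are hypothetical, and the scan deliberately ignores strings, comments, and regex literals.

```rust
/// A recorded problem with the byte offset where it was noticed
/// (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct SketchError {
    offset: usize,
    message: String,
}

fn closer_for(open: char) -> char {
    match open {
        '(' => ')',
        '{' => '}',
        _ => ']',
    }
}

/// Scans the whole input without stopping at the first problem.
/// Returns the closers that must be appended to balance the input,
/// plus every error recorded along the way.
fn recover_delimiters(source: &str) -> (String, Vec<SketchError>) {
    let mut stack: Vec<(char, usize)> = Vec::new();
    let mut errors = Vec::new();
    for (offset, ch) in source.char_indices() {
        match ch {
            '(' | '{' | '[' => stack.push((ch, offset)),
            ')' | '}' | ']' => match stack.last() {
                Some(&(open, _)) if closer_for(open) == ch => {
                    stack.pop();
                }
                // Stray closer: report it and skip it instead of aborting.
                _ => errors.push(SketchError {
                    offset,
                    message: format!("unexpected '{}'", ch),
                }),
            },
            _ => {}
        }
    }
    // Anything still open at end of input is closed synthetically, innermost first.
    let mut missing = String::new();
    while let Some((open, offset)) = stack.pop() {
        let close = closer_for(open);
        errors.push(SketchError {
            offset,
            message: format!("unclosed '{}', assumed '{}' at end of input", open, close),
        });
        missing.push(close);
    }
    (missing, errors)
}
```

For the incomplete snippet in the example, this reports two unclosed delimiters and suggests `)}` as the closers, which is the same shape of outcome the example asserts: a usable result together with a non-empty error list.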

§Workspace Indexing

Build cross-file indexes for workspace-wide navigation:

use perl_parser::workspace_index::WorkspaceIndex;

let mut index = WorkspaceIndex::new();
index.index_file("lib/Foo.pm", "package Foo; sub bar { }");
index.index_file("lib/Baz.pm", "use Foo; Foo::bar();");

// Find all references to Foo::bar
let refs = index.find_references("Foo::bar");
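The data structure behind a cross-file lookup like this can be sketched with two hash maps keyed by qualified name. Everything here is hypothetical (`SketchIndex`, `SketchLocation`, and the method names); the real `WorkspaceIndex` extracts symbols by parsing, while this sketch expects the caller to supply them.

```rust
use std::collections::HashMap;

/// Where a symbol was seen (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct SketchLocation {
    file: String,
    offset: usize,
}

/// Maps qualified names such as "Foo::bar" to definitions and references.
#[derive(Default)]
struct SketchIndex {
    definitions: HashMap<String, SketchLocation>,
    references: HashMap<String, Vec<SketchLocation>>,
}

impl SketchIndex {
    fn add_definition(&mut self, symbol: &str, file: &str, offset: usize) {
        let loc = SketchLocation { file: file.to_string(), offset };
        self.definitions.insert(symbol.to_string(), loc);
    }

    fn add_reference(&mut self, symbol: &str, file: &str, offset: usize) {
        let loc = SketchLocation { file: file.to_string(), offset };
        self.references.entry(symbol.to_string()).or_default().push(loc);
    }

    fn find_definition(&self, symbol: &str) -> Option<&SketchLocation> {
        self.definitions.get(symbol)
    }

    fn find_references(&self, symbol: &str) -> &[SketchLocation] {
        self.references.get(symbol).map(|v| v.as_slice()).unwrap_or(&[])
    }

    /// Drops everything recorded for one file, e.g. before re-indexing it.
    fn remove_file(&mut self, file: &str) {
        self.definitions.retain(|_, loc| loc.file != file);
        for locs in self.references.values_mut() {
            locs.retain(|loc| loc.file != file);
        }
        self.references.retain(|_, locs| !locs.is_empty());
    }
}
```

Removing a file's entries before re-adding them keeps the index consistent when a file changes, at the cost of a full scan of the maps per removal.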

§Testing with perl-corpus

The parser is tested against the comprehensive perl-corpus test suite:

# Run parser tests with full corpus coverage
cargo test -p perl-parser

# Run specific test category
cargo test -p perl-parser --test regex_tests

# Validate documentation examples
cargo test --doc

§Command-Line Tools

Build and install the LSP server binary:

# Build LSP server
cargo build -p perllsp --release

# Install globally
cargo install --path crates/perllsp

# Run LSP server
perllsp --stdio

# Check server health
perllsp --health

§Integration Examples

§VSCode Extension

Configure the LSP server in VSCode settings:

{
  "perl.lsp.path": "/path/to/perllsp",
  "perl.lsp.args": ["--stdio"]
}

§Neovim Integration

require'lspconfig'.perl.setup{
  cmd = { "/path/to/perllsp", "--stdio" },
}

§Performance Characteristics

  • Single-pass parsing: O(n) complexity for well-formed input
  • UTF-16 mapping: Fast bidirectional offset conversion for LSP
  • Incremental updates: Reuses unchanged AST nodes for efficiency
  • Memory efficiency: Streaming token processing with bounded lookahead
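LSP positions count columns in UTF-16 code units, while Rust strings are indexed by UTF-8 bytes, so the conversion has to run in both directions. The following is a minimal, unoptimized sketch of that mapping using only the standard library. The function names are hypothetical and only `\n` is treated as a line break; the crate's `PositionMapper` is not reproduced here.

```rust
/// Converts a byte offset into a zero-based (line, UTF-16 column) pair.
/// Returns None for offsets past the end or inside a multi-byte character.
fn byte_to_utf16_position(text: &str, byte_offset: usize) -> Option<(u32, u32)> {
    if byte_offset > text.len() || !text.is_char_boundary(byte_offset) {
        return None;
    }
    let before = &text[..byte_offset];
    let line = before.matches('\n').count() as u32;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].encode_utf16().count() as u32;
    Some((line, column))
}

/// Converts a zero-based (line, UTF-16 column) pair back into a byte offset.
/// Returns None if the line does not exist, the column is past the end of
/// the line, or the column falls between the halves of a surrogate pair.
fn utf16_position_to_byte(text: &str, line: u32, column: u32) -> Option<usize> {
    let mut line_start = 0;
    for _ in 0..line {
        line_start += text[line_start..].find('\n')? + 1;
    }
    let mut units = 0u32;
    for (i, ch) in text[line_start..].char_indices() {
        if units == column {
            return Some(line_start + i);
        }
        if ch == '\n' {
            return None;
        }
        units += ch.len_utf16() as u32;
        if units > column {
            return None;
        }
    }
    if units == column {
        Some(text.len())
    } else {
        None
    }
}
```

Each call rescans the text from the start, which is fine for illustration; a production mapper would typically cache line start offsets so lookups do not depend on document length.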

§Compatibility

  • Perl Versions: 5.8 through 5.40 (covers 99% of CPAN)
  • LSP Protocol: LSP 3.18 specification
  • Tree-sitter: Compatible AST format and position tracking
  • UTF-16: Full Unicode support with correct LSP position mapping

§Related Crates

  • perllsp: Public Cargo entry point for the standalone LSP server
  • perl-lsp-rs: Standalone LSP server runtime implementation (moved from this crate)
  • perl-lexer: Context-aware Perl tokenizer
  • perl-corpus: Comprehensive test corpus and generators
  • perl-dap: Debug Adapter Protocol implementation

§Documentation

  • API Docs: See module documentation below
  • LSP Guide: docs/reference/LSP_IMPLEMENTATION_GUIDE.md
  • Capability Policy: docs/reference/LSP_CAPABILITY_POLICY.md
  • Commands: docs/reference/COMMANDS_REFERENCE.md
  • Current Status: docs/project/CURRENT_STATUS.md

§Architecture

The parser follows a recursive descent design with operator precedence handling, maintaining a clean separation from the lexing phase. This modular approach enables:

  • Independent testing of parsing logic
  • Easy integration with different lexer implementations
  • Clear error boundaries between lexing and parsing phases
  • Optimal performance through single-pass parsing
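The design described above, a recursive descent parser with operator precedence that consumes tokens from a separate lexing step, can be illustrated on a toy arithmetic grammar. This is a sketch of the general technique (precedence climbing) and not the crate's parser; `Tok`, `lex`, `ExprParser`, and `parse_to_sexp` are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

/// Lexing phase: turns characters into tokens and knows nothing about grammar.
fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            ' ' | '\t' | '\n' => { chars.next(); }
            '0'..='9' => {
                let mut n: i64 = 0;
                while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                    n = n.checked_mul(10)
                        .and_then(|v| v.checked_add(i64::from(d)))
                        .ok_or_else(|| "number too large".to_string())?;
                    chars.next();
                }
                toks.push(Tok::Num(n));
            }
            '+' | '-' | '*' | '/' => { toks.push(Tok::Op(c)); chars.next(); }
            '(' => { toks.push(Tok::LParen); chars.next(); }
            ')' => { toks.push(Tok::RParen); chars.next(); }
            other => return Err(format!("unexpected character '{}'", other)),
        }
    }
    Ok(toks)
}

fn precedence(op: char) -> u8 {
    match op { '*' | '/' => 2, _ => 1 }
}

/// Parsing phase: sees only tokens, never raw characters.
struct ExprParser { toks: Vec<Tok>, pos: usize }

impl ExprParser {
    fn parse_expr(&mut self, min_prec: u8) -> Result<String, String> {
        let mut lhs = self.parse_primary()?;
        while let Some(Tok::Op(op)) = self.toks.get(self.pos).cloned() {
            let prec = precedence(op);
            if prec < min_prec { break; }
            self.pos += 1;
            // Left-associative: the right operand must bind strictly tighter.
            let rhs = self.parse_expr(prec + 1)?;
            lhs = format!("({} {} {})", op, lhs, rhs);
        }
        Ok(lhs)
    }

    fn parse_primary(&mut self) -> Result<String, String> {
        match self.toks.get(self.pos).cloned() {
            Some(Tok::Num(n)) => { self.pos += 1; Ok(n.to_string()) }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.parse_expr(1)?;
                match self.toks.get(self.pos) {
                    Some(Tok::RParen) => { self.pos += 1; Ok(inner) }
                    _ => Err("expected ')'".to_string()),
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

/// Runs both phases and returns the expression as an S-expression.
fn parse_to_sexp(src: &str) -> Result<String, String> {
    let mut parser = ExprParser { toks: lex(src)?, pos: 0 };
    let out = parser.parse_expr(1)?;
    if parser.pos != parser.toks.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(out)
}
```

Because the parser only consumes a `Vec<Tok>`, it can be tested with hand-built token lists, and lexing errors surface separately from parse errors. Each token is visited once, so the pass is linear in the input length.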

§Example

use perl_parser::Parser;

let code = "my $x = 42;";
let mut parser = Parser::new(code);

match parser.parse() {
    Ok(ast) => println!("AST: {}", ast.to_sexp()),
    Err(e) => eprintln!("Parse error: {}", e),
}

Re-exports§

pub use dead_code as dead_code_detector;
pub use refactor::import_optimizer;
pub use refactor::modernize;
pub use refactor::modernize_refactored;
pub use refactor::refactoring;
pub use incremental::incremental_advanced_reuse;
pub use incremental::incremental_checkpoint;
pub use incremental::incremental_document;
pub use incremental::incremental_edit;
pub use incremental::incremental_handler_v2;
pub use incremental::incremental_integration;
pub use incremental::incremental_simple;
pub use incremental::incremental_v2;
pub use workspace::workspace_refactor;
pub use incremental_checkpoint::CheckpointedIncrementalParser;
pub use incremental_checkpoint::SimpleEdit;
pub use incremental::Edit;
pub use incremental::IncrementalState;
pub use incremental::apply_edits;
pub use import_optimizer::DuplicateImport;
pub use import_optimizer::ImportAnalysis;
pub use import_optimizer::ImportEntry;
pub use import_optimizer::ImportOptimizer;
pub use import_optimizer::MissingImport;
pub use import_optimizer::OrganizationSuggestion;
pub use import_optimizer::SuggestionPriority;
pub use import_optimizer::UnusedImport;
pub use refactoring::ModernizationPattern;
pub use refactoring::RefactoringConfig;
pub use refactoring::RefactoringEngine;
pub use refactoring::RefactoringOperation;
pub use refactoring::RefactoringResult;
pub use refactoring::RefactoringScope;
pub use refactoring::RefactoringType;
pub use engine::ast_v2;
pub use engine::pragma_tracker;

Modules§

analysis
Semantic analysis, scope resolution, and type inference. Compatibility re-export of semantic analysis modules.
ast
Abstract Syntax Tree (AST) definitions for Perl parsing. AST facade for the core parser engine.
ast_utils
AST range and insertion helpers for Perl LSP features (previously perl-ast-utils).
builtin_signatures
Builtin function signature lookup tables and metadata.
builtin_signatures_phf
Perfect hash function (PHF) based lookup of builtin function signatures and metadata.
builtins
Perl builtin function signatures and metadata. Re-exported builtin signature tables from perl-parser-core.
dead_code
Dead code detection for Perl workspaces (absorbed from perl-dead-code; stub implementation).
declaration
Variable and subroutine declaration analysis. Go-to-declaration support and parent map construction for the LSP declaration provider.
document_store
In-memory document storage for open editor buffers, used to overlay in-editor content over on-disk files.
edit
Edit tracking for incremental parsing (previously perl-edit).
engine
Parser engine components and supporting utilities. Re-exported parser engine modules from perl-parser-core.
error
Error types and recovery strategies for parser failures. Legacy alias for the moved engine component.
error_classifier
Error classification, recovery strategies, and diagnostic generation for parse failures.
error_recovery
Error recovery strategies and traits for resilient parsing.
heredoc_anti_patterns
Anti-pattern detection for problematic Perl heredoc patterns and edge cases (previously perl-heredoc-anti-patterns).
heredoc_collector
Heredoc content collector and processor with FIFO ordering and indent stripping (previously perl-heredoc).
incremental
Incremental parsing for efficient re-parsing during editing.
index
File and symbol indexing for workspace-wide navigation. Lightweight cross-file workspace symbol index.
line_index
Line-to-byte offset index for fast position lookups and position mapping.
parser
Core recursive descent parser implementation for Perl source. Legacy alias for the moved engine component.
parser_context
Parser context with error recovery support.
path_normalize
Secure workspace-relative path normalization (previously perl-path-normalize; from perl-parser-core).
path_security
Workspace-bound path validation and traversal prevention (previously perl-path-security; from perl-parser-core).
percentile
Nearest-rank percentile helpers for integer latency and metric samples (previously perl-percentile; from perl-parser-core).
position
Position tracking types and UTF-16 mapping utilities, with enhanced position tracking for incremental parsing. Legacy alias for the moved engine component.
qualified_name
Perl qualified-name parsing, splitting, and validation helpers (previously perl-qualified-name; from perl-parser-core).
quote_parser
Uniform parser for Perl quote and quote-like operators (previously perl-quote).
refactor
Code refactoring, modernization, and import optimization.
scope_analyzer
Scope analysis and variable tracking for variable and subroutine resolution.
semantic
Semantic model with hover information and token classification for IDE features.
source_file
Shared Perl source-file classification helpers (previously perl-source-file; from perl-parser-core).
symbol
Symbol extraction, symbol table construction, and reference tracking for IDE features.
tdd
Test-driven development support and test generation. Compatibility re-export of TDD support modules.
tdd_basic
Basic TDD utilities, test helpers, and workflow support for LSP.
test_generator
Intelligent test case generation from parsed Perl code, supporting the TDD workflow.
test_runner
Test execution and TDD support functionality.
text_line
Text-line cursor and boundary helpers (previously perl-text-line; from perl-parser-core).
token_stream
Buffered, position-aware token stream over the raw lexer with trivia skipping. Adapter between perl-lexer output and the parser.
token_wrapper
Lightweight token wrapper with enhanced position tracking for AST integration.
tokens
Token stream, trivia, and token wrapper utilities. Re-exported token stream utilities from perl-parser-core.
trivia
Trivia tokens (whitespace, comments, and POD) used for formatting and diagnostics.
trivia_parser
Trivia-preserving parser that retains trivia tokens for formatting context.
type_inference
Type inference engine for Perl variable analysis.
util
Parser utilities and helpers. Tokenization utilities shared by parser-facing entry points.
workspace
Workspace indexing, document store, and cross-file operations. Compatibility re-export of workspace indexing modules.
workspace_index
Workspace-wide symbol index with a lookup/query API for fast cross-file navigation in Perl LSP.
workspace_rename
Cross-file symbol renaming with conflict detection: rename planning and edit-generation helpers. Deprecated LSP feature module.
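
The percentile module above is described as using the nearest-rank method. As a reminder of what that method computes, here is a minimal sketch; the function name and signature are assumptions made for illustration, not the module's API.

```rust
/// Nearest-rank percentile: sort the samples, take rank = ceil(p / 100 * n),
/// and return the sample at that one-based rank (rank 0 is treated as 1).
/// Returns None for an empty slice or a percentile above 100.
fn nearest_rank_percentile(samples: &[u64], p: u8) -> Option<u64> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    // Integer ceiling of p * n / 100.
    let rank = (p as usize * n + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}
```

Nearest-rank always returns one of the observed samples and needs no floating-point interpolation, which suits integer latency measurements.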

Structs§

HoverInfo
Hover information for symbols displayed in LSP hover requests.
Node
Core AST node representing any Perl language construct.
NodeWithTrivia
An AST node with attached trivia (whitespace and comments).
Parser
Recursive descent Perl parser with error recovery and AST generation. Holds parser state for a single Perl source input.
PositionMapper
Centralized, rope-based position mapper providing line ending detection and UTF-16 position mapping for LSP compliance.
PragmaState
Pragma state (use strict, use warnings, etc.) at a given point in the code.
PragmaTracker
Tracks pragma state (use strict, use warnings, etc.) throughout a Perl file.
RecoverySalvageProfile
Per-file recovery/salvage summary.
ScopeAnalyzer
Analyzes an AST for scope-related issues such as unused variables and shadowing.
ScopeIssue
A single scope-analysis finding with location and human-readable description.
SemanticAnalyzer
Semantic analyzer providing IDE features for Perl code.
SemanticModel
A stable, query-oriented view of semantic information over a parsed file.
SemanticToken
A semantic token with type and modifiers for LSP syntax highlighting.
Symbol
A symbol definition in Perl code with metadata for Index/Navigate workflows.
SymbolExtractor
Extracts symbols from an AST for Parse/Index workflows.
SymbolReference
A reference to a symbol with usage context for Navigate/Analyze workflows.
SymbolTable
Symbol table for Perl code analysis and LSP features in Index/Analyze stages.
Token
Token produced by the lexer and consumed by the parser.
TokenStream
Token stream that wraps perl-lexer or a pre-lexed token buffer.
TriviaPreservingParser
Parser that preserves trivia for formatting.
TriviaToken
A trivia token with position information.
TypeBasedCompletion
Type-based code completion suggestions.
TypeConstraint
Type constraint for type checking.
TypeEnvironment
Type environment for tracking variable types.
TypeInferenceEngine
Main type inference engine.
TypeLocation
Location information for type errors.

Enums§

IssueKind
Category of scope-related issue detected during analysis.
LineEnding
Line ending style detected in a document.
NodeKind
Enumeration of all Perl language constructs supported by the parser.
ParseError
Error types that can occur during Perl parsing.
PerlType
Represents a Perl type.
RecoverySalvageClass
Closeout classification for a parsed file.
ScalarType
Represents specific scalar types in Perl.
SemanticTokenModifier
Semantic token modifiers for Analyze/Complete stage highlighting.
SemanticTokenType
Semantic token types for syntax highlighting in the Parse/Complete workflow.
SymbolKind
Unified Perl symbol classification for LSP tooling.
TokenKind
Token classification for Perl parsing.
Trivia
Non-semantic tokens such as comments and whitespace.
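
Detecting a document's line ending style, as the LineEnding entry above describes, amounts to counting the three possible terminators. The sketch below shows one way to do that; `SketchLineEnding` and its variants are hypothetical and may not match the variants of the crate's `LineEnding` enum.

```rust
/// Hypothetical line ending classification for this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
enum SketchLineEnding {
    Lf,
    CrLf,
    Cr,
    Mixed,
}

/// Classifies the line endings used in `text`.
/// Returns None when the text contains no line break at all.
fn detect_line_ending(text: &str) -> Option<SketchLineEnding> {
    let bytes = text.as_bytes();
    let (mut lf, mut crlf, mut cr) = (0usize, 0usize, 0usize);
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // "\r\n" is one terminator, so both bytes are consumed together.
            b'\r' if bytes.get(i + 1) == Some(&b'\n') => {
                crlf += 1;
                i += 2;
            }
            b'\r' => {
                cr += 1;
                i += 1;
            }
            b'\n' => {
                lf += 1;
                i += 1;
            }
            _ => i += 1,
        }
    }
    match (lf > 0, crlf > 0, cr > 0) {
        (false, false, false) => None,
        (true, false, false) => Some(SketchLineEnding::Lf),
        (false, true, false) => Some(SketchLineEnding::CrLf),
        (false, false, true) => Some(SketchLineEnding::Cr),
        _ => Some(SketchLineEnding::Mixed),
    }
}
```

Scanning bytes is safe here because `\r` and `\n` never occur inside a multi-byte UTF-8 sequence.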

Functions§

format_with_trivia
Formats an AST with trivia back to source code.

Type Aliases§

ParseResult
Result type for parser operations.
SourceLocation
Type alias for backward compatibility with SourceLocation.