Crate perl_corpus

Comprehensive Perl test corpus and property-based testing infrastructure

This crate provides a curated collection of Perl code samples for testing parser correctness, edge-case handling, and LSP features. It combines manually curated test cases with property-based generators that produce randomized inputs.

§Architecture

The corpus is built from four components:

  • Curated Test Cases: Hand-written examples covering Perl syntax edge cases
  • Property-Based Generators: Randomized code generation for fuzz testing
  • Real-World Samples: Code from CPAN and production Perl projects
  • Metadata System: Tag-based organization with section markers and test IDs

§Corpus Organization

Test cases are stored in text files with section markers and metadata:

==========================================
Basic Variable Declaration
==========================================
# @id: vars.basic.my
# @tags: variables, declaration
my $x = 42;
---
(expected AST representation)

Each section includes:

  • Title: Human-readable test case name
  • Metadata: ID, tags, Perl version requirements, flags
  • Body: Perl code to parse
  • Expected Output: Optional AST or error expectations (after ---)
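The layout above can be read with a small line-oriented parser. The following is a minimal sketch of that idea, not the crate's own implementation: `SketchSection`, `is_marker`, and `parse_sketch_section` are hypothetical names, and the sketch assumes a single section per input, marker lines made of at least four `=` characters, and only the `@id` and `@tags` metadata keys shown in the sample.

```rust
/// One parsed corpus section (illustrative; not the crate's own section type).
#[derive(Debug, Default, PartialEq)]
struct SketchSection {
    title: String,
    id: Option<String>,
    tags: Vec<String>,
    body: String,
    expected: Option<String>,
}

/// Returns true for a marker line made only of `=` characters (assumed: at least four).
fn is_marker(line: &str) -> bool {
    let trimmed = line.trim();
    trimmed.len() >= 4 && trimmed.chars().all(|c| c == '=')
}

/// Parse one section laid out as: marker, title, marker, metadata, body, `---`, expected.
/// Returns `None` when the three-line header is missing or malformed.
fn parse_sketch_section(text: &str) -> Option<SketchSection> {
    let mut lines = text.lines();

    // Header: marker / title / marker.
    if !is_marker(lines.next()?) {
        return None;
    }
    let title = lines.next()?.trim().to_string();
    if !is_marker(lines.next()?) {
        return None;
    }

    let mut section = SketchSection { title, ..Default::default() };
    let mut body = Vec::new();
    let mut expected = Vec::new();
    let mut in_expected = false;

    for line in lines {
        if in_expected {
            // Everything after the separator is expected output.
            expected.push(line);
        } else if line.trim() == "---" {
            in_expected = true;
        } else if let Some(id) = line.strip_prefix("# @id:") {
            section.id = Some(id.trim().to_string());
        } else if let Some(tags) = line.strip_prefix("# @tags:") {
            section.tags = tags
                .split(',')
                .map(|tag| tag.trim().to_string())
                .filter(|tag| !tag.is_empty())
                .collect();
        } else {
            body.push(line);
        }
    }

    section.body = body.join("\n");
    if in_expected {
        section.expected = Some(expected.join("\n"));
    }
    Some(section)
}
```

Keeping `expected` as an `Option` mirrors the fact that the part after `---` is optional: a section without a separator yields `None` rather than an empty string.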

§Usage

§Loading Corpus Files

use perl_corpus::get_corpus_files;

let files = get_corpus_files();

for file in files {
    println!("Found corpus file: {:?}", file.path);
}

§Parsing Corpus Sections

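use perl_corpus::parse_file;
use std::path::Path;

// Placeholder path; point this at a real corpus file.
let path = Path::new("test_corpus/example.txt");
let sections = parse_file(path)?;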

for section in sections {
    println!("Section: {} (id: {})", section.title, section.id);
    println!("Tags: {:?}", section.tags);
    println!("Code:\n{}", section.body);
}

§Finding Cases by Tag

use perl_corpus::{parse_dir, find_by_tag};
use std::path::Path;

// Directory holding the main corpus files.
let corpus_dir = Path::new("test_corpus");
let all_sections = parse_dir(corpus_dir)?;
let regex_tests = find_by_tag(&all_sections, "regex");

println!("Found {} regex test cases", regex_tests.len());

§Using Property-Based Generators

use perl_corpus::{generate_perl_code, generate_perl_code_with_seed, CodegenOptions};

// Generate random valid Perl code from a fixed seed
let code = generate_perl_code_with_seed(10, 42);
println!("Generated:\n{}", code);

// Generate with specific options
let options = CodegenOptions::default();
let modern_code = generate_perl_code(&options);

§Specialized Test Case Modules

The corpus includes focused generators for specific Perl features:

§Complex Data Structures

use perl_corpus::{complex_data_structure_cases, find_complex_case};

let cases = complex_data_structure_cases();
if let Some(nested) = find_complex_case("nested-arrays") {
    println!("Test: {}", nested.description);
    println!("Code:\n{}", nested.code);
}

§Continue/Redo Blocks

use perl_corpus::{continue_redo_cases, valid_continue_redo_cases};

let all_cases = continue_redo_cases();
let valid_only = valid_continue_redo_cases();

§Format Statements

use perl_corpus::{format_statement_cases, FormatStatementGenerator};

let cases = format_statement_cases();
let generator = FormatStatementGenerator::new(42);

§Glob Expressions

use perl_corpus::{glob_expression_cases, GlobExpressionGenerator};

let cases = glob_expression_cases();
let generator = GlobExpressionGenerator::new(42);

§Tie Interface

use perl_corpus::{tie_interface_cases, tie_cases_by_tag};

let all_tie = tie_interface_cases();
let scalar_tie = tie_cases_by_tag("scalar");

§Corpus Layers

The corpus is organized into three layers accessible via CorpusLayer:

  • CorpusLayer::Main: Core test cases in test_corpus/
  • CorpusLayer::TreeSitter: Tree-sitter grammar tests in tree-sitter-perl/test/corpus/
  • CorpusLayer::Fuzz: Fuzzing inputs and edge cases in crates/perl-corpus/fuzz/

§Environment Configuration

Override the corpus root with the CORPUS_ROOT environment variable:

export CORPUS_ROOT=/path/to/custom/corpus
cargo test
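In Rust, an override of this kind is typically read with `std::env::var`. The sketch below shows one way to resolve the root with a fallback; it is an illustration, not the crate's resolution logic. The function names are hypothetical, and treating an empty `CORPUS_ROOT` as unset is an assumption.

```rust
use std::env;
use std::path::PathBuf;

/// Pick the corpus root: a non-empty override wins, otherwise the fallback directory.
/// (Assumption: an empty or whitespace-only override counts as "not set".)
fn resolve_corpus_root(override_value: Option<String>, fallback: &str) -> PathBuf {
    match override_value {
        Some(value) if !value.trim().is_empty() => PathBuf::from(value),
        _ => PathBuf::from(fallback),
    }
}

/// Read `CORPUS_ROOT` from the process environment and resolve it against `fallback`.
fn corpus_root_from_env(fallback: &str) -> PathBuf {
    // `env::var` returns Err when the variable is unset or not valid Unicode;
    // both cases fall back to the default here.
    resolve_corpus_root(env::var("CORPUS_ROOT").ok(), fallback)
}
```

Splitting the pure decision (`resolve_corpus_root`) from the environment read keeps the logic testable without mutating process-wide state.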

§Integration with Parser Testing

The corpus integrates with perl-parser test suites:

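use perl_parser::Parser;
use perl_corpus::{parse_dir, find_by_tag};
use std::path::Path;

// Directory holding the main corpus files.
let corpus_dir = Path::new("test_corpus");
let sections = parse_dir(corpus_dir)?;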
let regex_cases = find_by_tag(&sections, "regex");

for case in regex_cases {
    let mut parser = Parser::new(&case.body);
    let result = parser.parse();
    assert!(result.is_ok(), "Failed to parse: {}", case.title);
}
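The loop above stops at the first failing case. When triaging a large corpus it can be more useful to run every case and report all failures together. The sketch below shows that pattern with stand-in types: `SketchCase` and `collect_failures` are hypothetical, and the parser is passed in as a closure so the example does not depend on `perl_parser`.

```rust
/// Minimal stand-in for a corpus case (hypothetical; the crate's section type has more fields).
struct SketchCase {
    title: String,
    body: String,
}

/// Run `parse` over every case and return `(title, error message)` for each failure,
/// instead of panicking on the first one.
fn collect_failures<F, E>(cases: &[SketchCase], mut parse: F) -> Vec<(String, String)>
where
    F: FnMut(&str) -> Result<(), E>,
    E: std::fmt::Display,
{
    cases
        .iter()
        .filter_map(|case| match parse(&case.body) {
            Ok(()) => None,
            Err(err) => Some((case.title.clone(), err.to_string())),
        })
        .collect()
}
```

A test can then assert that the returned list is empty and print every failing title in the assertion message.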

§Test Case Validation

Corpus files can include validation flags:

  • parser-sensitive: Requires specific parser version
  • perl-version:5.26: Requires Perl 5.26+ features
  • expected-error: Test case should produce parse error
  • wip: Work in progress, may not parse correctly yet
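A test runner usually needs these flags in a typed form before it can act on them. The following is a minimal sketch of such a mapping, assuming the flags appear as the plain strings listed above and that `perl-version` carries a `major.minor` pair. `SketchFlag` and the helper functions are hypothetical names, not the crate's API.

```rust
/// Typed view of a validation flag (illustrative only).
#[derive(Debug, PartialEq)]
enum SketchFlag {
    ParserSensitive,
    PerlVersion { major: u32, minor: u32 },
    ExpectedError,
    Wip,
    /// Anything the sketch does not recognize is preserved verbatim.
    Unknown(String),
}

/// Parse `perl-version:MAJOR.MINOR`; returns `None` for any other shape.
fn parse_perl_version(raw: &str) -> Option<SketchFlag> {
    let version = raw.strip_prefix("perl-version:")?;
    let (major, minor) = version.split_once('.')?;
    Some(SketchFlag::PerlVersion {
        major: major.parse().ok()?,
        minor: minor.parse().ok()?,
    })
}

/// Map one raw flag string to a `SketchFlag`.
fn parse_sketch_flag(raw: &str) -> SketchFlag {
    let raw = raw.trim();
    match raw {
        "parser-sensitive" => SketchFlag::ParserSensitive,
        "expected-error" => SketchFlag::ExpectedError,
        "wip" => SketchFlag::Wip,
        _ => parse_perl_version(raw).unwrap_or_else(|| SketchFlag::Unknown(raw.to_string())),
    }
}

/// A case is expected to parse cleanly unless it is flagged `expected-error` or `wip`.
fn should_parse_cleanly(flags: &[SketchFlag]) -> bool {
    !flags
        .iter()
        .any(|flag| matches!(flag, SketchFlag::ExpectedError | SketchFlag::Wip))
}
```

Keeping unrecognized flags as `Unknown` rather than rejecting them lets older runners skip flags introduced by newer corpus files.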

§Contributing Test Cases

To add new test cases:

  1. Create or edit a corpus file in test_corpus/
  2. Use section markers (====) to separate cases
  3. Add metadata tags for categorization
  4. Optionally include expected output after the --- separator
  5. Run cargo test to validate

See existing corpus files for examples and conventions.

§Re-exports

pub use api::*;

§Modules

  • api
  • cases: Static edge case fixtures and complex data structure samples.
  • codegen: Randomized Perl code generation utilities.
  • concepts
  • continue_redo: Continue and redo loop control statement test fixtures.
  • files: Corpus file discovery helpers.
  • fixture_expectations
  • format_statements: Format statement test fixtures for Perl LSP corpus.
  • gen
  • glob_expressions: Glob expression test fixtures for file pattern matching and diamond operator.
  • gold
  • index
  • inventory
  • lint
  • loading
  • meta
  • metadata
  • prelude
  • sidecar
  • tie_interface: Tie/untie interface corpus - comprehensive test fixtures for Perl’s tie mechanism.