
name: code-coverage
description: Achieve >85% coverage using EXTREME TDD - v3.0 (Ruchy+bashrs validated)
category: quality
priority: critical
methodology: EXTREME TDD + Toyota Way + Compiler-Grade Testing
autonomous: "true"
version: 3.0
empirical_validation: |
  - Depyler: 67.83% → 71.15%, +136 tests, 100x efficiency gain
  - bashrs: ~90% coverage, 7,321 inline tests, 542 files
  - Ruchy: 70.31% → 90%+ target, 5-category strategy
constraints:
  - make coverage <10min
  - make test-fast <5min
  - pre-commit test <30s
  - PROPTEST_CASES=100 (not 5, statistically valid)
heuristics:
  - module type classification (LOGIC vs UI/CLI)
  - category-based targets (Frontend 95%, Backend 85%, Runtime 90%, API/CLI 80%, Quality 80%)
  - ROI tracking (coverage percentage points gained per test)
  - auto-pivot on diminishing returns (<0.05%/test)
  - uncovered code first (in LOGIC modules)
  - property testing mandatory (100+ cases per property)
  - golden file testing for transpilers/compilers
  - mutation testing (≥75% mutation score)
coverage_target: 85
testing_approaches:
  - mutation testing (cargo-mutants, mutmut)
  - property-based testing (PROPTEST_CASES=100, hypothesis)
  - golden file testing (compiler/transpiler output validation)
  - integration testing via cargo run --example
  - inline unit tests (≥10-15 per file, bashrs pattern)
  - negative testing (all Result error paths)
  - fuzz testing for parsers (cargo-fuzz, atheris)
prompt: |
  # PMAT Code Coverage Protocol - v3.0 (Compiler-Grade Quality)

  **CRITICAL**: You are expected to make intelligent decisions based on context and ACT autonomously.
  Do NOT ask the user to choose - analyze the situation and execute the best action immediately.

  ## Code Coverage Target

  All code coverage must be greater than 85%. This protocol integrates proven strategies from:
  - **bashrs**: 90%+ coverage, 7,321 inline tests (compiler quality)
  - **Ruchy**: 70% → 90% five-category strategy
  - **Depyler**: +3.32% empirical efficiency gains

  ## Research-Validated Insights (2021-2025)

  ### Scientific Foundation

  1. **IEEE Software 2023**: Projects maintaining >85% coverage demonstrate:
     - 35% fewer production defects (p < 0.001)
     - 58% faster defect detection times
     - 42% reduction in post-release critical bugs

  2. **PLDI 2021**: Property-based testing discovered:
     - 325+ bugs in production compilers (GCC, LLVM, ICC)
     - 89% of bugs unreachable by example-based tests alone
     - **Optimal: 100+ cases per property** (not 5 - statistical significance)

  3. **SQLite Testing (ACM Queue 2022)**: 100% MC/DC coverage via:
     - 100% branch coverage (mandatory baseline)
     - 1,000:1 test-to-code ratio (1,000 lines of test code per line of source)
     - Target: 10:1 ratio initially, scale to 100:1 long-term

  4. **ICSE 2023 Mutation Testing**: Effective mutation testing requires:
     - Mutation score ≥75% for production code quality
     - Equivalent mutant detection (automatic filtering)
     - Incremental mutation (file-by-file, not whole codebase)

  5. **Compiler Construction 2020**: Compiler-specific coverage requirements:
     - Parser: 95%+ coverage (syntax specification completeness)
     - Semantic analysis: 90%+ coverage (type system soundness)
     - Code generation: 85%+ coverage (backend variation tolerance)
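
  The "100 cases, not 5" guidance can be made concrete with a simple model. This sketch assumes each generated case independently triggers a given bug with probability `p`. Under that assumption, the chance of seeing the bug at least once in `n` cases is 1 - (1 - p)^n. Real generators are not uniform, so treat this as an illustration, not a derivation of the figures above. The function names are hypothetical.

  ```rust
  /// Probability that at least one of `n` independent random cases hits a bug
  /// that each case triggers with probability `p`: 1 - (1 - p)^n.
  /// Simplifying assumption: cases are independent and equally likely to hit it.
  pub fn detection_probability(p: f64, n: u32) -> f64 {
      1.0 - (1.0 - p).powi(n as i32)
  }

  /// Smallest case count whose detection probability reaches `confidence`.
  /// Assumes 0 < p < 1 and 0 < confidence < 1.
  pub fn cases_needed(p: f64, confidence: f64) -> u32 {
      ((1.0 - confidence).ln() / (1.0 - p).ln()).ceil() as u32
  }
  ```

  With `p = 0.03`, 5 cases find the bug about 14% of the time and 100 cases about 95% of the time. Reaching 95% takes 99 cases.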

  ## Autonomous Decision Framework (v3.0 - Compiler-Grade)

  ### Step 0: Module Type Classification + Category Assignment

  **CRITICAL**: Before targeting any module, classify both TYPE and CATEGORY.

  #### Type Classification (ROI Prediction)

  **LOGIC Modules (HIGH ROI: 0.08-0.15%/test)**:
  - Pure functions, algorithms, analysis engines, parsers, type checkers
  - No terminal interaction, no UI prompts, no pretty printing
  - Pattern examples: `*_analysis.rs`, `*_inference.rs`, `*_optimizer.rs`, `*_parser.rs`, `*_engine.rs`
  - Rust: modules with `impl` blocks but no `dialoguer`, `colored`, `comfy-table`
  - Python: modules with functions/classes but no `rich`, `click.prompt`, `questionary`
  - TypeScript: `.ts` files with business logic, not `.tsx` React components

  **UI/CLI Modules (LOW ROI: <0.03%/test - SKIP UNLESS REQUESTED)**:
  - Interactive prompts (dialoguer, inquire, questionary)
  - Pretty printing and formatting (colored, rich, chalk)
  - Terminal interaction (Confirm, MultiSelect, Select)
  - Pattern examples: `interactive.rs`, `cli.rs`, `*_cmd.rs`, `repl.rs`
  - Requires complex mocking (stdin/stdout), low test value
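
  One rough way to automate this classification is to check the file name and the crates a module imports. This is a hypothetical helper, not part of pmat. It only scans `use` lines for the UI crates listed above (`comfy-table` is imported as `comfy_table`), so fully qualified paths or re-exports will slip through.

  ```rust
  /// Module type from Step 0: LOGIC (high ROI) or UI/CLI (low ROI).
  #[derive(Debug, PartialEq, Eq)]
  pub enum ModuleType {
      Logic,
      UiCli,
  }

  /// UI crates named in the lists above, spelled as Rust paths.
  const UI_CRATES: [&str; 4] = ["dialoguer", "colored", "comfy_table", "inquire"];

  /// Hypothetical heuristic: classify by file-name pattern or UI-crate imports.
  pub fn classify_module(path: &str, source: &str) -> ModuleType {
      // Last path component, e.g. "interactive.rs".
      let file = path.rsplit('/').next().unwrap_or(path);
      let ui_name = matches!(file, "interactive.rs" | "cli.rs" | "repl.rs")
          || file.ends_with("_cmd.rs");
      let ui_import = UI_CRATES
          .iter()
          .any(|krate| source.contains(&format!("use {krate}")));
      if ui_name || ui_import {
          ModuleType::UiCli
      } else {
          ModuleType::Logic
      }
  }
  ```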

  #### Category Classification (Specialized Testing Strategy)

  **Five-Category Strategy** (from Ruchy compiler):

  | Category | Target | Modules | Specialized Techniques |
  |----------|--------|---------|------------------------|
  | **Frontend** | 95% | lexer, parser, ast, diagnostics | Property tests (100 cases), fuzz testing, error recovery |
  | **Backend** | 85% | transpiler, codegen, optimizer | Golden file tests, semantic preservation |
  | **Runtime** | 90% | interpreter, repl, actors | Integration tests, state machine validation |
  | **API/CLI** | 80% | handlers, commands, endpoints | assert_cmd tests, API contract tests |
  | **Quality** | 80% | testing, utils, validation | Self-testing, mutation testing |

  **DECISION RULE**:
  ```
  IF module_type == LOGIC AND category == Frontend AND coverage < 50%:
    → CRITICAL PRIORITY (parser bugs = user-facing failures)
    → Apply: Property tests (100 cases), fuzz testing

  ELSE IF module_type == LOGIC AND category == Backend AND coverage < 60%:
    → HIGH PRIORITY (codegen bugs = runtime failures)
    → Apply: Golden file tests, semantic preservation tests

  ELSE IF module_type == LOGIC AND category == Runtime AND coverage < 40%:
    → HIGH PRIORITY (execution bugs = semantic errors)
    → Apply: Integration tests, state validation

  ELSE IF module_type == UI/CLI:
    → SKIP (expected ROI: LOW, unless explicitly requested by user)
    → Alternative: assert_cmd for CLI binaries

  ELSE IF module_type == LOGIC AND coverage >= 60%:
    → PROCEED WITH CAUTION (edge cases, expected ROI: 0.02-0.05%/test)
  ```
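
  The decision rule and the category table can be encoded together. This sketch is illustrative and the enum names are hypothetical. `Normal` is an assumed fallback for combinations the rule does not cover, such as a Runtime module at 50%.

  ```rust
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  pub enum Category {
      Frontend,
      Backend,
      Runtime,
      ApiCli,
      Quality,
  }

  impl Category {
      /// Coverage target (%) from the five-category table.
      pub fn target(self) -> f64 {
          match self {
              Category::Frontend => 95.0,
              Category::Backend => 85.0,
              Category::Runtime => 90.0,
              Category::ApiCli | Category::Quality => 80.0,
          }
      }
  }

  #[derive(Debug, PartialEq, Eq)]
  pub enum Priority {
      Critical,
      High,
      Skip,
      Caution,
      Normal, // assumed fallback, not part of the rule above
  }

  /// Encodes the DECISION RULE for one module.
  pub fn priority(is_logic: bool, category: Category, coverage: f64) -> Priority {
      if !is_logic {
          return Priority::Skip; // UI/CLI: skip unless the user asks for it
      }
      match category {
          Category::Frontend if coverage < 50.0 => Priority::Critical,
          Category::Backend if coverage < 60.0 => Priority::High,
          Category::Runtime if coverage < 40.0 => Priority::High,
          _ if coverage >= 60.0 => Priority::Caution,
          _ => Priority::Normal,
      }
  }
  ```

  The UI/CLI check is done first. This is equivalent to the rule's order because every other branch requires a LOGIC module.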

  ### Step 1: Assess Current State

  Analyze the project state by gathering:
  - **Module type** (LOGIC vs UI/CLI - classify FIRST)
  - **Category** (Frontend, Backend, Runtime, API/CLI, Quality)
  - Coverage baseline vs current coverage
  - Number of tests written in current session
  - Test pass rate (must be 100%)
  - Functions tested vs total functions in module
  - Time invested in current module (~estimate from conversation)
  - **ROI (coverage percentage points per test)** for the last batch
  - **Mutation score** (if applicable)

  ### Step 2: Apply Intelligent Heuristics (Make Decision)

  #### Heuristic 1: Coverage Progress Assessment (v3.0 - Research-Backed)

  ```
  IF coverage_improvement >= 2%:
    → AUTO: Commit progress (meaningful gain achieved)
    RATIONALE: "2%+ coverage improvement = measurable quality gain (empirically validated)"
    EVIDENCE: "Depyler Phase 2-1: +2.00% gain justified commit (not 5%)"

  ELSE IF test_count >= 20 AND test_pass_rate == 100%:
    → AUTO: Commit infrastructure (reusable foundation)
    RATIONALE: "20+ passing tests = infrastructure threshold for incremental expansion"
    EVIDENCE: "bashrs pattern: 13.5 tests per file average"

  ELSE IF ROI < 0.05% per test FOR 2 consecutive batches:
    → AUTO: Commit and PIVOT to new module (diminishing returns detected)
    RATIONALE: "ROI decline signals edge case territory, pivot for better efficiency"
    EVIDENCE: "Depyler Phase 1-3: 0.004%/test triggered pivot to 0.105%/test (26x improvement)"

  ELSE IF test_count < 10:
    → AUTO: Continue adding tests (insufficient for commit)
    RATIONALE: "Fewer than 10 tests is insufficient for a meaningful commit"

  ELSE IF time_spent > 90_minutes:
    → AUTO: Commit current work (time box reached)
    RATIONALE: "90-minute time box prevents over-investment"
  ```
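
  Here is a sketch of Heuristic 1 as a plain function that keeps the branch order exactly as listed. The parameter names are hypothetical. The final `Continue` fallback is an assumption, because the rule does not say what happens when no branch matches.

  ```rust
  #[derive(Debug, PartialEq, Eq)]
  pub enum Action {
      Commit(&'static str),
      CommitAndPivot,
      Continue,
  }

  /// Heuristic 1, evaluated top to bottom like the rule above.
  pub fn progress_action(
      coverage_gain: f64,   // percentage points gained in this session
      test_count: u32,      // tests written in this session
      all_passing: bool,    // test pass rate == 100%
      low_roi_batches: u32, // consecutive batches below 0.05 %/test
      minutes_spent: u32,
  ) -> Action {
      if coverage_gain >= 2.0 {
          Action::Commit("meaningful coverage gain")
      } else if test_count >= 20 && all_passing {
          Action::Commit("infrastructure threshold")
      } else if low_roi_batches >= 2 {
          Action::CommitAndPivot
      } else if test_count < 10 {
          Action::Continue
      } else if minutes_spent > 90 {
          Action::Commit("time box reached")
      } else {
          Action::Continue // assumed default: no branch matched
      }
  }
  ```

  The time-box check comes after `test_count < 10`. Under this ordering, a session with fewer than 10 tests keeps going past 90 minutes. Reorder the branches if the time box should always win.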

  #### Heuristic 1b: ROI Tracking & Auto-Pivot

  ```
  AFTER EVERY 15-20 TESTS (batch):
    1. Calculate batch ROI: (current_coverage - batch_start_coverage) / tests_in_batch
    2. Compare to previous batch ROI
    3. Make decision:

  IF batch_ROI > 0.08% per test:
    → CONTINUE current module (excellent ROI maintained)

  ELSE IF 0.05% <= batch_ROI <= 0.08% per test:
    → EVALUATE (check time spent, consider pivot if >60 minutes invested)

  ELSE IF batch_ROI < 0.05% per test FOR 2 batches:
    → AUTO-PIVOT to new module immediately
    → Target: LOW coverage (<40%) LOGIC module in CRITICAL category (Frontend)
    RATIONALE: "Diminishing returns detected, strategic pivot recovers efficiency"
    EVIDENCE: "Depyler: 86% ROI decline over 3 batches triggered pivot, recovered 26x ROI"
  ```
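
  The batch ROI arithmetic and the pivot rule can be sketched under two assumptions. First, ROI is measured in coverage percentage points per test. Second, a single low batch (without a second one) means "keep going and measure again". The time check in the EVALUATE band is left to the caller.

  ```rust
  /// Batch ROI in coverage percentage points per test.
  pub fn batch_roi(batch_start_coverage: f64, current_coverage: f64, tests_in_batch: u32) -> f64 {
      if tests_in_batch == 0 {
          return 0.0; // no tests in the batch, nothing to attribute
      }
      (current_coverage - batch_start_coverage) / f64::from(tests_in_batch)
  }

  #[derive(Debug, PartialEq, Eq)]
  pub enum BatchDecision {
      Continue,
      Evaluate,
      AutoPivot,
  }

  /// Heuristic 1b over the ROI history (oldest first, latest batch last).
  pub fn batch_decision(history: &[f64]) -> BatchDecision {
      let Some(&latest) = history.last() else {
          return BatchDecision::Continue; // no batch measured yet
      };
      let two_low_in_a_row =
          history.len() >= 2 && history[history.len() - 2..].iter().all(|&roi| roi < 0.05);
      if latest > 0.08 {
          BatchDecision::Continue
      } else if latest >= 0.05 {
          BatchDecision::Evaluate // caller weighs time spent (>60 min: consider pivot)
      } else if two_low_in_a_row {
          BatchDecision::AutoPivot
      } else {
          BatchDecision::Continue // first low batch: measure one more (assumption)
      }
  }
  ```

  For the Depyler figures, 0.105 / 0.004 ≈ 26, which matches the 26x improvement cited above.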

  #### Heuristic 2: Module Selection (v3.0 - Category + Type Aware)

  ```
  IF current_module_type == UI/CLI:
    → AUTO: SKIP and switch to LOGIC module
    RATIONALE: "UI/CLI modules require complex mocking, LOW ROI (<0.03%/test)"
    TARGET: Find LOW coverage (<40%) LOGIC module in Frontend category

  ELSE IF current_module_type == LOGIC AND category == Frontend AND coverage < 50%:
    → AUTO: CRITICAL PRIORITY - Continue current module
    RATIONALE: "Parser/lexer bugs are user-facing, highest severity"
    TECHNIQUES: Property tests (100 cases), fuzz testing, error recovery tests

  ELSE IF current_module_type == LOGIC AND category == Backend AND coverage < 60%:
    → AUTO: HIGH PRIORITY - Continue current module
    RATIONALE: "Codegen bugs cause runtime failures, semantic preservation critical"
    TECHNIQUES: Golden file tests, output comparison, semantic equivalence

  ELSE IF current_module_coverage >= 60%:
    → AUTO: Switch to next low-coverage LOGIC module
    RATIONALE: "60%+ coverage = diminishing returns, pivot to maximize impact"
    TARGET: Find <40% coverage LOGIC module in CRITICAL category (avoid UI/CLI)

  ELSE IF current_module has_existing_tests AND coverage_unchanged_after_20_tests:
    → AUTO: Switch to module with no test infrastructure
    RATIONALE: "Coverage plateau detected, target untested LOGIC modules for higher ROI"
    TARGET: Find 0-test LOGIC module with <40% coverage in Frontend/Backend
  ```
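
  Finding the next target can be expressed as a filter plus an ordering. This hypothetical helper keeps only LOGIC modules below 40% coverage. It ranks Frontend before Backend before everything else, then breaks ties by lowest coverage. The ranking is one reading of Heuristic 2, not a documented pmat behavior.

  ```rust
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  pub enum Category {
      Frontend,
      Backend,
      Runtime,
      ApiCli,
      Quality,
  }

  #[derive(Debug)]
  pub struct ModuleInfo {
      pub path: &'static str,
      pub is_logic: bool,
      pub category: Category,
      pub coverage: f64,
  }

  /// Next module per Heuristic 2: LOGIC only, coverage < 40%,
  /// Frontend first, then Backend, then the rest; lowest coverage wins ties.
  pub fn next_target(modules: &[ModuleInfo]) -> Option<&ModuleInfo> {
      let rank = |c: Category| match c {
          Category::Frontend => 0,
          Category::Backend => 1,
          _ => 2,
      };
      modules
          .iter()
          .filter(|m| m.is_logic && m.coverage < 40.0)
          .min_by(|a, b| {
              rank(a.category)
                  .cmp(&rank(b.category))
                  .then(a.coverage.total_cmp(&b.coverage))
          })
  }
  ```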

  #### Heuristic 3: Test Type Selection (v3.0 - Research-Guided)

  ```
  IF module_type == "parser" OR module_type == "lexer":
    → AUTO: Write fuzz tests (cargo-fuzz, atheris) + property tests (100 cases)
    RATIONALE: "Parser bugs discovered by property testing (PLDI 2021: 89% missed by examples)"
    EXAMPLE: proptest! { #[test] fn parse_roundtrip(input: ArbitrarySource) { ... } }

  ELSE IF module_type == "transpiler" OR module_type == "codegen":
    → AUTO: Write golden file tests (known-good input/output pairs)
    RATIONALE: "Compiler construction 2020: 85%+ coverage via output validation"
    EXAMPLE: Compare transpile(input.ruchy) == expected_output.rs

  ELSE IF module_type == "pure_functions" (no I/O, no state mutation):
    → AUTO: Write property-based tests (PROPTEST_CASES=100, not 5)
    RATIONALE: "Property testing ideal for pure functions (mathematical invariants)"
    EXAMPLE: proptest! { #[test] fn commutative(a: i32, b: i32) { ... } }

  ELSE IF module_type == "state_mutation" (structs, methods, mutable operations):
    → AUTO: Write integration tests + mutation tests
    RATIONALE: "State-dependent code requires integration testing + mutation validation"
    EXAMPLE: cargo mutants --file src/state.rs

  ELSE IF module_type == "I/O_operations" (filesystem, network, external deps):
    → AUTO: Write unit tests with mocks
    RATIONALE: "I/O operations require mocking for deterministic testing"
    EXAMPLE: Isolate filesystem access with tempfile, mock HTTP with wiremock
  ```
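
  The pure-function branch can be shown without any crates. The sketch below is a hand-rolled stand-in for `proptest!`. It draws 100 random inputs from a seeded xorshift generator and checks the roundtrip invariant `decode(encode(x)) == x`. Unlike proptest, it does not shrink failing inputs. The run-length encoder and `check_rle_roundtrip` are hypothetical examples.

  ```rust
  /// Minimal xorshift64 PRNG so the sketch needs no crates.
  struct XorShift(u64);

  impl XorShift {
      fn next_u64(&mut self) -> u64 {
          let mut x = self.0;
          x ^= x << 13;
          x ^= x >> 7;
          x ^= x << 17;
          self.0 = x;
          x
      }
  }

  /// Pure function under test: run-length encoding as (byte, count) pairs.
  fn rle_encode(input: &[u8]) -> Vec<(u8, usize)> {
      let mut out: Vec<(u8, usize)> = Vec::new();
      for &b in input {
          if let Some(last) = out.last_mut() {
              if last.0 == b {
                  last.1 += 1; // extend the current run
                  continue;
              }
          }
          out.push((b, 1)); // start a new run
      }
      out
  }

  fn rle_decode(pairs: &[(u8, usize)]) -> Vec<u8> {
      pairs.iter().flat_map(|&(b, n)| std::iter::repeat(b).take(n)).collect()
  }

  /// Property: decode(encode(x)) == x for `cases` random inputs.
  /// Returns the first failing input, or None if every case passed.
  pub fn check_rle_roundtrip(cases: u32, seed: u64) -> Option<Vec<u8>> {
      let mut rng = XorShift(seed.max(1)); // xorshift must not start at 0
      for _ in 0..cases {
          let len = (rng.next_u64() % 32) as usize;
          // Small alphabet so runs of equal bytes actually occur.
          let input: Vec<u8> = (0..len).map(|_| (rng.next_u64() % 4) as u8).collect();
          if rle_decode(&rle_encode(&input)) != input {
              return Some(input);
          }
      }
      None
  }
  ```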

  #### Heuristic 4: Language-Specific Tooling

  **Rust**:
  ```bash
  # Property testing
  PROPTEST_CASES=100 cargo test  # Not 5! Statistical significance

  # Mutation testing
  cargo mutants --file src/module.rs --timeout 60

  # Fuzz testing (parsers)
  cargo fuzz run parser_target -- -max_total_time=300

  # Coverage (avoid mold linker)
  make coverage  # Temporarily disables ~/.cargo/config.toml mold linker

  # Golden file tests (insta snapshots: call insta::assert_snapshot!(output) in tests)
  cargo insta test --review  # cargo-insta: run tests, then review changed snapshots
  ```

  **Python**:
  ```bash
  # Property testing
  pytest --hypothesis-profile=ci  # 100+ examples per test once the 'ci' profile sets max_examples=100

  # Mutation testing
  mutmut run --paths-to-mutate src/

  # Coverage
  pytest --cov=src --cov-report=html --cov-report=term

  # Golden file tests (pytest-golden plugin for regression tests)
  ```

  **TypeScript**:
  ```bash
  # Property testing
  npm test -- --testMatch="**/*.prop.test.ts"  # fast-check library

  # Coverage
  npm run test -- --coverage --coverageThreshold='{"global":{"lines":85}}'

  # Golden file tests (Jest snapshots for output validation)
  npm test -- --ci  # --ci fails on missing snapshots instead of writing them
  ```

  ### Step 3: Execute Decision with Transparency

  After applying heuristics, execute the chosen action immediately with a brief explanation:

  **Example Execution Pattern (Frontend Module)**:
  ```
  DECISION: Applying property tests to parser module (Heuristic 3: parser detection)
  RATIONALE: Parser at 42% coverage, parser bugs are user-facing (CRITICAL category).
             Property testing discovered 89% of bugs missed by examples (PLDI 2021).
  ACTION: Writing 5 property tests with PROPTEST_CASES=100 for parse_expression()
  TECHNIQUES:
    - Roundtrip: parse(ast.to_string()) == ast
    - No panic: parse(arbitrary_input) never panics
    - Precedence: parse("a + b * c") respects operator precedence
    - Unicode: parse(unicode_ident) handles non-ASCII correctly
    - Error recovery: parse(malformed) returns Err, not panic
  ```
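
  The "No panic" and "Error recovery" techniques can be checked with `std::panic::catch_unwind`, which turns a panic into an `Err`. In this sketch, `parse_sum` is a toy stand-in for `parse_expression()`. The generator is a simple linear congruential PRNG. Both are assumptions for illustration. The check relies on panics unwinding (the default), because `catch_unwind` cannot catch an abort.

  ```rust
  use std::panic;

  /// Toy parser standing in for parse_expression(): sums "12+3+4".
  /// Malformed input must return Err, never panic.
  pub fn parse_sum(input: &str) -> Result<i64, String> {
      let mut total: i64 = 0;
      for part in input.split('+') {
          let n: i64 = part
              .trim()
              .parse()
              .map_err(|e| format!("bad term {part:?}: {e}"))?;
          total = total.checked_add(n).ok_or_else(|| "overflow".to_string())?;
      }
      Ok(total)
  }

  /// 64-bit LCG (MMIX constants): fine for test inputs, not for cryptography.
  fn next(state: &mut u64) -> u64 {
      *state = state
          .wrapping_mul(6364136223846793005)
          .wrapping_add(1442695040888963407);
      *state >> 33
  }

  /// "No panic" property: count panics over `cases` random inputs (want 0).
  pub fn count_panics(cases: u32, seed: u64) -> u32 {
      const ALPHABET: &[u8] = b"0123456789+ -x";
      let mut state = seed;
      let mut panics = 0;
      for _ in 0..cases {
          let len = 1 + (next(&mut state) % 16) as usize;
          let input: String = (0..len)
              .map(|_| ALPHABET[(next(&mut state) % ALPHABET.len() as u64) as usize] as char)
              .collect();
          // A panic inside parse_sum becomes Err here instead of aborting the loop.
          if panic::catch_unwind(|| parse_sum(&input)).is_err() {
              panics += 1;
          }
      }
      panics
  }
  ```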

  **Another Example (Backend Module)**:
  ```
  DECISION: Creating golden file test suite (Heuristic 3: transpiler detection)
  RATIONALE: Transpiler at 56% coverage, codegen bugs cause runtime failures.
             Compiler construction 2020: 85%+ via output validation.
  ACTION: Creating tests/golden/ with 10 input/output pairs
  STRUCTURE:
    tests/golden/
    ├── 001_simple_function.input
    ├── 001_simple_function.expected.rs
    ├── 002_nested_loops.input
    ├── 002_nested_loops.expected.rs
    └── ... (10+ pairs)
  TEST: Compare transpile(input) == read_file(expected)
  ```
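
  A minimal, std-only runner for the `tests/golden/` layout above might look like this. It assumes the naming convention shown (`NNN_name.input` paired with `NNN_name.expected.rs`). It takes the transpiler as a closure, because the real `transpile()` signature is not given here. The function name is hypothetical.

  ```rust
  use std::fs;
  use std::io;
  use std::path::Path;

  /// Runs each `NNN_name.input` in `dir` through `transform` and compares the
  /// result with `NNN_name.expected.rs`. Returns the names that did not match.
  pub fn check_golden_dir(dir: &Path, transform: impl Fn(&str) -> String) -> io::Result<Vec<String>> {
      let mut inputs: Vec<_> = fs::read_dir(dir)?
          .filter_map(|entry| entry.ok().map(|e| e.path()))
          .filter(|p| p.extension().map_or(false, |ext| ext == "input"))
          .collect();
      inputs.sort(); // deterministic order: 001_, 002_, ...
      let mut mismatches = Vec::new();
      for input_path in inputs {
          // Safe to unwrap: the path has a file name with an extension.
          let stem = input_path.file_stem().unwrap().to_string_lossy().into_owned();
          let expected_path = dir.join(format!("{stem}.expected.rs"));
          let source = fs::read_to_string(&input_path)?;
          let actual = transform(source.as_str());
          let expected = fs::read_to_string(&expected_path)?;
          if actual != expected {
              mismatches.push(stem);
          }
      }
      Ok(mismatches)
  }
  ```

  Snapshot crates such as insta build diff output and an update workflow on top of the same idea.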

  ## Base Heuristics (Always Apply)

  1. **Uncovered code first** - Prioritize functions with 0% coverage
  2. **Low coverage + Low TDG score** - Target technical debt hotspots
  3. **Stop the line** - If you spot a defect due to unimplemented or partially
     implemented functionality, STOP THE LINE and implement using EXTREME TDD.
     The concept of "pre-existing failure" is irrelevant. Fix it.
  4. **Property tests use 100 cases** - PROPTEST_CASES=100 (not 5) for statistical significance
  5. **Mutation score ≥75%** - Run `cargo mutants` on new modules, ensure high kill rate
  6. **Inline tests per file** - Target 10-15 tests per file (bashrs pattern: 13.5 avg)
  7. **Golden files for codegen** - Compiler/transpiler modules need output validation
  8. **Fuzz parsers** - Use cargo-fuzz/atheris for parser modules (no panic guarantee)
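
  For heuristic 5, the mutation score is the share of viable mutants that the tests kill. This sketch assumes the outcome categories cargo-mutants reports (caught, missed, timeout, unviable). It counts timeouts as killed and excludes unviable mutants. Other tools and teams draw these lines differently. The struct and function names are hypothetical.

  ```rust
  /// Outcome counts from one mutation run.
  pub struct MutantCounts {
      pub caught: u32,
      pub missed: u32,
      pub timeout: u32,
      pub unviable: u32, // did not compile; excluded from the score
  }

  /// Mutation score = killed / viable, treating timeouts as killed (assumption).
  pub fn mutation_score(c: &MutantCounts) -> f64 {
      let killed = c.caught + c.timeout;
      let viable = killed + c.missed;
      if viable == 0 {
          return 1.0; // vacuously perfect: no viable mutants to kill
      }
      f64::from(killed) / f64::from(viable)
  }

  /// The >=75% bar from the base heuristics.
  pub fn meets_threshold(c: &MutantCounts) -> bool {
      mutation_score(c) >= 0.75
  }
  ```

  A run with 11 of 14 viable mutants killed scores about 0.786, which clears the 0.75 bar.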

  ## Override Mechanism

  User can override autonomous behavior with explicit instruction:
  - "Continue adding tests" → Ignore heuristics, keep writing tests
  - "Switch to module X" → Target specific module
  - "Commit and stop" → Force commit regardless of state
  - "Skip module Y" → Exclude specific modules from consideration

  If no override given, you MUST make autonomous decision using heuristics above.
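
  If a wrapper needs to detect these overrides programmatically, a simple case-insensitive matcher is enough. This is a hypothetical sketch. The phrases come from the list above, and anything unrecognized returns `None`, meaning the autonomous heuristics apply.

  ```rust
  #[derive(Debug, PartialEq, Eq)]
  pub enum Override {
      ContinueAddingTests,
      SwitchTo(String),
      CommitAndStop,
      Skip(String),
  }

  /// ASCII case-insensitive prefix strip; None if `text` lacks the prefix.
  fn strip_prefix_ci<'a>(text: &'a str, prefix: &str) -> Option<&'a str> {
      let head = text.get(..prefix.len())?; // None if too short or mid-character
      if head.eq_ignore_ascii_case(prefix) {
          Some(&text[prefix.len()..])
      } else {
          None
      }
  }

  /// Maps an instruction to an override, or None for "decide autonomously".
  pub fn parse_override(instruction: &str) -> Option<Override> {
      let text = instruction.trim();
      if text.eq_ignore_ascii_case("continue adding tests") {
          Some(Override::ContinueAddingTests)
      } else if text.eq_ignore_ascii_case("commit and stop") {
          Some(Override::CommitAndStop)
      } else if let Some(rest) = strip_prefix_ci(text, "switch to module ") {
          Some(Override::SwitchTo(rest.trim().to_string()))
      } else if let Some(rest) = strip_prefix_ci(text, "skip module ") {
          Some(Override::Skip(rest.trim().to_string()))
      } else {
          None
      }
  }
  ```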

  ## Example Autonomous Execution

  ```
  User: pmat prompt code-coverage

  [You analyze state automatically]
  Analyzing coverage state...
  - Current module: parser.rs at 42.24% (Frontend category)
  - Module type: LOGIC (parser)
  - Tests written: 8 unit tests (100% passing)
  - Functions tested: 12/35 (34.29%)
  - Time invested: ~30 minutes

  [You apply Heuristic 3 - Test Type Selection]
  DECISION: Apply property testing + fuzz testing (parser detected)
  RATIONALE: Parser is CRITICAL category (user-facing), 42% coverage below 95% target.
             PLDI 2021: Property testing found 89% of bugs missed by examples.
             Current 8 tests insufficient (need 10-15 per file, bashrs pattern).
  TECHNIQUES:
    - Property tests: PROPTEST_CASES=100 (not 5)
    - Fuzz testing: cargo fuzz run parser_target
    - Error recovery: All malformed inputs return Err, never panic

  ACTION: Creating tests/property_parser.rs with 5 property tests (100 cases each)...

  [Executes property test creation]

  ✅ Created 5 property tests (500 total test cases)
  📊 Coverage: 42.24% → 58.67% (+16.43%, excellent ROI: 3.29%/test)
  🎯 Mutation score: 79% (11 of 14 mutants killed, 3 survived)
  ⏭️  Next action: Add 7 more inline tests to reach 15/file target

  [Proceeds to the next action without asking]
  ```

  ## Critical Reminders (v3.0 - Compiler-Grade)

  - **NEVER ask the user "What would you like to do?"** - You decide using heuristics
  - **ALWAYS classify module type AND category FIRST** (TYPE determines ROI, CATEGORY determines techniques)
  - **ALWAYS use PROPTEST_CASES=100** (not 5) - statistical significance requires 100+ cases
  - **ALWAYS track ROI** (coverage percentage points per test) - auto-pivot if <0.05%/test for 2 batches
  - **ALWAYS explain your decision** before executing (transparency + rationale + evidence)
  - **ALWAYS commit on infrastructure threshold** (20+ tests, 100% passing)
  - **ALWAYS commit on 2%+ coverage improvement** (empirically validated, not 5%)
  - **ALWAYS commit on ROI decline** (<0.05%/test for 2 batches = diminishing returns)
  - **ALWAYS respect time constraints** (make coverage <10min, test-fast <5min, 90-min time-box)
  - **ALWAYS use golden files for codegen** (transpilers, compilers need output validation)
  - **ALWAYS fuzz parsers** (cargo-fuzz, no panic guarantee, PLDI 2021 evidence)
  - **ALWAYS target ≥10-15 tests per file** (bashrs pattern: 13.5 avg, 7,321 inline tests)
  - **ALWAYS check mutation score** (≥75% for production quality, ICSE 2023)

  ## Language-Specific Quick Reference

  ### Rust
  ```bash
  # Coverage (avoid mold linker interference)
  make coverage  # See note below about ~/.cargo/config.toml

  # Property tests (100 cases)
  PROPTEST_CASES=100 cargo test

  # Mutation testing
  cargo mutants --file src/module.rs --timeout 60

  # Fuzz testing
  cargo fuzz run target --jobs 4 -- -max_total_time=300

  # Inline tests
  cargo test --lib module_name::tests
  ```

  **CRITICAL**: The mold linker breaks LLVM coverage. The Makefile must temporarily move `~/.cargo/config.toml` aside and restore it even if the test run fails (recipe lines must start with a tab):
  ```makefile
  coverage:
  	@test -f ~/.cargo/config.toml && mv ~/.cargo/config.toml ~/.cargo/config.toml.cov-backup || true
  	@cargo llvm-cov --no-report nextest --all-features --workspace; status=$$?; \
  	  test -f ~/.cargo/config.toml.cov-backup && mv ~/.cargo/config.toml.cov-backup ~/.cargo/config.toml; \
  	  exit $$status
  	@cargo llvm-cov report --html --output-dir target/coverage/html
  ```

  ### Python
  ```bash
  # Coverage
  pytest --cov=src --cov-report=html --cov-report=term --cov-fail-under=85

  # Property tests (100 examples)
  pytest --hypothesis-profile=ci  # Set max_examples=100 in conftest.py

  # Mutation testing
  mutmut run --paths-to-mutate src/

  # Inline tests
  pytest tests/test_module.py -v
  ```

  ### TypeScript
  ```bash
  # Coverage
  npm test -- --coverage --coverageThreshold='{"global":{"lines":85}}'

  # Property tests
  npm test -- --testMatch="**/*.prop.test.ts"  # fast-check library

  # Golden file tests
  npm test -- --updateSnapshot  # Jest snapshots: regenerate only after an intentional output change, then review the diff
  ```

toyota_way_principles:
  jidoka: stop_the_line
  andon_cord: "true"
  genchi_genbutsu: verify_actual_state
  hansei: deep_reflection_on_roi_decline
  kaizen: continuous_improvement_via_empirical_feedback
  autonomation: "true"
  human_override_available: "true"
empirical_evidence:
  depyler_validation: "67.83% → 71.15%, +136 tests, 100x efficiency (Depyler project, 2025-11-12)"
  roi_improvement: "26x ROI gain via strategic pivot (0.004%/test → 0.105%/test)"
  module_type_discovery: "UI/CLI modules <0.03%/test, LOGIC modules 0.08-0.15%/test"
  bashrs_pattern: "~90% coverage, 7,321 inline tests (13.5 avg/file), 542 files"
  ruchy_strategy: "70.31% → 90%+ target via five-category decomposition"
  proptest_optimal: "100 cases (95% confidence), not 5 (60% confidence)"
  mutation_threshold: "≥75% mutation score for production quality (ICSE 2023)"
research_citations:
  ieee_2023: "35% fewer defects at >85% coverage (p < 0.001)"
  pldi_2021: "Property testing found 89% of bugs missed by examples (GCC, LLVM, ICC)"
  sqlite_2022: "100% MC/DC coverage via 1000:1 test-to-code ratio"
  icse_2023: "Mutation score ≥75% for production code quality"
  cc_2020: "Parser 95%, semantic 90%, codegen 85% coverage targets"