# CLAUDE.md - Ruchy Compiler Implementation Protocol
## IMPORTANT: Auto-Generated Files
**NEVER EDIT `deep_context.md`** - This file is auto-generated and will be overwritten. Any changes should be made to the source files instead.
## Prime Directive
**Generate correct code that compiles on first attempt. Quality is built-in, not bolted-on.**
## 🚨 CRITICAL: E2E Testing Protocol (DEFECT-001 Response)
**SACRED RULE**: NEVER commit frontend changes without E2E tests passing.
**Reference**: `docs/defects/CRITICAL-DEFECT-001-UI-EXECUTION-BROKEN.md`
### Mandatory E2E Testing Checklist
**Before ANY commit touching frontend code** (`static/**/*.html`, `*.js`, `*.css`):
1. ✅ **Run E2E smoke tests**: `./run-e2e-tests.sh tests/e2e/notebook/00-smoke-test.spec.ts`
2. ✅ **Verify selectors exist**: Use `validateSelectors()` helper (prevent phantom UI)
3. ✅ **Check coverage**: Frontend coverage ≥80% (enforced)
4. ✅ **Lint frontend**: `make lint-frontend` passes
5. ✅ **Visual check**: Manually verify in browser (Genchi Genbutsu)
## 🚨 CRITICAL: A+ Code Standard (From paiml-mcp-agent-toolkit)
**ABSOLUTE REQUIREMENT**: All NEW code MUST achieve A+ quality standards:
- **Maximum Cyclomatic Complexity**: ≤10 (not 20, not 15, TEN!)
- **Maximum Cognitive Complexity**: ≤10 (simple, readable, maintainable)
- **Function Size**: ≤30 lines (if longer, decompose it)
- **Single Responsibility**: Each function does ONE thing well
- **Zero SATD**: No TODO, FIXME, HACK, or "temporary" solutions
- **TDD Mandatory**: Write test FIRST, then implementation
- **Test Coverage**: 100% for new functions (no exceptions)
**Enforcement Example**:
```rust
// ❌ BAD: Complexity 15+
fn process_data(items: Vec<Item>) -> Result<Output> {
    let mut results = Vec::new();
    for item in items {
        if item.valid {
            if item.item_type == ItemType::A {
                // ... 20 more lines of nested logic
            }
        }
    }
    // ... more complexity
}

// ✅ GOOD: Complexity ≤10
fn process_data(items: Vec<Item>) -> Result<Vec<ItemOutput>> {
    items.into_iter()
        .filter(|item| item.valid)
        .map(process_single_item)
        .collect()
}

fn process_single_item(item: Item) -> Result<ItemOutput> {
    match item.item_type {
        ItemType::A => process_type_a(item),
        ItemType::B => process_type_b(item),
    }
}
```
## EXTREME TDD Protocol (CRITICAL RESPONSE TO ANY BUG)
**ANY BUG IN ANY COMPONENT REQUIRES IMMEDIATE EXTREME TDD RESPONSE:**
### Critical Bug Response (MANDATORY - ZERO EXCEPTIONS):
**🛑 STOP THE LINE PROTOCOL:**
1. **HALT ALL OTHER WORK**: Stop everything when ANY bug found (parser, transpiler, runtime, linter, tooling, etc.)
2. **ROOT CAUSE ANALYSIS**: Use GENCHI GENBUTSU (go and see) to understand exact failure mode
3. **EXTREME TDD FIX** (RED→GREEN→REFACTOR):
- **RED**: Write failing test that reproduces bug FIRST
- **GREEN**: Fix bug with minimal code changes
- **REFACTOR**: Apply PMAT quality gates (A- minimum, ≤10 complexity)
4. **EXTREME TEST COVERAGE**: Create comprehensive test suites immediately:
- Unit tests for the specific bug scenario
- Integration tests for complete programs
- Property tests with random inputs (10,000+ iterations)
- Mutation tests (≥75% mutation coverage via cargo-mutants)
- Fuzz tests for edge cases
- Doctests in every public function
- `cargo run --examples` MUST pass 100%
5. **REGRESSION PREVENTION**: Add failing test BEFORE fixing bug (TDD mandatory)
6. **PMAT QUALITY GATES**: ALL fixes MUST pass:
- `pmat tdg <file> --min-grade A-` (≥85 points)
- `pmat analyze complexity --max-cyclomatic 10 --max-cognitive 10`
- `pmat analyze satd --fail-on-violation` (zero SATD)
7. **MUTATION VALIDATION**: Run mutation tests on fixed code:
- `cargo mutants --file <fixed-file.rs> --timeout 300`
- Target: ≥75% mutation coverage (CAUGHT/(CAUGHT+MISSED) ≥ 75%)
- If <75%: Add more property tests and re-run
8. **COMPREHENSIVE VALIDATION**: Test all related features after fix
### 🚨 CRITICAL: Missing Language Feature Protocol (MANDATORY)
**IF YOU DISCOVER A LANGUAGE FEATURE IS "NOT IMPLEMENTED" - IMPLEMENT IT, DON'T SKIP IT!**
**WRONG RESPONSE** (FORBIDDEN):
- ❌ "This feature isn't implemented, let's skip it"
- ❌ "Let's document it as not working"
- ❌ "Let's work around it"
- ❌ "Let's simplify the examples to avoid it"
**CORRECT RESPONSE** (MANDATORY):
1. 🛑 **STOP THE LINE**: Halt current work immediately
2. 🔍 **INVESTIGATE**: Use GENCHI GENBUTSU to verify the feature is truly missing (don't assume!)
3. 📋 **CREATE TICKET**: Add to docs/execution/roadmap.md with format: [FEATURE-XXX]
4. ✅ **EXTREME TDD IMPLEMENTATION**:
- **RED**: Write tests for the missing feature FIRST (they will fail)
- **GREEN**: Implement the feature minimally to pass tests
- **REFACTOR**: Apply quality gates (≤10 complexity, A- grade)
5. 📊 **VALIDATE**: Property tests + mutation tests (≥75% coverage)
6. 📝 **DOCUMENT**: Update LANG-COMP with working examples
7. ✅ **COMMIT**: Complete implementation with ticket reference
**Example - Correct Response to Missing Feature**:
```
Discovery: "Negative number literals don't work"
WRONG: "Let's remove negative numbers from examples"
RIGHT:
1. Stop the line
2. Verify: grep -r "Literal.*Negative" src/ (is it truly missing?)
3. Create: [PARSER-042] Implement negative number literals
4. RED: Write test_negative_literals() - fails (see the sketch below)
5. GREEN: Add parsing for `-123` syntax
6. Property test: All negative integers parse correctly
7. Commit: [PARSER-042] Implement negative number literals with tests
```
**Toyota Way Principle**:
- **Jidoka**: Stop the line when defects found - missing features ARE defects
- **Genchi Genbutsu**: Go see if feature is truly missing (don't assume!)
- **Kaizen**: Each missing feature is an opportunity to improve the language
- **No Shortcuts**: Implement properly with TDD, don't work around
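As a concrete illustration of the RED step in the example above, here is a minimal sketch of a failing-first test for the hypothetical [PARSER-042] ticket. It assumes the `-e` eval flag and `println` syntax described elsewhere in this document; the exact expression and expected output are illustrative, and the name follows the ticket-based naming convention.
```rust
use assert_cmd::Command;
use predicates::prelude::*;

fn ruchy_cmd() -> Command {
    Command::cargo_bin("ruchy").expect("Failed to find ruchy binary")
}

/// RED phase for [PARSER-042]: written BEFORE the fix, so it fails
/// until negative number literals are implemented.
#[test]
fn test_parser_042_negative_literal_evaluates() {
    ruchy_cmd()
        .arg("-e")
        .arg("println(-123)")
        .assert()
        .success()
        .stdout(predicate::str::contains("-123"));
}
```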
### Test Coverage Requirements (MANDATORY):
- **Parser Tests**: Every token, every grammar rule, every edge case
- **Transpiler Tests**: Every Ruchy construct → Rust construct mapping
- **Runtime Tests**: Every evaluation path, every error condition
- **Linter Tests**: Every lint rule, every scope scenario, every AST pattern
- **Tooling Tests**: Every CLI command, every flag combination, every output format
- **Integration Tests**: Full compile → execute → validate pipeline
- **Property Tests**: Automated generation of valid/invalid programs (10K+ cases; see the config sketch after this list)
- **Fuzz Tests**: Random input stress testing (AFL, cargo-fuzz)
- **Mutation Tests**: 75%+ mutation coverage via cargo-mutants (empirical validation)
- **Examples Tests**: All examples/ must compile and run
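For the 10K+ case target above, a minimal sketch of raising proptest's case count (the default is much lower, 256). The round-trip property is a stand-in; a real suite would exercise the parser or transpiler instead.
```rust
use proptest::prelude::*;

proptest! {
    // Raise proptest's default case count to the 10K+ target.
    #![proptest_config(ProptestConfig::with_cases(10_000))]

    #[test]
    fn prop_integer_roundtrip(n: i64) {
        // Stand-in invariant; replace with a parser/transpiler property.
        prop_assert_eq!(n.to_string().parse::<i64>().unwrap(), n);
    }
}
```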
### Bug Categories (ALL Subject to EXTREME TDD):
- **Parser Bugs**: Grammar issues, tokenization errors, AST construction failures
- **Transpiler Bugs**: Incorrect Rust code generation, type mismatches, codegen errors
- **Runtime Bugs**: Evaluation errors, type system bugs, memory issues
- **Linter Bugs**: False positives, false negatives, scope tracking errors
- **Tooling Bugs**: CLI failures, invalid output, incorrect behavior
- **Quality Bugs**: PMAT violations, complexity explosions, technical debt accumulation
### Mutation Testing Protocol (MANDATORY - Sprint 8)
**CRITICAL**: Mutation testing is the GOLD STANDARD for test quality validation.
#### Why Mutation Testing Matters:
- **Line coverage measures execution, mutation coverage measures effectiveness**
- 99% line coverage can have 20% mutation coverage (tests run code but don't validate it)
- Mutation testing empirically proves tests catch real bugs, not just exercise code paths
- Each mutation simulates a real bug - if tests don't catch it, they're inadequate
#### Incremental Mutation Testing Strategy:
```bash
# NEVER run full baseline (10+ hours) - use incremental file-by-file approach
cargo mutants --file src/frontend/parser/core.rs --timeout 300 # 5-30 min
# Analyze gaps immediately
grep "MISSED" core_mutations.txt
# Write tests targeting SPECIFIC mutations
# Re-run to validate 80%+ coverage achieved
```
#### Test Gap Patterns (From Sprint 8 Empirical Data):
1. **Match Arm Deletions** (most common): Test ALL match arms with assertions (see the sketch after this list)
2. **Function Stub Replacements**: Validate return values are real data, not None/empty
3. **Boundary Conditions**: Test <, <=, ==, >, >= explicitly
4. **Boolean Negations**: Test both true AND false branches
5. **Operator Changes**: Test +/-, */%, <=/>=, &&/|| alternatives
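A minimal sketch of closing pattern 1. The `classify` function and enum are hypothetical stand-ins, but the shape matters: one assertion per match arm plus the fallthrough, so deleting any arm produces a failing test instead of a surviving mutant.
```rust
#[derive(Debug, PartialEq)]
enum ItemType { A, B }

// Hypothetical classifier used only to illustrate the pattern.
fn classify(tag: &str) -> Option<ItemType> {
    match tag {
        "A" => Some(ItemType::A),
        "B" => Some(ItemType::B),
        _ => None,
    }
}

#[test]
fn test_classify_covers_every_match_arm() {
    // Each arm has its own assertion, so an arm-deletion mutant is caught.
    assert_eq!(classify("A"), Some(ItemType::A));
    assert_eq!(classify("B"), Some(ItemType::B));
    assert_eq!(classify("Z"), None);
}
```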
#### Mutation-Driven TDD:
1. Run mutation test FIRST to identify gaps
2. Write test targeting SPECIFIC mutation
3. Re-run mutation test to verify fix
4. Repeat until 80%+ coverage achieved
5. No speculative tests - only evidence-based
#### Acceptable Mutations (Rare):
- **Semantically Equivalent**: Mutation produces identical observable behavior
- **Example**: `Vec::leak(Vec::new())` vs `self.state.get_errors()` both return empty slice
- Document why mutation is uncatchable and mark ACCEPTABLE
## Scientific Method Protocol
**WE DON'T GUESS, WE PROVE VIA QUANTITATIVE METHODS AND TESTING.**
### Evidence-Based Development Rules:
1. **NO ASSUMPTIONS**: Every claim must be backed by concrete evidence
2. **MEASURE EVERYTHING**: Use tests, benchmarks, and metrics to validate behavior
3. **REPRODUCE ISSUES**: Create minimal test cases that demonstrate problems
4. **QUANTIFY IMPROVEMENTS**: Before/after metrics prove effectiveness
5. **DOCUMENT EVIDENCE**: All findings must be recorded with reproducible steps
### Investigation Protocol:
1. **Hypothesis**: State what you believe is happening
2. **Test**: Create specific tests that prove/disprove the hypothesis
3. **Measure**: Collect concrete data (test results, timings, coverage)
4. **Analyze**: Draw conclusions only from the evidence
5. **Document**: Record findings and next steps
## QDD (Quality-Driven Development)
**Metrics**: Complexity ≤10, Coverage ≥80%, SATD=0, TDG A- minimum
**Setup**: `pmat quality-gates init; pmat hooks install`
**During**: `pmat tdg dashboard --port 8080 &`
**Commit**: Auto-validated via pre-commit hooks
## Toyota Way Implementation
**STOP THE LINE FOR ANY DEFECT. NO DEFECT IS TOO SMALL. NO SHORTCUT IS ACCEPTABLE.**
### Core Toyota Principles:
1. **Jidoka (Autonomation)**: Build quality into the process, detect problems immediately
2. **Genchi Genbutsu**: Go to the source, understand the root cause through direct observation
3. **Kaizen**: Continuous improvement through systematic problem-solving
4. **Respect for People**: Create systems that prevent human error, not blame people
5. **Long-term Philosophy**: Short-term fixes create long-term problems
### Defect Response Protocol (MANDATORY):
```
1. HALT DEVELOPMENT: Stop all forward progress when defect detected
2. ROOT CAUSE ANALYSIS: Use 5 Whys to find true cause, not symptoms
3. POKA-YOKE: Design error-prevention into the system
4. SYSTEMATIC TESTING: Add comprehensive test coverage to prevent recurrence
5. PROCESS IMPROVEMENT: Update development process to catch similar issues earlier
6. VERIFICATION: Prove the fix works and cannot regress
```
### Testing Hierarchy (Systematic Defect Prevention):
```
Level 1: Unit Tests - Function-level correctness
Level 2: Integration Tests - Component interaction
Level 3: E2E Tests - Full system behavior
Level 4: Property Tests - Mathematical invariants
Level 5: Fuzz Tests - Random input robustness
Level 6: Regression Tests - Historical defect prevention
Level 7: Performance Tests - Non-functional requirements
```
**NEVER AGAIN RULE**: Every defect must be made impossible to repeat through systematic prevention.
### 🚨 SACRED RULE: NO DEFECT IS OUT OF SCOPE
**ABSOLUTE PRINCIPLE**: EVERY defect, bug, missing feature, or incompleteness MUST be fixed immediately. ZERO exceptions.
**FORBIDDEN RESPONSES**:
- ❌ "This is out of scope for the current task"
- ❌ "Let's defer this to a future sprint"
- ❌ "This is a separate issue"
- ❌ "Let's work around this for now"
- ❌ "This is a known limitation"
**MANDATORY RESPONSE**:
1. 🛑 **STOP THE LINE**: Halt ALL other work immediately
2. 🔍 **ROOT CAUSE ANALYSIS**: Use Five Whys to understand the defect
3. 📋 **CREATE SPECIFICATION**: Document what needs to be fixed
4. ✅ **EXTREME TDD**: RED→GREEN→REFACTOR to fix the defect
5. 🧪 **COMPREHENSIVE TESTING**: Property tests, mutation tests, fuzz tests
6. ✅ **VERIFY FIX**: Prove the defect is resolved and cannot regress
7. 📝 **DOCUMENT**: Update all affected documentation and tests
**Toyota Way Principle**:
- **Jidoka**: Stop the line when ANY problem is found
- **No defect is too small**: Every defect represents a gap in quality
- **No shortcut is acceptable**: Fix the root cause, not the symptom
- **Long-term philosophy**: Deferring defects creates technical debt that compounds
**Example - Correct Response**:
```
Discovery: "F-string compilation to WASM doesn't work"
WRONG: "F-strings are out of scope for WASM work, let's skip them"
RIGHT:
1. STOP THE LINE - this is a defect, not a feature request
2. Five Whys: Why don't f-strings work in WASM? → Not implemented
3. Create: docs/specifications/wasm-fstring-spec.md
4. RED: test_fstring_compiles_to_wasm() - FAILS
5. GREEN: Implement f-string WASM lowering
6. Property test: All f-strings compile to valid WASM
7. Mutation test: Verify tests catch f-string bugs
8. Commit: [WASM-XXX] Implement f-string WASM compilation with tests
```
**Why This Matters**:
- **Quality is non-negotiable**: Every defect deferred is technical debt
- **User trust**: Incomplete features break user confidence
- **Compounding problems**: Small defects become big problems
- **Toyota Way**: Stop the line for ANY defect, no exceptions
### Mandatory Testing Requirements (80% Property Test Coverage)
**CRITICAL**: Following paiml-mcp-agent-toolkit Sprint 88 success pattern:
1. **Property Test Coverage**: Target 80% of all modules with property tests
2. **Doctests**: Every public function MUST have runnable documentation examples (see the doctest sketch after this list)
3. **Property Tests**: Use proptest to verify invariants with 10,000+ random inputs
4. **Fuzz Tests**: Use cargo-fuzz or AFL to find edge cases with millions of inputs
5. **Examples**: Create working examples in examples/ directory demonstrating correct usage
6. **Integration Tests**: End-to-end scenarios covering real-world usage patterns
7. **Regression Tests**: Specific test case that reproduces and prevents the exact defect
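A minimal doctest sketch for requirement 2 above; `my_crate::add` is a hypothetical public function used only to show the runnable-example shape.
```rust
/// Adds two integers.
///
/// # Examples
///
/// ```
/// assert_eq!(my_crate::add(2, 3), 5);
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
```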
**Property Test Injection Pattern** (from pmat):
```rust
#[cfg(test)]
mod property_tests {
    use super::*;
    use proptest::prelude::*;

    proptest! {
        #[test]
        fn test_function_never_panics(input: String) {
            let _ = function_under_test(&input); // Should not panic
        }

        #[test]
        fn test_invariant_holds(a: i32, b: i32) {
            let result = add(a, b);
            prop_assert_eq!(result, b + a); // Commutative property
        }
    }
}
```
**Code Coverage Requirements** (QUALITY-008 Implemented):
- **Current Baseline**: 33.34% overall (post-QUALITY-007 parser enhancements)
- **Regression Prevention**: Pre-commit hooks BLOCK commits below baseline
- **Direction**: Coverage must increase or stay same, NEVER decrease
- **Parser Improvements**: Character literals, tuple destructuring, rest patterns now working
- **Pattern Test Results**: 2 passing → 4 passing (100% improvement achieved)
- **Enforcement**: Automated coverage checking with clear error messages
## PMAT Quality Gates (v2.70+)
**Setup**: `pmat quality-gates init; pmat hooks install`
**Gates**: TDG A-, Complexity ≤10, SATD=0, Coverage ≥80%, Build clean
**Health**: `pmat maintain health` (~10s)
## PMAT TDG Quality Enforcement
**Standards**: A- (≥85), Complexity ≤10, SATD=0, Duplication <10%, Docs >70%
**Before**: `pmat tdg . --min-grade A- --fail-on-violation`
**During**: `pmat tdg <file> --include-components`
**Commit**: Blocked by pre-commit hooks if <A-
## Toyota Way Success Stories
### Property Testing Victory (2024-12)
- **545 systematic test cases**: 0 parser inconsistencies found
- **ROOT CAUSE**: Manual testing methodology error, NOT code defect
- **LESSON**: Property testing is objective - mathematically proves system behavior
### PMAT Enforcement Success (2025-08-30)
**Discovery**: 3,557 quality violations found
- **Finding**: Functions exceeding the complexity limit by 72x (cyclomatic complexity 720 vs the limit of 10)
- **SATD debt**: 1,280 technical debt comments
- **Dead code**: 6 violations indicating maintenance debt
- **Root cause**: PMAT quality gates not enforced during development
- **Solution**: Mandatory PMAT enforcement at every development step
- **Lesson**: Quality must be built-in from start, not bolted-on later
### Language Completeness Achievement v1.9.1 (2025-08)
**Systematic Testing Revealed**:
- ✅ **Fat arrow syntax**: Functional (`x => x + 1`)
- ✅ **String interpolation**: Functional (`f"Hello {name}"`)
- ✅ **Async/await**: Functional (async fn and await expressions)
- ✅ **DataFrame literals**: Functional (`df![]` macro)
- ✅ **Generics**: Functional (`Vec<T>`, `Option<T>`)
- ✅ **Pipeline Operator**: `|>` for functional programming (v1.9.0)
- ✅ **Import/Export**: Module system evaluation (v1.9.1)
- ✅ **String Methods**: Complete suite (v1.8.9)
**ROOT CAUSE**: Manual testing created false negatives. Features were already implemented.
### Quality Excellence Sprint v1.6.0
**Results**: 107 tests created, 287 tests passing, 80% coverage achieved
- LSP module coverage: 0% → 96-100%
- MCP module coverage: 0% → 33%
- Type inference coverage: 0% → 15%
### Complete Language Restoration - Status
**Core Functionality Status**:
- Basic Math, Float Math, Variables ✅
- String Concatenation, Method Calls ✅
- Boolean Operations, Complex Expressions ✅
- Reserved Keywords: final → r#final (automatic) ✅
## Scripting Policy
**CRITICAL**: Use ONLY Ruchy scripts for adhoc scripting and testing. No Python, Bash scripts, or other languages for testing Ruchy functionality.
✅ **Allowed**: `*.ruchy` files loaded via `:load` command in REPL
❌ **Forbidden**: Python scripts, shell scripts, or any non-Ruchy testing code
## Implementation Hierarchy
```yaml
Navigation:
  1. SPECIFICATION.md             # What to build (reference)
  2. docs/execution/roadmap.md    # Strategic priorities and current tasks
  3. docs/execution/              # Tactical work breakdown
  4. ../ruchy-book/INTEGRATION.md # Book compatibility tracking
  5. CHANGELOG.md                 # Version history and release notes
```
## Book Compatibility Monitoring
**CRITICAL**: Check `../ruchy-book/INTEGRATION.md` FREQUENTLY for:
- Current compatibility: 19% (49/259 examples) + 100% one-liners (20/20)
- v1.9.1 Language Completeness: Pipeline operator, Import/Export, String methods
- Regression detection from previous versions
## Quality Status (v1.9.3)
**INTERPRETER COMPLEXITY**:
- evaluate_expr: 138 (was 209, target <50)
- Value::fmt: 66 (target <30)
- Value::format_dataframe: 69 (target <30)
- **Latest Features**: Math functions (sqrt, pow, abs, min, max, floor, ceil, round)
## Critical Quality Gate Defect (Toyota Way Investigation)
**DEFECT**: Pre-commit hook hangs at dogfooding test
**ROOT CAUSE**: Transpiler generates different code in debug vs release mode, violating determinism
**PREVENTION**:
- Add property test: `assert!(transpile_debug(x) == transpile_release(x))` (see the sketch below)
- Quality gates must use consistent binary paths
- Never allow behavioral differences between debug/release
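A hedged sketch of the determinism property suggested above. Debug and release binaries cannot be compared inside a single test run (that comparison belongs in a two-build CI script), but the single-binary form of the property looks like this; `transpile` is a placeholder for the real transpiler entry point.
```rust
use proptest::prelude::*;

// Placeholder so the sketch is self-contained; the real entry point lives
// in the transpiler module and may have a different signature.
fn transpile(source: &str) -> String {
    source.trim().to_string()
}

proptest! {
    #[test]
    fn prop_transpile_is_deterministic(src in ".{0,64}") {
        // Same input must produce identical output on every invocation.
        prop_assert_eq!(transpile(&src), transpile(&src));
    }
}
```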
## Absolute Rules (From paiml-mcp-agent-toolkit)
1. **NEVER Leave Stub Implementations**: Every feature must be fully functional. No "TODO" or "not yet implemented".
2. **NEVER Add SATD Comments**: Zero tolerance for TODO, FIXME, HACK comments. Complete implementation only.
3. **NEVER Use Simple Heuristics**: Always use proper AST-based analysis and accurate algorithms.
4. **NEVER Duplicate Core Logic**: ONE implementation per feature. All consumers use same underlying logic.
5. **ALWAYS Dogfood via MCP First**: Use our own MCP tools as primary interface when available.
6. **NEVER Bypass Quality Gates**: Zero tolerance for `--no-verify`. Fix issues, don't bypass.
7. **NEVER Use Git Branches**: Work directly on main branch. Continuous integration prevents conflicts.
8. **ALWAYS Apply Kaizen**: Small, incremental improvements. One file at a time.
9. **ALWAYS Use Genchi Genbutsu**: Don't guess. Use PMAT to find actual root causes.
10. **ALWAYS Apply Jidoka**: Automate with human verification. Use `pmat refactor auto`.
## Task Execution Protocol
### MANDATORY: Roadmap and Ticket Tracking
**CRITICAL**: ALL development work MUST follow roadmap-driven development:
1. **ALWAYS Use Ticket Numbers**: Every commit, PR, and task MUST reference a ticket ID from docs/execution/roadmap.md
2. **Roadmap-First Development**: No work begins without a corresponding roadmap entry
3. **Ticket Format**: Use format "QUALITY-XXX", "PARSER-XXX", "DF-XXX", "WASM-XXX" per roadmap
4. **Traceability**: Every change must be traceable back to business requirements via ticket system
5. **Sprint Planning**: Work is organized by sprint with clear task dependencies and priorities
### Pre-Implementation Verification (PMAT-Enforced)
```text
// HALT. Before implementing ANY feature:
□ Run PMAT baseline: `pmat quality-gate --fail-on-violation --checks=all`
□ Check ../ruchy-book/INTEGRATION.md for latest compatibility report
□ Check ../ruchy-book/docs/bugs/ruchy-runtime-bugs.md for known issues
□ Locate specification section in SPECIFICATION.md
□ Find task ID in docs/execution/roadmap.md (MANDATORY)
□ Verify ticket dependencies completed via DAG
□ Reference ticket number in all commits/PRs
□ Check existing patterns in codebase (GENCHI GENBUTSU - Go And See!)
- For CLI tests: Use assert_cmd pattern from tests/fifteen_tool_validation.rs
- For property tests: Use proptest pattern from existing property_tests modules
- For mutation tests: Follow cargo-mutants pattern from Sprint 8
□ PMAT complexity check: `pmat analyze complexity --max-cyclomatic 10`
□ Confirm complexity budget (<10 cognitive) via PMAT verification
□ Zero SATD: `pmat analyze satd --fail-on-violation`
```
### MANDATORY: assert_cmd for ALL CLI Testing
**CRITICAL**: ALL tests that invoke CLI commands MUST use assert_cmd, NOT raw std::process::Command.
**Why assert_cmd is Mandatory**:
- **Deterministic**: Predicates provide clear, testable assertions
- **Maintainable**: Standard pattern across all CLI tests
- **Debuggable**: Better error messages than raw Command
- **Proven**: Already used in fifteen_tool_validation.rs with 18/22 passing tests
**Pattern (from fifteen_tool_validation.rs)**:
```rust
use assert_cmd::Command;
use predicates::prelude::*;

fn ruchy_cmd() -> Command {
    Command::cargo_bin("ruchy").expect("Failed to find ruchy binary")
}

#[test]
fn test_example() {
    ruchy_cmd()
        .arg("run")
        .arg("examples/test.ruchy")
        .assert()
        .success()
        .stdout(predicate::str::contains("expected output"));
}
```
**Never Use**: `std::process::Command` for testing CLI - this is a quality defect.
### MANDATORY: Test Naming Convention (TRACEABILITY)
**CRITICAL**: Every test MUST be traceable to its documentation/specification via naming convention.
**Naming Pattern** (MANDATORY - NO EXCEPTIONS):
```
test_<TICKET_ID>_<section>_<feature>_<scenario>
Examples:
- test_langcomp_003_01_if_expression_true_branch()
- test_langcomp_003_01_if_expression_example_file()
- test_langcomp_003_02_match_literal_pattern()
- test_langcomp_003_05_break_exits_loop()
```
**Component Breakdown**:
1. `TICKET_ID`: LANG-COMP-003 → `langcomp_003` (lowercase, underscored)
2. `section`: File number (01, 02, 03, 04, 05)
3. `feature`: What is being tested (if_expression, match_pattern, for_loop)
4. `scenario`: Specific test case (true_branch, example_file, literal_pattern)
**Why Naming Convention is Mandatory**:
- **Traceability**: Instantly map test → documentation → ticket → requirement
- **Findability**: `grep "langcomp_003_01"` finds all if-expression tests
- **Validation**: Prove documentation examples are tested (not just written)
- **Toyota Way**: Standard work enables quality - no standard = no quality
**Test-to-Doc Linkage** (MANDATORY):
```rust
// File: tests/lang_comp/control_flow.rs
// Links to: examples/lang_comp/03-control-flow/01_if.ruchy
// Validates: LANG-COMP-003 Control Flow - If Expressions
#[test]
fn test_langcomp_003_01_if_expression_example_file() {
    ruchy_cmd()
        .arg("run")
        .arg("examples/lang_comp/03-control-flow/01_if.ruchy")
        .assert()
        .success();
}
```
**Generic Names are DEFECTS**: `test_if_true_branch()` provides ZERO traceability.
### Commit Message Format
```
[TICKET-ID] Brief description
- Specific changes
- Test coverage
- TDG Score: src/file.rs: 68.2→82.5 (C+→B+)
Closes: TICKET-ID
```
### End-of-Sprint Git Commit Protocol (MANDATORY)
**CRITICAL**: After EVERY sprint completion, you MUST commit all changes immediately.
**Sprint Completion Checklist (ALL MANDATORY - ZERO EXCEPTIONS)**:
1. ✅ All sprint tasks complete and verified
2. ✅ **Unit tests passing**: All basic functionality tests green
3. ✅ **Property tests EXECUTED**: Run ignored property tests, verify they pass, report results
```bash
cargo test <test_module>::property_tests -- --ignored --nocapture
```
4. ✅ **Mutation tests EXECUTED**: Run mutation testing on new code, achieve ≥75% coverage
```bash
cargo mutants --file tests/lang_comp/<module>.rs --timeout 300
```
5. ✅ **15-Tool Validation**: Verify examples work with ALL 15 native tools (sample validation acceptable)
6. ✅ **Roadmap updated** with sprint completion status INCLUDING test metrics
7. ✅ **Documentation updated** (examples, tests, validation results)
8. ✅ **GIT COMMIT IMMEDIATELY** - Don't wait, commit now!
**Why Property & Mutation Testing is MANDATORY**:
- **Property Tests**: Prove invariants hold across 10K+ random inputs (mathematical correctness)
- **Mutation Tests**: Empirically prove tests catch real bugs, not just exercise code paths
- **Without Both**: You have NO PROOF tests are effective - just coverage theater
**Commit Protocol**:
```bash
# After sprint completion, ALWAYS run:
git add .
git status # Verify changes
git commit -m "[SPRINT-ID] Sprint completion: <brief summary>
- All tasks complete: <list ticket IDs>
- Tests: X/X passing
- Examples: X files created
- Validation: X-tool protocol verified
- Roadmap: Updated with completion status
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>"
git status # Verify commit success
```
**Why This Matters (Toyota Way)**:
- **Jidoka**: Each sprint is a complete unit of work - commit it atomically
- **Genchi Genbutsu**: Working code on disk = empirical evidence of progress
- **Kaizen**: Small, verified increments prevent integration hell
- **Risk Mitigation**: Never lose completed work due to session interruption
## MANDATORY: TDG Transactional Tracking
**Scoring**: A+ (95-100), A (90-94), A- (85-89), B (80-84), C (70-79), D (60-69), F (<60)
**Pre-Commit**: `pmat tdg . --min-grade A- --fail-on-violation` (BLOCKING)
## Compiler Architecture Patterns
### Parser Pattern - Pratt with Error Recovery
```rust
impl Parser {
    fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, ParseError> {
        let mut lhs = self.parse_prefix()?;
        // Pratt loop: keep folding operators whose left binding power
        // is at least the caller's minimum.
        while let Some(&op) = self.peek() {
            let (l_bp, r_bp) = op.binding_power();
            if l_bp < min_bp { break; }
            self.advance();
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::binary(op, lhs, rhs, self.span());
        }
        Ok(lhs)
    }
}
```
### Type Inference - Bidirectional Checking
```rust
impl TypeChecker {
    fn check(&mut self, expr: &Expr, expected: Type) -> Result<(), TypeError> {
        match (&expr.kind, expected) {
            // Checking mode: push the expected function type into the lambda.
            (ExprKind::Lambda(params, body), Type::Function(arg_tys, ret_ty)) => {
                self.check_params(params, arg_tys)?;
                self.check(body, *ret_ty)
            }
            // Fallback: infer a type, then unify it with the expectation.
            _ => {
                let inferred = self.infer(expr)?;
                self.unify(inferred, expected)
            }
        }
    }
}
```
## MANDATORY Quality Gates (BLOCKING - Not Advisory)
### SACRED RULE: NEVER BYPASS QUALITY GATES
**ABSOLUTELY FORBIDDEN**:
- `git commit --no-verify` - NEVER use this - NO EXCEPTIONS EVER
- Skipping tests "temporarily" - NO exceptions
- Ignoring failing quality checks - Must fix EVERY defect
- Dismissing warnings as "unrelated" - All defects matter
**Toyota Way Principle**: Stop the line for ANY defect. No defect is too small. No shortcut is acceptable.
**WHEN CLIPPY BLOCKS**: Always fix the root cause:
- Missing `# Errors` sections → Add proper documentation with examples (see the sketch after this list)
- Using `unwrap()` → Replace with `expect()` with meaningful messages
- Dead code warnings → Remove or prefix with underscore
- Missing doctests → Add runnable examples to documentation
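A minimal sketch of the first two fixes; `parse_port` and `default_port` are hypothetical functions shown only to illustrate the `# Errors` section and a meaningful `expect()` message.
```rust
/// Returns the configured port.
///
/// # Errors
///
/// Returns an error if `input` does not parse as a `u16` port number.
pub fn parse_port(input: &str) -> Result<u16, std::num::ParseIntError> {
    input.trim().parse::<u16>()
}

fn default_port() -> u16 {
    // Meaningful `expect()` message instead of a bare `unwrap()`.
    "8080".parse().expect("default port literal must parse as u16")
}
```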
### MANDATORY RELEASE PUBLISHING PROTOCOL
**CRITICAL**: After EVERY version update, you MUST publish to crates.io immediately.
```bash
# MANDATORY after version bump and git push:
cargo publish # Publish main package only
```
### Pre-commit Hooks (AUTO-INSTALLED via `pmat hooks install`)
Gates: TDG A-, Function complexity ≤10, Basic REPL test
Install: `pmat hooks install`
## The Make Lint Contract (Zero Warnings Allowed)
```bash
# make lint command from Makefile:
cargo clippy --all-targets --all-features -- -D warnings
```
**Critical**: The `-D warnings` flag treats EVERY clippy warning as a hard error.
## Language Feature Testing Protocol
### CRITICAL REQUIREMENT: Language Compatibility First
**NO CODE CHANGES can be committed without passing language feature compatibility tests.**
#### Compatibility Test Suite (MANDATORY)
```bash
# Run before EVERY commit - no exceptions
make compatibility # Or: cargo test compatibility_report --test compatibility_suite -- --nocapture --ignored
```
**Current Standards (v1.0.0)**:
- ✅ **One-liners**: 100% (15/15) - Baseline
- ✅ **Basic Language Features**: 100% (5/5) - Core syntax complete
- ✅ **Control Flow**: 100% (5/5) - if/match/for/while/pattern-guards
- ✅ **Data Structures**: 100% (7/7) - Objects functional
- ✅ **String Operations**: 100% (5/5) - String methods working
- ✅ **Numeric Operations**: 100% (4/4) - Integer.to_string() + math ops
- ✅ **Advanced Features**: 100% (4/4) - Pattern guards complete
**Total: 41/41 features working**
### Test Organization (Industry Standard)
```
tests/
├── compatibility_suite.rs # Main feature compatibility (100% required)
├── properties/ # Property-based testing (Haskell style)
├── regression/ # Bug prevention (every GitHub issue)
└── benchmarks/ # Performance baselines (SQLite style)
```
Language compatibility testing is **GATE 2** in our mandatory pre-commit hooks - more critical than complexity or linting because **language regressions break user code**.
## 15 Native Tool Validation Protocol (LANG-COMP MANDATORY)
**CRITICAL**: All language completeness documentation (LANG-COMP tickets) MUST validate examples using ALL 15 native Ruchy tools.
**🚨 SACRED PRINCIPLE: LANG-COMP TESTS ARE DEFECT-FINDING TOOLS**
**Purpose**: LANG-COMP tests are DESIGNED to find subtle compiler/interpreter defects
- **NOT documentation**: These tests expose gaps in implementation
- **NO WORKAROUNDS EVER**: If a LANG-COMP test fails → FIX THE COMPILER
- **Stop The Line**: Every failure indicates a real defect (Toyota Way: Jidoka)
- **Success Stories**:
- ✅ DEFECT-POW: Found pow() missing in eval mode → FIXED in eval_integer_method
- ✅ DEFECT-REF: Found reference operator (&) missing → FIXED in eval_operations
- ✅ DEFECT-TUPLE: Found tuple field access missing in eval → FIXED in eval_field_access
**Sacred Rule**: LANG-COMP test failing = Compiler bug. Fix compiler, NEVER skip tests.
**IMPORTANT**: Following TOOL-VALIDATION sprint completion (2025-10-07), ALL 15 tools now support validation via CLI. NO tools should be skipped:
- **REPL**: Use `ruchy -e "$(cat file.ruchy)"` to execute code via eval flag (discovered 2025-10-07)
- **Notebook**: Accepts file parameter for non-interactive validation mode
- **WASM**: Some features have limitations, validate tool works with simple code
### 15-Tool Validation Requirements (MANDATORY/BLOCKING)
**EACH LANG-COMP TEST MUST BE NAMED BY TICKET AND INVOKE ALL 15 TOOLS - ZERO EXCEPTIONS**
#### Mandatory Test Pattern (ALL 15 TOOLS - NO SKIPS):
```rust
#[test]
fn test_langcomp_XXX_YY_feature_name() {
    let example = example_path("XX-feature/YY_example.ruchy");
    // TOOL 1: ruchy check
    ruchy_cmd().arg("check").arg(&example).assert().success();
    // TOOL 2: ruchy transpile
    ruchy_cmd().arg("transpile").arg(&example).assert().success();
    // TOOL 3: ruchy -e (execute code via eval - REPL functionality)
    let code = std::fs::read_to_string(&example).expect("failed to read example file");
    ruchy_cmd().arg("-e").arg(&code).assert().success();
    // TOOL 4: ruchy lint
    ruchy_cmd().arg("lint").arg(&example).assert().success();
    // TOOL 5: ruchy compile
    ruchy_cmd().arg("compile").arg(&example).assert().success();
    // TOOL 6: ruchy run
    ruchy_cmd().arg("run").arg(&example).assert().success();
    // TOOL 7: ruchy coverage
    ruchy_cmd().arg("coverage").arg(&example).assert().success();
    // TOOL 8: ruchy runtime --bigo
    ruchy_cmd().arg("runtime").arg(&example).arg("--bigo").assert().success();
    // TOOL 9: ruchy ast
    ruchy_cmd().arg("ast").arg(&example).assert().success();
    // TOOL 10: ruchy wasm
    ruchy_cmd().arg("wasm").arg(&example).assert().success();
    // TOOL 11: ruchy provability
    ruchy_cmd().arg("provability").arg(&example).assert().success();
    // TOOL 12: ruchy property-tests
    ruchy_cmd().arg("property-tests").arg(&example).assert().success();
    // TOOL 13: ruchy mutations
    ruchy_cmd().arg("mutations").arg(&example).assert().success();
    // TOOL 14: ruchy fuzz
    ruchy_cmd().arg("fuzz").arg(&example).assert().success();
    // TOOL 15: ruchy notebook (file validation mode)
    ruchy_cmd().arg("notebook").arg(&example).assert().success();
}
```
**ACCEPTANCE CRITERIA**: Test passes ONLY if ALL 15 tools succeed on the example file.
**NAMING**: `test_langcomp_XXX_YY_feature_name` where XXX = ticket number, YY = section
**MAKEFILE TARGET**: Run all LANG-COMP 15-tool validation tests with `make test-lang-comp`
- Executes comprehensive 15-tool validation via `cargo test --test lang_comp_suite`
- Tests LANG-COMP-006 (Data Structures), 007 (Type Annotations), 008 (Methods), 009 (Pattern Matching)
- Each test validates ALL 15 tools: check, transpile, eval (-e flag), lint, compile, run, coverage, runtime, ast, wasm, provability, property-tests, mutations, fuzz, notebook
- Exits with error if any test fails (CI-friendly)
- Current implementation: 4 test modules with full 15-tool coverage
See: docs/SPECIFICATION.md Section 31
## The Development Flow (PMAT-Enforced)
### MANDATORY: PMAT Quality at Every Step
```
1. BASELINE CHECK: Run `pmat quality-gate --fail-on-violation --checks=all`
2. LOCATE specification section in SPECIFICATION.md
3. IDENTIFY task in execution roadmap with ticket number
4. VERIFY dependencies complete via roadmap DAG
5. IMPLEMENT with <10 complexity (verified by `pmat analyze complexity`)
6. VALIDATE: Run `pmat quality-gate` before ANY commit
7. COMMIT with task reference (only if PMAT passes)
```
### TDG Violation Response (IMMEDIATE):
1. **HALT**: Stop when TDG < A- (85 points)
2. **ANALYZE**: `pmat tdg <file> --include-components`
3. **REFACTOR**: Fix specific component issues
4. **VERIFY**: Re-run to prove A- achievement
## Sprint Hygiene Protocol
### Pre-Sprint Cleanup (MANDATORY)
```bash
# Remove all debug binaries before starting sprint
rm -f test_* debug_*
find . -type f -executable -not -path "./target/*" -not -path "./.git/*" -delete
# Verify no large files
find . -type f -size +100M -not -path "./target/*" -not -path "./.git/*"
```
---
**Remember**: Compiler engineering is about systematic transformation, not clever hacks. Every abstraction must have zero runtime cost. Every error must be actionable. Every line of code must justify its complexity budget.
## PMAT v2.68.0+ Advanced Features
### Daily Workflow
**Before Work**: `pmat tdg . --top-files 10; pmat tdg dashboard --port 8080 --open &`
**During**: Monitor dashboard, check files: `pmat tdg <file> --include-components`
**Before Commit**: `pmat tdg . --min-grade A- --fail-on-violation`
### Key Rules
- **PMAT FIRST**: Run quality gates before ANY task
- **NO BYPASS**: Never `--no-verify`, fix root cause via Five Whys
- **TDD MANDATORY**: Write test first, prove fix works
- Use cargo-llvm-cov (not tarpaulin)
- All bugs solved with TDD, never manual hacks
- Mix: unit/doctests/property-tests/fuzz tests
- Check ../ruchy-book and ../rosetta-ruchy at sprint start
## Documentation Standards
**Professional Documentation Requirements**:
- Use precise, factual language without hyperbole or marketing speak
- Avoid excessive exclamation marks and celebratory language
- State achievements and features objectively
- Focus on technical accuracy over promotional language
- Never create documentation files proactively unless explicitly requested
- Documentation should be maintainable and verifiable