debtmap 0.17.0 - Docs.rs

# Complexity Metrics

Debtmap measures complexity using multiple complementary approaches. Each metric captures a different aspect of code difficulty.

## Cyclomatic Complexity

Measures the number of linearly independent paths through code - essentially counting decision points.

**How it works:**
- Start with a base complexity of 1
- Add 1 for each: `if`, `else if`, `match` arm, `while`, `for`, `&&`, `||`, `?` operator
- Does NOT increase for `else` (it's the alternate path, not a new decision)
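
The counting rule above can be sketched as a small function over a pre-extracted list of constructs. This is an illustrative sketch, not Debtmap's implementation: the `Construct` enum and the `cyclomatic` function are hypothetical names, and a real analyzer would walk the AST rather than receive a flat list.

```rust
/// Constructs that may appear in a function body (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Construct {
    If,
    ElseIf,
    Else,
    MatchArm,
    While,
    For,
    And, // `&&`
    Or,  // `||`
    Try, // `?` operator
}

/// Base complexity of 1, plus 1 for every construct except a plain `else`.
fn cyclomatic(constructs: &[Construct]) -> u32 {
    let decisions = constructs
        .iter()
        .filter(|c| **c != Construct::Else)
        .count();
    1 + decisions as u32
}
```

An empty function scores 1, and adding an `else` to an existing `if` leaves the score unchanged.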

**Thresholds:**
- **1-5**: Simple, easy to test - typically needs 1-3 test cases
- **6-10**: Moderate complexity - needs 4-8 test cases
- **11-20**: Complex, consider refactoring - needs 9+ test cases
- **21+**: Very complex, high risk - difficult to test thoroughly

**Example:**
```rust
fn validate_user(age: u32, has_license: bool, country: &str) -> bool {
    // Complexity: 4
    // Base (1) + if (1) + && (1) + match (1) = 4
    if age >= 18 && has_license {
        match country {
            "US" | "CA" => true,
            _ => false,
        }
    } else {
        false
    }
}
```

## Cognitive Complexity

Measures how difficult code is to understand by considering nesting depth and control flow interruptions.

**How it differs from cyclomatic:**
- Nesting increases weight (deeply nested code is harder to understand)
- Linear sequences don't increase complexity (easier to follow)
- Breaks and continues add complexity (interrupt normal flow)

**Calculation:**
- Each structure (if, loop, match) gets a base score
- **Nesting increases weight linearly**: Each nesting level adds to the complexity score
  - Base level (no nesting): weight = 1
  - First nesting level: weight = 2
  - Second nesting level: weight = 3
  - Formula: `complexity = 1 + nesting_level` (from src/complexity/cognitive.rs:167)
- Break/continue/return in middle of function adds cognitive load
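
The nesting formula can be sketched as a sum over the structures found in a function. The `Event` enum and `cognitive` function below are hypothetical names used only for illustration, and the flat cost of 1 per interruption is an assumption; the source only states that interruptions add cognitive load.

```rust
/// A nesting-sensitive structure (`if`, loop, `match`) at a given nesting level,
/// or a flow interruption (`break`, `continue`, early `return`).
#[derive(Debug, Clone, Copy)]
enum Event {
    Structure { nesting_level: u32 },
    Interruption,
}

/// Sums `1 + nesting_level` per structure and an assumed flat 1 per interruption.
fn cognitive(events: &[Event]) -> u32 {
    events
        .iter()
        .map(|event| match event {
            Event::Structure { nesting_level } => 1 + *nesting_level,
            Event::Interruption => 1,
        })
        .sum()
}
```

Because the weight grows with depth, moving a structure one level deeper costs one extra point, while appending it at the top level always costs exactly 1.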

**Example:**
```rust
// Cyclomatic: 5, Cognitive: 11
fn process_items(items: Vec<Item>) -> Vec<Output> {
    let mut results = vec![];

    for item in items {                    // +1 cognitive
        if item.is_valid() {               // +2 (nested in loop)
            match item.kind {              // +3 (nested 2 levels)
                Kind::A => results.push(process_a(item)),
                Kind::B => {
                    if item.priority > 5 { // +4 (nested 3 levels)
                        results.push(process_b_priority(item));
                    }
                }
                _ => continue,             // +1 (control flow interruption)
            }
        }
    }

    results
}
```

**Thresholds:**
- **0-5**: Trivial - anyone can understand
- **6-10**: Simple - straightforward logic
- **11-20**: Moderate - requires careful reading
- **21-40**: Complex - difficult to understand
- **41+**: Very complex - needs refactoring
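
As a rough sketch, these bands map onto a `match` over inclusive ranges. The `CognitiveBand` enum and `cognitive_band` function are hypothetical names, not part of Debtmap's API, and the upper bound of each band is treated as inclusive.

```rust
/// Readability bands for a cognitive complexity score (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum CognitiveBand {
    Trivial,
    Simple,
    Moderate,
    Complex,
    VeryComplex,
}

/// Maps a score to its band; each range includes its upper bound.
fn cognitive_band(score: u32) -> CognitiveBand {
    match score {
        0..=5 => CognitiveBand::Trivial,
        6..=10 => CognitiveBand::Simple,
        11..=20 => CognitiveBand::Moderate,
        21..=40 => CognitiveBand::Complex,
        _ => CognitiveBand::VeryComplex,
    }
}
```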

## Entropy-Based Complexity Analysis

Uses information theory to distinguish genuinely complex code from pattern-based repetitive code. This reduces false positives for validation functions, dispatchers, and configuration parsers, which score high on branch count but follow a single repeated pattern.

**How it works:**
1. **Token Entropy** (0.0-1.0): Measures variety in code tokens
   - High entropy (0.7+): Diverse logic, genuinely complex
   - Low entropy (0.0-0.4): Repetitive patterns, less complex than it appears

2. **Pattern Repetition** (0.0-1.0): Detects repetitive structures in AST
   - High repetition (0.7+): Similar blocks repeated (validation checks, case handlers)
   - Low repetition: Unique logic throughout

3. **Branch Similarity** (0.0-1.0): Analyzes similarity between conditional branches
   - High similarity (0.8+): Branches do similar things (consistent handling)
   - Low similarity: Each branch has unique logic

4. **Token Classification**: Categorizes tokens by type with weighted importance (src/complexity/entropy_core.rs:267-277)
   - **Token categories and weights** (from src/complexity/entropy_traits.rs:24-44):
     - `ControlFlow` (1.2): if, match, for, while - highest weight for control structures
     - `Keyword` (1.0): language keywords like fn, let, pub
     - `FunctionCall` (0.9): method calls and API usage
     - `Operator` (0.8): +, -, *, ==, etc.
     - `Identifier` (0.5): variable and function names
     - `Literal` (0.3): string, number, boolean literals - lowest weight
   - Higher weights emphasize structural complexity over superficial differences
   - Focuses entropy calculation on control flow and logic rather than data values
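
One way to picture how category weights could feed an entropy score is a weighted, normalized Shannon entropy. The sketch below is an assumption about how the pieces might fit together, not the code in `entropy_core.rs`: the `normalized_entropy` function, the choice to add each token's category weight to its frequency mass, and the normalization by `log2` of the token count are all hypothetical. Only the category names and weights come from the list above.

```rust
use std::collections::HashMap;

/// Token categories with the weights listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TokenCategory {
    ControlFlow,
    Keyword,
    FunctionCall,
    Operator,
    Identifier,
    Literal,
}

impl TokenCategory {
    fn weight(self) -> f64 {
        match self {
            TokenCategory::ControlFlow => 1.2,
            TokenCategory::Keyword => 1.0,
            TokenCategory::FunctionCall => 0.9,
            TokenCategory::Operator => 0.8,
            TokenCategory::Identifier => 0.5,
            TokenCategory::Literal => 0.3,
        }
    }
}

/// Weighted Shannon entropy of the token stream, scaled into 0.0-1.0.
/// Each occurrence adds its category weight to that token's mass (assumption).
fn normalized_entropy(tokens: &[(&str, TokenCategory)]) -> f64 {
    if tokens.len() < 2 {
        return 0.0;
    }
    let mut mass: HashMap<&str, f64> = HashMap::new();
    for (text, category) in tokens {
        *mass.entry(*text).or_insert(0.0) += category.weight();
    }
    let total: f64 = mass.values().sum();
    let entropy: f64 = mass
        .values()
        .map(|m| {
            let p = m / total;
            p * (1.0 / p).log2()
        })
        .sum();
    // Dividing by log2(token count) keeps the result at or below 1.0.
    entropy / (tokens.len() as f64).log2()
}
```

Under these assumptions, a stream that repeats one token scores 0.0, and a stream in which every token is distinct and equally weighted scores 1.0.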

**Dampening logic:** Dampening is applied when multiple factors indicate repetitive patterns:
- Low token entropy (< 0.4) indicates simple, repetitive patterns
- High pattern repetition (> 0.6) shows similar code blocks (measured via PatternMetrics)
- High branch similarity (> 0.7) indicates consistent branching logic

**Pattern detection** (src/complexity/entropy_core.rs:279-308):
- `PatternMetrics` tracks intermediate calculations:
  - `total_patterns`: Total number of code patterns detected
  - `unique_patterns`: Count of distinct patterns
  - `repetition_ratio`: Calculated as `1.0 - (unique_patterns / total_patterns)`
- High repetition ratio indicates validation functions, dispatchers, and configuration parsers

When these conditions are met:
```
effective_complexity = entropy × pattern_factor × similarity_factor
```

**Note on metrics** (src/complexity/entropy_core.rs:228-265):
- `token_entropy`: Measures unpredictability of code tokens (0.0-1.0), used for pattern detection
- `effective_complexity`: Final composite score after applying dampening adjustments
- These are distinct metrics - `effective_complexity` combines multiple factors, while `token_entropy` is a single entropy measurement

**Dampening cap:** The dampening factor is clamped between 0.5 and 1.0 (src/complexity/entropy_core.rs:114), allowing a maximum of 50% complexity reduction. The configuration option `max_combined_reduction` (default 0.30) provides additional control over the maximum allowed reduction. This prevents over-correction of pattern-based code while still providing meaningful adjustments for repetitive structures.
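
The clamp described above can be illustrated with a minimal sketch. `apply_dampening` is a hypothetical helper, not a Debtmap function, and it deliberately leaves out `max_combined_reduction`, since how that option interacts with the clamp is not specified here.

```rust
/// Scales a raw complexity score by a dampening factor clamped to 0.5..=1.0,
/// so the score can drop by at most 50% and can never increase.
fn apply_dampening(raw_complexity: f64, dampening_factor: f64) -> f64 {
    raw_complexity * dampening_factor.clamp(0.5, 1.0)
}
```

For example, a raw score of 20 with a computed factor of 0.2 is reduced only to 10, because the factor is raised to the 0.5 floor before it is applied.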

**Example:**
```rust
// Without entropy: Cyclomatic = 15 (appears very complex)
// With entropy: Effective = 8 (pattern-based, dampened ~47%)
fn validate_config(config: &Config) -> Result<(), ValidationError> {
    if config.name.is_empty() { return Err(ValidationError::EmptyName); }
    if config.port == 0 { return Err(ValidationError::InvalidPort); }
    if config.host.is_empty() { return Err(ValidationError::EmptyHost); }
    if config.timeout == 0 { return Err(ValidationError::InvalidTimeout); }
    // ... 10 more similar checks (14 `if`s + base 1 = 15)
    Ok(())
}
```

**Enable in `.debtmap.toml`:**
```toml
[entropy]
enabled = true                # Enable entropy analysis (default: true)
weight = 0.5                  # Weight in adjustment (0.0-1.0)
use_classification = true     # Advanced token classification
pattern_threshold = 0.7       # Pattern detection threshold
entropy_threshold = 0.4       # Entropy below this triggers dampening
branch_threshold = 0.8        # Branch similarity threshold
max_combined_reduction = 0.3  # Maximum 30% reduction
```

**Output fields in EntropyScore:**
- `unique_variables`: Count of distinct variables in the function (measures variable diversity)
- `max_nesting`: Maximum nesting depth detected (contributes to dampening calculation)
- `dampening_applied`: Actual dampening factor applied to the complexity score

## Nesting Depth

Maximum depth of nested control structures (`if`, loops, `match`) in a function. Deep nesting makes code hard to follow.

**Thresholds:**
- **1-2**: Flat, easy to read
- **3-4**: Moderate nesting
- **5+**: Deep nesting, consider extracting functions
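
Conceptually, nesting depth is the height of the tree of control structures. The sketch below uses a deliberately simplified, hypothetical `Node` type rather than a real Rust AST, so it shows the idea rather than Debtmap's actual traversal.

```rust
/// A simplified syntax tree: a control structure with child nodes, or a plain statement.
#[derive(Debug)]
enum Node {
    Control(Vec<Node>),
    Statement,
}

/// Maximum number of control structures nested inside one another.
fn max_nesting(nodes: &[Node]) -> u32 {
    nodes
        .iter()
        .map(|node| match node {
            Node::Control(children) => 1 + max_nesting(children),
            Node::Statement => 0,
        })
        .max()
        .unwrap_or(0)
}
```

Sibling structures do not add to the depth; only structures contained inside other structures do.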

**Example:**
```rust
// Nesting depth: 4 (difficult to follow)
fn process(data: Data) -> Result<Output> {
    if data.is_valid() {                    // Level 1
        for item in data.items {            // Level 2
            if item.active {                // Level 3
                match item.kind {           // Level 4
                    Kind::A => { /* ... */ }
                    Kind::B => { /* ... */ }
                }
            }
        }
    }
    Ok(Output::default())                   // Placeholder result
}
```

**Refactored:**
```rust
// Nesting depth: 2 (much clearer)
fn process(data: Data) -> Result<Output> {
    if !data.is_valid() {
        return Err(Error::Invalid);
    }

    data.items
        .iter()
        .filter(|item| item.active)
        .map(|item| process_item(item))     // Extract to separate function
        .collect()
}
```

## Function Length

Number of lines in a function. Long functions often violate the single responsibility principle.

**Thresholds:**
- **1-20 lines**: Good - focused, single purpose
- **21-50 lines**: Acceptable - may have multiple steps
- **51-100 lines**: Long - consider breaking up
- **101+ lines**: Very long - definitely needs refactoring

**Why length matters:**
- Harder to understand and remember
- Harder to test thoroughly
- Often violates single responsibility
- Difficult to reuse

## Constructor Detection

Debtmap identifies constructor functions using AST-based analysis (Spec 122), which goes beyond simple name-based detection to catch non-standard constructor patterns.

**Detection Strategy:**

1. **Return Type Analysis**: Functions returning `Self`, `Result<Self>`, or `Option<Self>`
2. **Body Pattern Analysis**: Struct initialization or simple field assignments
3. **Complexity Check**: Low cyclomatic complexity (≤5), no loops, minimal branching

**Why AST-based detection?**

Name-based detection (looking for `new`, `new_*`, `from_*`) misses non-standard constructors:

```rust
// Caught by name-based detection
fn new() -> Self {
    Self { timeout: 30 }
}

// Missed by name-based, caught by AST detection
pub fn create_default_client() -> Self {
    Self { timeout: Duration::from_secs(30) }
}

pub fn initialized() -> Self {
    Self::new()
}
```

**Builder vs Constructor:**

AST analysis distinguishes between constructors and builder methods:

```rust
// Constructor: creates new instance
pub fn new(timeout: Duration) -> Self {
    Self { timeout }
}

// Builder method: modifies existing instance (NOT a constructor)
pub fn set_timeout(mut self, timeout: Duration) -> Self {
    self.timeout = timeout;
    self  // Returns modified self, not new instance
}
```

**Detection Criteria:**

A function is classified as a constructor if:
- Returns `Self`, `Result<Self>`, or `Option<Self>`
- Contains struct initialization (`Self { ... }`) without loops
- OR delegates to another constructor (`Self::new()`) with minimal logic
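
These criteria can be read as a single boolean predicate. The sketch below works on a hypothetical `FunctionSummary` of facts already extracted from the AST; the type names, the fields, and the use of the cyclomatic limit of 5 from the detection strategy as the meaning of "minimal logic" are assumptions for illustration.

```rust
/// Return type shapes relevant to constructor detection (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReturnType {
    SelfType,   // `Self`
    ResultSelf, // `Result<Self>`
    OptionSelf, // `Option<Self>`
    Other,
}

/// Facts about a function, assumed to be extracted from its AST beforehand.
#[derive(Debug, Clone, Copy)]
struct FunctionSummary {
    return_type: ReturnType,
    has_struct_init: bool,
    has_loops: bool,
    delegates_to_constructor: bool,
    cyclomatic: u32,
}

/// Applies the criteria: a `Self`-like return type, plus either a loop-free
/// struct initialization or a low-complexity delegation to another constructor.
fn is_constructor(f: &FunctionSummary) -> bool {
    let returns_self = f.return_type != ReturnType::Other;
    let initializes = f.has_struct_init && !f.has_loops;
    let delegates = f.delegates_to_constructor && f.cyclomatic <= 5;
    returns_self && (initializes || delegates)
}
```

Under this sketch, a builder method such as `set_timeout` returns `Self` but neither initializes a struct nor delegates to a constructor, so it is not classified as one.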

**Fallback Behavior:**

If AST parsing fails (syntax errors, unsupported language), Debtmap gracefully falls back to name-based detection (Spec 117):
- `new`, `new_*`
- `try_new*`
- `from_*`

This ensures analysis always completes, even on partially broken code.
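
The name-based fallback amounts to a few string checks. The function below is a hypothetical sketch of that rule, not the code Debtmap ships:

```rust
/// Returns true when a function name matches `new`, `new_*`, `try_new*`, or `from_*`.
fn looks_like_constructor_name(name: &str) -> bool {
    name == "new"
        || name.starts_with("new_")
        || name.starts_with("try_new")
        || name.starts_with("from_")
}
```

A name such as `create_default_client` does not match any of these patterns, which is the gap that AST-based detection closes.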

**Performance:**

AST-based detection adds < 5% overhead compared to name-only detection. See benchmarks:

```bash
cargo bench --bench constructor_detection_bench
```

**Why it matters:**

Accurately identifying constructors helps:
- Exclude them from complexity thresholds (initialization code can inflate metrics without indicating real debt)
- Focus refactoring on business logic, not initialization code
- Understand initialization patterns across the codebase