debtmap 0.16.4

Code complexity and technical debt analyzer
# Entropy Analysis

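Entropy analysis is Debtmap's approach to distinguishing genuinely complex code from repetitive, pattern-based code. Compared with ranking by raw cyclomatic complexity, the expected reduction in false positives is 60-75% for repetitive validation functions and smaller for other kinds of code (see Measuring False Positive Reduction below).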

## Overview

Traditional static analysis tools flag code as "complex" based purely on cyclomatic complexity or lines of code. However, not all complexity is equal:

- **Repetitive patterns** (validation functions, dispatchers) have high cyclomatic complexity but low cognitive load
- **Diverse logic** (state machines, business rules) may have moderate cyclomatic complexity but high cognitive load

Entropy analysis uses information theory to distinguish between these cases.

## How It Works

Debtmap's entropy analysis is **language-agnostic**, working across Rust, Python, JavaScript, and TypeScript codebases using a universal token classification approach. This ensures consistent complexity assessment regardless of the programming language used.

### Language-Agnostic Analysis

The same entropy concepts apply consistently across all supported languages. Here's how a validation function would be analyzed in different languages:

**Rust:**
```rust
use anyhow::{anyhow, Result};

fn validate_config(config: &Config) -> Result<()> {
    if config.output_dir.is_none() { return Err(anyhow!("output_dir required")); }
    if config.max_workers.is_none() { return Err(anyhow!("max_workers required")); }
    if config.timeout_secs.is_none() { return Err(anyhow!("timeout_secs required")); }
    Ok(())
}
// Illustrative scores: Entropy ≈ 0.3, Pattern Repetition ≈ 0.9
```

**Python:**
```python
def validate_config(config: Config) -> None:
    if config.output_dir is None: raise ValueError("output_dir required")
    if config.max_workers is None: raise ValueError("max_workers required")
    if config.timeout_secs is None: raise ValueError("timeout_secs required")
# Illustrative scores: Entropy ≈ 0.3, Pattern Repetition ≈ 0.9
```

**JavaScript/TypeScript:**
```typescript
function validateConfig(config: Config): void {
    // `== null` matches both null and undefined, mirroring the None checks above
    if (config.outputDir == null) throw new Error("outputDir required");
    if (config.maxWorkers == null) throw new Error("maxWorkers required");
    if (config.timeoutSecs == null) throw new Error("timeoutSecs required");
}
// Illustrative scores: Entropy ≈ 0.3, Pattern Repetition ≈ 0.9
```

All three receive similar entropy scores because they share the same repetitive validation pattern, demonstrating how Debtmap's analysis transcends language syntax to identify underlying code structure patterns.

### Shannon Entropy

Shannon entropy measures the variety and unpredictability of code patterns:

```
H(X) = -Σ p(x) × log₂(p(x))
```

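Where:
- `p(x)` = probability of each token type
- Scores are reported on a normalized 0.0-1.0 scale (raw Shannon entropy in bits can exceed 1.0 when there are more than two token types)
- High entropy (0.8-1.0) = many different patterns
- Low entropy (0.0-0.3) = repetitive patterns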
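A minimal sketch of the formula above in Rust. Because Debtmap reports entropy on a 0.0-1.0 scale, this sketch assumes the raw value is divided by log₂ of the number of distinct tokens. That normalization, the function name `normalized_entropy`, and the use of raw token strings as the classes `x` are assumptions for illustration, not Debtmap's implementation.

```rust
use std::collections::HashMap;

/// Shannon entropy of a token stream, normalized to 0.0-1.0.
/// Assumption: normalization divides by log2(number of distinct tokens);
/// Debtmap's own normalization may differ.
fn normalized_entropy(tokens: &[&str]) -> f64 {
    if tokens.is_empty() {
        return 0.0;
    }
    // Count occurrences of each distinct token.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &t in tokens {
        *counts.entry(t).or_insert(0) += 1;
    }
    let distinct = counts.len();
    if distinct < 2 {
        // A single repeated token carries no information.
        return 0.0;
    }
    let total = tokens.len() as f64;
    // H(X) = -Σ p(x) × log2(p(x))
    let h: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    // Divide by the maximum possible entropy for this many distinct tokens.
    h / (distinct as f64).log2()
}
```

With this normalization, four equally frequent tokens score 1.0, a stream dominated by one token (three `a`s and one `b`) scores about 0.81, and a single repeated token scores 0.0.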

### Token Classification

Debtmap classifies tokens by semantic importance to give more weight to meaningful code structures in entropy calculations. This is controlled by the `use_classification` configuration option.

**When enabled** (`use_classification = true`; the default is `false` for backward compatibility), tokens are weighted by importance:

| Category | Weight | Examples |
|----------|--------|----------|
| **Control flow** | 1.2 | `if`, `match`, `for`, `while`, `loop`, `switch` |
| **Keywords** | 1.0 | `return`, `break`, `continue`, `async`, `await`, `unsafe` |
| **Function calls** | 0.9 | `foo()`, `bar.method()`, `map`, `filter`, `reduce` |
| **Operators** | 0.8 | `+`, `-`, `==`, `&&`, `?` (error propagation) |
| **Identifiers** | 0.5 | Variable names, field access |
| **Literals** | 0.3 | Numbers, strings, booleans |

**When disabled** (`use_classification = false`), all tokens are treated equally, which may be useful for debugging or when you want unweighted entropy scores.
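The table gives per-category weights but not how they enter the entropy formula. The sketch below assumes one plausible scheme: each occurrence adds its category weight (instead of 1) to that token's probability mass, so turning `use_classification` off reduces to plain unweighted entropy. The `TokenCategory` enum and `weighted_entropy` function are hypothetical names; only the weights come from the table.

```rust
use std::collections::HashMap;

/// Hypothetical token categories mirroring the table above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TokenCategory {
    ControlFlow,
    Keyword,
    FunctionCall,
    Operator,
    Identifier,
    Literal,
}

impl TokenCategory {
    /// Weights taken from the classification table.
    fn weight(self) -> f64 {
        match self {
            TokenCategory::ControlFlow => 1.2,
            TokenCategory::Keyword => 1.0,
            TokenCategory::FunctionCall => 0.9,
            TokenCategory::Operator => 0.8,
            TokenCategory::Identifier => 0.5,
            TokenCategory::Literal => 0.3,
        }
    }
}

/// Hypothetical weighted entropy: each occurrence adds its category weight
/// (or 1.0 when classification is off) to its token's probability mass.
/// The result is normalized to 0.0-1.0 by log2(distinct tokens).
fn weighted_entropy(tokens: &[(&str, TokenCategory)], use_classification: bool) -> f64 {
    let mut mass: HashMap<&str, f64> = HashMap::new();
    for &(text, cat) in tokens {
        let w = if use_classification { cat.weight() } else { 1.0 };
        *mass.entry(text).or_insert(0.0) += w;
    }
    let total: f64 = mass.values().sum();
    if mass.len() < 2 || total == 0.0 {
        return 0.0;
    }
    let h: f64 = mass
        .values()
        .map(|&m| {
            let p = m / total;
            -p * p.log2()
        })
        .sum();
    h / (mass.len() as f64).log2()
}
```

For a two-token stream (`if`, `x`), the unweighted score is 1.0, while the weighted score drops to about 0.87 because the control-flow token carries more mass than the identifier.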

#### Language-Specific Token Recognition

**Rust-specific tokens:**
- The `?` operator (error propagation) is tracked as an operator with weight 0.8
- Chains of `?` operators (e.g., `file.read()?.parse()?.validate()?`) are detected as repetitive patterns
- `unsafe` blocks are tracked with keyword weight 1.0

**JavaScript/TypeScript-specific tokens:**
- Arrow functions (`=>`) are tracked as keywords
- Method chains (`.map()`, `.filter()`, `.reduce()`) are tracked with function call weight 0.9
- Promise patterns (`.then()`, `.catch()`, `.finally()`) are recognized
- Async/await expressions are tracked with keyword weight 1.0

**JSX/React-specific tokens:**
- JSX elements (`<div>`, `<Component>`) are tracked as function calls (weight 0.9)
- Common React attributes (`className`, `onClick`, `onChange`, `key`, `ref`) are tracked
- JSX expressions (`{expression}`) are tracked as operators
- JSX fragments (`<>...</>`) are detected and tracked

### Pattern Repetition Detection

Detects repetitive structures in the AST:

```rust
// High pattern repetition (0.9) - all branches identical
if a.is_none() { return Err(...) }
if b.is_none() { return Err(...) }
if c.is_none() { return Err(...) }

// Low pattern repetition (0.2) - diverse branches
match state {
    Active => transition_to_standby(),
    Standby => transition_to_active(),
    Maintenance => schedule_restart(),
}
```
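One way to approximate a repetition score, sketched under stated assumptions: reduce each branch to an abstracted shape (identifiers and literals replaced by placeholders such as `IDENT`), then measure what share of branches match the most common shape. The placeholder scheme and the `pattern_repetition` function are hypothetical; Debtmap's AST-based detector may score differently, so the values need not match the illustrative 0.9 and 0.2 above.

```rust
use std::collections::HashMap;

/// Hypothetical repetition score: the share of branches whose abstracted
/// shape equals the most common shape. 1.0 means every branch is identical.
fn pattern_repetition(branch_shapes: &[Vec<&str>]) -> f64 {
    if branch_shapes.is_empty() {
        return 0.0;
    }
    // Count how many branches share each exact shape.
    let mut counts: HashMap<&[&str], usize> = HashMap::new();
    for shape in branch_shapes {
        *counts.entry(shape.as_slice()).or_insert(0) += 1;
    }
    let most_common = counts.values().copied().max().unwrap_or(0);
    most_common as f64 / branch_shapes.len() as f64
}
```

Three identical `is_none()` checks score 1.0; three structurally different match arms score about 0.33.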

### Branch Similarity Analysis

Analyzes similarity between conditional branches:

```rust
// High branch similarity (0.9) - branches are nearly identical
if condition_a {
    log("A happened");
    process_a();
}
if condition_b {
    log("B happened");
    process_b();
}

// Low branch similarity (0.2) - branches are very different
if needs_auth {
    authenticate_user()?;
    load_profile()?;
} else {
    show_guest_ui();
}
```
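Branch similarity can be illustrated with Jaccard similarity over each branch's token set, averaged across all pairs of branches. Jaccard is a swapped-in technique chosen for clarity, not necessarily the metric Debtmap uses, and `jaccard` and `branch_similarity` are hypothetical names. As with repetition, identifiers and literals should be abstracted first; otherwise `process_a` and `process_b` would count as unrelated tokens.

```rust
use std::collections::HashSet;

/// Jaccard similarity of two branches' token sets: |A ∩ B| / |A ∪ B|.
fn jaccard(a: &[&str], b: &[&str]) -> f64 {
    let sa: HashSet<&str> = a.iter().copied().collect();
    let sb: HashSet<&str> = b.iter().copied().collect();
    let union = sa.union(&sb).count();
    if union == 0 {
        return 1.0; // two empty branches are trivially identical
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Hypothetical branch-similarity score: mean pairwise Jaccard similarity.
fn branch_similarity(branches: &[Vec<&str>]) -> f64 {
    let n = branches.len();
    if n < 2 {
        return 0.0; // nothing to compare
    }
    let mut total = 0.0;
    let mut pairs = 0;
    for i in 0..n {
        for j in (i + 1)..n {
            total += jaccard(&branches[i], &branches[j]);
            pairs += 1;
        }
    }
    total / pairs as f64
}
```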

### Effective Complexity Adjustment

Debtmap uses a multi-factor dampening approach that analyzes three dimensions of code repetitiveness:

1. **Pattern Repetition** - Detects repetitive AST structures
2. **Token Entropy** - Measures variety in token usage
3. **Branch Similarity** - Compares similarity between conditional branches

These factors are combined multiplicatively with a minimum floor of 0.7 (preserving at least 70% of original complexity):

```
dampening_factor = (repetition_factor × entropy_factor × branch_factor).max(0.7)
effective_complexity = raw_complexity × dampening_factor
```

#### Historical Note: Spec 68

**Spec 68: Graduated Entropy Dampening** was the original simple algorithm that only considered entropy < 0.2:

```
dampening_factor = 0.5 + 0.5 × (entropy / 0.2)  [when entropy < 0.2]
```

The current implementation generalizes this graduated approach to all three factors (repetition, entropy, branch similarity), each with its own threshold and maximum reduction. The test suite still references Spec 68 to verify backward compatibility with the original behavior.

#### When Dampening Applies

Dampening is applied based on multiple thresholds:

- **Pattern Repetition**: Values above `pattern_threshold` (default 0.7) trigger dampening (high repetition detected)
- **Token Entropy**: Values below `entropy_threshold` (default 0.4) trigger graduated dampening (low variety)
- **Branch Similarity**: Values above `branch_threshold` (default 0.8) trigger dampening (similar branches)

#### Graduated Dampening Formula

Each factor is dampened individually using a graduated calculation:

```rust
// Conceptual pseudocode showing the three-factor approach
// Actual implementation in src/complexity/entropy.rs:185-209
fn calculate_dampening_factor(
    repetition: f64,        // 0.0-1.0
    entropy: f64,           // 0.0-1.0
    branch_similarity: f64, // 0.0-1.0
    config: &EntropyConfig
) -> f64 {
    // Each factor uses calculate_graduated_dampening with configurable thresholds
    let repetition_factor = graduated_dampening(
        repetition, config.pattern_threshold, config.max_repetition_reduction
    );
    let entropy_factor = graduated_dampening(
        entropy, config.entropy_threshold, config.max_entropy_reduction
    );
    let branch_factor = graduated_dampening(
        branch_similarity, config.branch_threshold, config.max_branch_reduction
    );

    let floor = 1.0 - config.max_combined_reduction;
    (repetition_factor * entropy_factor * branch_factor).max(floor)
}
```

**Key Parameters (all configurable):**
- **Repetition**: Threshold via `pattern_threshold` (default: 0.7), max reduction via `max_repetition_reduction` (default: 0.20)
- **Entropy**: Threshold via `entropy_threshold` (default: 0.4), max reduction via `max_entropy_reduction` (default: 0.15)
- **Branch Similarity**: Threshold via `branch_threshold` (default: 0.8), max reduction via `max_branch_reduction` (default: 0.25)
- **Combined Floor**: Minimum preserved via `max_combined_reduction` (default: 0.30, preserving 70%)
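The conceptual pseudocode above does not spell out the shape of the graduated curve. The runnable sketch below assumes a linear ramp: no reduction at the threshold, the full configured reduction at the extreme value. Note the direction: repetition and branch similarity dampen above their thresholds, while token entropy dampens below its threshold. `ramp_above`, `ramp_below`, and `dampening_factor` are hypothetical names; the config fields and defaults mirror the documented `[entropy]` keys.

```rust
/// Defaults mirror the documented `[entropy]` configuration keys.
#[derive(Clone, Debug)]
struct EntropyConfig {
    pattern_threshold: f64,
    entropy_threshold: f64,
    branch_threshold: f64,
    max_repetition_reduction: f64,
    max_entropy_reduction: f64,
    max_branch_reduction: f64,
    max_combined_reduction: f64,
}

impl Default for EntropyConfig {
    fn default() -> Self {
        Self {
            pattern_threshold: 0.7,
            entropy_threshold: 0.4,
            branch_threshold: 0.8,
            max_repetition_reduction: 0.20,
            max_entropy_reduction: 0.15,
            max_branch_reduction: 0.25,
            max_combined_reduction: 0.30,
        }
    }
}

/// Hypothetical linear ramp for metrics where higher means more repetitive:
/// no reduction at or below `threshold`, full `max_reduction` at 1.0.
fn ramp_above(value: f64, threshold: f64, max_reduction: f64) -> f64 {
    if value <= threshold || threshold >= 1.0 {
        return 1.0;
    }
    let t = ((value - threshold) / (1.0 - threshold)).min(1.0);
    1.0 - max_reduction * t
}

/// Hypothetical linear ramp for entropy, where lower values dampen:
/// no reduction at or above `threshold`, full `max_reduction` at 0.0.
fn ramp_below(value: f64, threshold: f64, max_reduction: f64) -> f64 {
    if value >= threshold || threshold <= 0.0 {
        return 1.0;
    }
    let t = ((threshold - value) / threshold).min(1.0);
    1.0 - max_reduction * t
}

/// Combines the three factors multiplicatively, then applies the floor.
fn dampening_factor(repetition: f64, entropy: f64, branch_similarity: f64, cfg: &EntropyConfig) -> f64 {
    let repetition_factor = ramp_above(repetition, cfg.pattern_threshold, cfg.max_repetition_reduction);
    let entropy_factor = ramp_below(entropy, cfg.entropy_threshold, cfg.max_entropy_reduction);
    let branch_factor = ramp_above(branch_similarity, cfg.branch_threshold, cfg.max_branch_reduction);
    // The floor keeps at least (1 - max_combined_reduction) of the raw complexity.
    let floor = 1.0 - cfg.max_combined_reduction;
    (repetition_factor * entropy_factor * branch_factor).max(floor)
}
```

Under this linear assumption, repetition 0.95, entropy 0.3, and branch similarity 0.9 give a factor of about 0.702 (0.833 × 0.9625 × 0.875), just above the 0.7 floor. The individual factors differ slightly from the approximate figures in the worked example below, since the real curve need not be linear.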

#### Example: Repetitive Validation Function

```
Raw Complexity: 20
Pattern Repetition: 0.95 (very high)
Token Entropy: 0.3 (low variety)
Branch Similarity: 0.9 (very similar branches)

repetition_factor ≈ 0.85 (15% reduction)
entropy_factor ≈ 0.90 (10% reduction)
branch_factor ≈ 0.80 (20% reduction)

dampening_factor = (0.85 × 0.90 × 0.80) = 0.612
dampening_factor = max(0.612, 0.7) = 0.7  // Floor applied

Effective Complexity = 20 × 0.7 = 14

Result: 30% reduction (maximum allowed)
```

#### Example: Diverse State Machine

```
Raw Complexity: 20
Pattern Repetition: 0.2 (low - not repetitive)
Token Entropy: 0.8 (high variety)
Branch Similarity: 0.3 (diverse branches)

repetition_factor ≈ 1.0 (no reduction)
entropy_factor ≈ 1.0 (no reduction)
branch_factor ≈ 1.0 (no reduction)

dampening_factor = (1.0 × 1.0 × 1.0) = 1.0

Effective Complexity = 20 × 1.0 = 20

Result: 0% reduction (complexity preserved)
```

## Real-World Examples

### Example 1: Validation Function

```rust
fn validate_config(config: &Config) -> Result<()> {
    if config.output_dir.is_none() {
        return Err(anyhow!("output_dir required"));
    }
    if config.max_workers.is_none() {
        return Err(anyhow!("max_workers required"));
    }
    if config.timeout_secs.is_none() {
        return Err(anyhow!("timeout_secs required"));
    }
    // ... 17 more similar checks
    Ok(())
}
```

**Traditional analysis:**
- Cyclomatic Complexity: 20
- Assessment: CRITICAL

**Entropy analysis:**
- Shannon Entropy: 0.3 (low variety)
- Pattern Repetition: 0.9 (highly repetitive)
- Branch Similarity: 0.95 (nearly identical)
- Effective Complexity: 14 (20 × 0.7; the default 30% cap on combined reduction applies)
- Assessment: LOW PRIORITY

### Example 2: State Machine Logic

```rust
fn reconcile_state(current: &State, desired: &State) -> Vec<Action> {
    let mut actions = vec![];

    match (current.mode, desired.mode) {
        (Mode::Active, Mode::Standby) => {
            if current.has_active_connections() {
                actions.push(Action::DrainConnections);
                actions.push(Action::WaitForDrain);
            }
            actions.push(Action::TransitionToStandby);
        }
        (Mode::Standby, Mode::Active) => {
            if desired.requires_warmup() {
                actions.push(Action::Warmup);
            }
            actions.push(Action::TransitionToActive);
        }
        // ... more diverse state transitions
        _ => {}
    }

    actions
}
```

**Traditional analysis:**
- Cyclomatic Complexity: 8
- Assessment: MODERATE

**Entropy analysis:**
- Shannon Entropy: 0.85 (high variety)
- Pattern Repetition: 0.2 (not repetitive)
- Branch Similarity: 0.3 (diverse branches)
- Effective Complexity: 8 (dampening factor 1.0, no reduction)
- Assessment: HIGH PRIORITY

## Configuration

Configure entropy analysis in `.debtmap.toml` or disable via the `--semantic-off` CLI flag.

```toml
[entropy]
# Enable entropy analysis (default: true)
enabled = true

# Weight of entropy in overall complexity scoring (0.0-1.0, default: 1.0)
# Note: This affects scoring, not dampening thresholds
weight = 1.0

# Minimum tokens required for entropy calculation (default: 20)
min_tokens = 20

# Pattern similarity threshold for repetition detection (0.0-1.0, default: 0.7)
pattern_threshold = 0.7

# Enable advanced token classification (default: false for backward compatibility)
# When true, weights tokens by semantic importance (control flow > operators > identifiers)
use_classification = false

# Token entropy threshold (0.0-1.0, default: 0.4)
# Values below this threshold trigger graduated dampening
entropy_threshold = 0.4

# Branch similarity threshold (0.0-1.0, default: 0.8)
# Branches with similarity above this threshold contribute to dampening
branch_threshold = 0.8

# Maximum reduction limits (these are configurable)
max_repetition_reduction = 0.20  # Max 20% reduction from pattern repetition
max_entropy_reduction = 0.15     # Max 15% reduction from low token entropy
max_branch_reduction = 0.25      # Max 25% reduction from branch similarity
max_combined_reduction = 0.30    # Overall cap at 30% reduction (minimum 70% preserved)
```

**Important Notes:**

1. **All dampening thresholds are now configurable** (`src/complexity/entropy.rs:185-209`):
   - **Entropy factor threshold: 0.4** - Configurable via `entropy_threshold` in config file. Values below this threshold trigger graduated dampening. The default 0.4 was chosen based on empirical analysis across multiple codebases to balance false positive reduction with sensitivity to genuinely complex code.
   - **Branch threshold: 0.8** - Configurable via `branch_threshold` in config file
   - **Pattern threshold: 0.7** - Configurable via `pattern_threshold` in config file

2. **The `weight` parameter** affects how entropy scores contribute to overall complexity scoring, but does not change the dampening thresholds or reductions.

3. **Token classification** defaults to `false` (disabled) for backward compatibility, even though it provides more accurate entropy analysis when enabled.

### Tuning for Your Project

**Enable token classification for better accuracy:**
```toml
[entropy]
enabled = true
use_classification = true  # Weight control flow keywords more heavily
```

**Strict mode (fewer reductions, flag more code):**
```toml
[entropy]
enabled = true
max_repetition_reduction = 0.10  # Reduce from default 0.20
max_entropy_reduction = 0.08     # Reduce from default 0.15
max_branch_reduction = 0.12      # Reduce from default 0.25
max_combined_reduction = 0.20    # Reduce from default 0.30 (preserve 80%)
```

**Lenient mode (more aggressive reduction):**
```toml
[entropy]
enabled = true
max_repetition_reduction = 0.30  # Increase from default 0.20
max_entropy_reduction = 0.25     # Increase from default 0.15
max_branch_reduction = 0.35      # Increase from default 0.25
max_combined_reduction = 0.50    # Increase from default 0.30 (preserve 50%)
```

**Disable entropy dampening entirely:**
```toml
[entropy]
enabled = false
```

Or via CLI (disables entropy-based complexity adjustments):
```bash
# Disables semantic analysis features including entropy dampening
debtmap analyze . --semantic-off
```

**Note**: The `--semantic-off` flag disables all semantic analysis features, including entropy-based complexity adjustments. This is useful when you want raw cyclomatic complexity without any dampening.

## Interpreting Entropy-Adjusted Output

When entropy analysis detects repetitive patterns, debtmap displays both the original and adjusted complexity values to help you understand the adjustment. This transparency allows you to verify the analysis and understand why certain code receives lower priority.

### Output Format

When viewing detailed output (verbosity level 2 with `-vv`), entropy-adjusted complexity is shown in the **COMPLEXITY** section:

```
COMPLEXITY: cyclomatic=20 (dampened: 14, factor: 0.70), est_branches=40, cognitive=25, nesting=3, entropy=0.30
```

And in the **Entropy Impact** scoring section:

```
  - Entropy Impact: 30% dampening (entropy: 0.30, repetition: 95%)
```

### Understanding the Values

- **cyclomatic=20**: Original cyclomatic complexity before adjustment
- **dampened: 14**: Adjusted complexity after entropy analysis (20 × 0.70 = 14)
- **factor: 0.70**: The dampening factor applied (0.70 = 30% reduction)
- **entropy=0.30**: Shannon entropy score (0.0-1.0, lower = more repetitive)
- **repetition: 95%**: Pattern repetition score (higher = more repetitive)

### Reconstructing the Calculation

You can verify the adjustment by multiplying:
```
original_complexity × dampening_factor = adjusted_complexity
20 × 0.70 = 14
```

The dampening percentage shown in the Entropy Impact section is:
```
dampening_percentage = (1.0 - dampening_factor) × 100%
(1.0 - 0.70) × 100% = 30%
```
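To check the reconstruction automatically, a small parser can pull the three numbers out of a `COMPLEXITY` line. This sketch assumes the exact text layout shown in the output examples (`cyclomatic=`, `dampened: `, `factor: `); `parse_dampening` and `number_after` are hypothetical helpers, and this human-oriented output format may change between versions.

```rust
/// Returns the decimal number that immediately follows `key` in `line`.
fn number_after(line: &str, key: &str) -> Option<f64> {
    let start = line.find(key)? + key.len();
    let rest = &line[start..];
    // Stop at the first character that cannot be part of a decimal number.
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

/// Extracts (raw, dampened, factor) from a COMPLEXITY line, or None when
/// the line carries no dampening data.
fn parse_dampening(line: &str) -> Option<(f64, f64, f64)> {
    let raw = number_after(line, "cyclomatic=")?;
    let dampened = number_after(line, "dampened: ")?;
    let factor = number_after(line, "factor: ")?;
    Some((raw, dampened, factor))
}
```

Because displayed values are rounded, compare `raw × factor` with `dampened` using a tolerance rather than exact equality. Lines without dampening data, like the one in the next section, return `None`.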

### When Entropy Data is Unavailable

If a function is too small for entropy analysis (fewer than `min_tokens` tokens, 20 by default) or entropy is disabled, the output shows complexity without dampening:

```
COMPLEXITY: cyclomatic=5, est_branches=10, cognitive=8, nesting=2
```

No "dampened" or "factor" values are shown, indicating the raw complexity is used for scoring.

### Example Output Comparison

**Before entropy adjustment:**
```
#1 SCORE: 95.5 [CRITICAL]
├─ COMPLEXITY: cyclomatic=20, est_branches=40, cognitive=25, nesting=3
```

**After entropy adjustment:**
```
#15 SCORE: 68.2 [HIGH]
├─ COMPLEXITY: cyclomatic=20 (dampened: 14, factor: 0.70), est_branches=40, cognitive=25, nesting=3, entropy=0.30
  - Entropy Impact: 30% dampening (entropy: 0.30, repetition: 95%)
```

The item dropped from rank #1 to #15 because entropy analysis detected the high complexity was primarily due to repetitive validation patterns rather than genuine cognitive complexity.

## Understanding the Impact

### Measuring False Positive Reduction

Run analysis with and without entropy:

```bash
# Without entropy
debtmap analyze . --semantic-off --top 20 > without_entropy.txt

# With entropy (default)
debtmap analyze . --top 20 > with_entropy.txt

# Compare
diff without_entropy.txt with_entropy.txt
```

**Expected results:**
- 60-75% reduction in flagged validation functions
- 40-50% reduction in flagged dispatcher functions
- 20-30% reduction in flagged configuration parsers
- No reduction in genuinely complex state machines or business logic

### Verifying Correctness

Entropy analysis should:
- **Reduce** flags on repetitive code (validators, dispatchers)
- **Preserve** flags on genuinely complex code (state machines, business logic)

If entropy analysis incorrectly reduces flags on genuinely complex code, adjust configuration:

```toml
[entropy]
max_combined_reduction = 0.20  # Reduce from default 0.30 (preserve 80%)
max_repetition_reduction = 0.10  # Reduce individual factors
max_entropy_reduction = 0.08
max_branch_reduction = 0.12
```

## Best Practices

1. **Use default settings** - They work well for most projects
2. **Verify results** - Spot-check top-priority items to ensure correctness
3. **Tune conservatively** - Start with default settings, adjust if needed
4. **Disable for debugging** - Use `--semantic-off` if entropy seems incorrect
5. **Report issues** - If entropy analysis misjudges code (over- or under-dampening it), report it

## Limitations

Entropy analysis works best for:
- Functions with cyclomatic complexity 10-50
- Code with clear repetitive patterns
- Validation, dispatch, and configuration functions

Entropy analysis is less effective for:
- Very simple functions (complexity < 5)
- Very complex functions (complexity > 100)
- Obfuscated or generated code

## Comparison with Other Approaches

| Approach | False Positive Rate | Approach Complexity | Speed |
|----------|---------------------|------------|-------|
| Raw Cyclomatic Complexity | High (many false positives) | Low | Fast |
| Cognitive Complexity | Medium | Medium | Medium |
| Entropy Analysis (Debtmap) | Low | High | Fast |
| Manual Code Review | Very Low | Very High | Very Slow |

Of the approaches compared, Debtmap's entropy analysis offers the best balance of accuracy and speed: a low false-positive rate while remaining fast.

## See Also

- [Why Debtmap?](why-debtmap.md) - Real-world examples of entropy analysis
- [Analysis Guide](analysis-guide/index.md) - General analysis concepts
- [Configuration](configuration.md) - Complete configuration reference