# Semantic Classification
Debtmap performs semantic analysis to classify functions by their architectural role, enabling more accurate complexity scoring and prioritization.
## Overview
Semantic classification identifies the purpose of each function based on AST patterns, helping debtmap:
- Apply appropriate complexity expectations
- Adjust scoring based on function role
- Provide role-specific recommendations
## Function Roles
Debtmap classifies functions into seven distinct roles, each with specific detection criteria and scoring behavior.
### Pure Logic
Functions that compute without side effects. These are the core business logic functions that deserve highest test priority.
**Detection Criteria** (from `src/priority/semantic_classifier/mod.rs:43`):
- Default classification when no other role matches
- Does not match entry point, debug, constructor, enum converter, accessor, pattern matching, I/O wrapper, or orchestrator patterns
**Example:**
```rust
fn calculate_total(items: &[Item]) -> u32 {
items.iter().map(|i| i.price).sum()
}
```
### Orchestrator
Functions that coordinate other functions with simple delegation logic.
**Detection Criteria** (from `src/priority/semantic_classifier/classifiers.rs:257-328`):
- Name matches orchestration patterns: `orchestrate`, `coordinate`, `manage`, `dispatch`, `route`, `delegate`, `forward`
- Name prefixes: `workflow_`, `pipeline_`, `process_`, `orchestrate_`, `coordinate_`, `execute_flow_`
- Must have at least 2 meaningful callees (non-stdlib functions)
- Cyclomatic complexity ≤ 5
- Delegation ratio ≥ 20% (function calls / total lines)
- Excludes adapter/wrapper patterns (single delegation)
**Example:**
```rust
fn process_order(order: Order) -> Result<Receipt> {
let validated = validate_order(&order)?;
let priced = calculate_prices(&validated)?;
finalize_order(&priced)
}
```
### I/O Wrapper
Functions that wrap I/O operations. Includes simple constructors, accessors, and enum converters.
**Detection Criteria** (from `src/priority/semantic_classifier/classifiers.rs:331-343`):
- Name contains I/O keywords: `read`, `write`, `file`, `socket`, `http`, `request`, `response`, `stream`, `serialize`, `deserialize`, `save`, `load`, etc.
- Short I/O functions (< 20 lines) are always classified as I/O wrappers
- Longer functions (≤ 50 lines) with strong I/O name patterns (`output_`, `write_`, `print_`, etc.) and low nesting (≤ 3)
**Example:**
```rust
fn read_config(path: &Path) -> Result<Config> {
let content = fs::read_to_string(path)?;
toml::from_str(&content)
}
```
### Entry Point
Main functions and public API endpoints. These have highest classification precedence.
**Detection Criteria** (from `src/priority/semantic_classifier/pattern_matchers.rs:54-63`):
- Name patterns: `main`, `run`, `start`, `init`, `handle`, `process`, `execute`, `serve`, `listen`
- Functions at the top of the call graph
- Highest classification precedence (checked before all other roles)
**Example:**
```rust
fn main() {
let args = Args::parse();
run(args).unwrap();
}
```
### Pattern Match
Functions dominated by pattern matching logic, typically with many branches but low cyclomatic complexity.
**Detection Criteria** (from `src/priority/semantic_classifier/classifiers.rs:213-252`):
- Name suggests pattern matching: `detect`, `classify`, `identify`, `determine`, `resolve`, `match`, `parse_type`, `get_type`, `find_type`
- Low cyclomatic complexity (≤ 2) but higher cognitive complexity
- Cognitive/cyclomatic ratio > 5.0 (indicates many if/else or match branches)
**Example:**
```rust
fn handle_event(event: Event) -> Action {
match event {
Event::Click(pos) => Action::Select(pos),
Event::Drag(from, to) => Action::Move(from, to),
Event::Release => Action::Confirm,
}
}
```
### Debug
Functions used for troubleshooting and diagnostics. These have the lowest test priority.
**Detection Criteria** (from `src/priority/semantic_classifier/classifiers.rs:14-53`):
- Name prefixes: `debug_`, `print_`, `dump_`, `trace_`
- Name suffixes: `_diagnostics`, `_debug`, `_stats`
- Name contains: `diagnostics`
- Cognitive complexity ≤ 10 (prevents misclassifying complex functions with debug-like names)
- Alternatively: Very simple functions (cognitive < 5, length < 20) with output-focused I/O patterns (`print`, `display`, `show`, `log`, `trace`, `dump`)
**Example:**
```rust
fn print_call_graph_diagnostics(graph: &CallGraph) {
for node in graph.nodes() {
println!("{}: {} callers, {} callees",
node.name, node.callers.len(), node.callees.len());
}
}
```
### Unknown
Functions that cannot be classified into any specific role. These receive neutral scoring adjustments.
**Detection Criteria** (from `src/priority/semantic_classifier/mod.rs:32`):
- Reserved for edge cases where classification fails
- In practice, functions default to `PureLogic` when no other role matches
## Classification Precedence
The classifier applies rules in a specific order to ensure correct classification when multiple patterns match (from `src/priority/semantic_classifier/mod.rs:47-113`):
1. **Entry Point** - Highest precedence, checked first
2. **Debug** - Diagnostic functions detected early
3. **Constructor** - Simple constructors (AST-based detection, spec 117/122)
4. **Enum Converter** - Simple enum-to-value converters (spec 124)
5. **Accessor** - Simple getter/accessor methods (spec 125)
6. **Data Flow** - Data transformation orchestrators (spec 126, opt-in)
7. **Pattern Match** - Pattern matching functions
8. **I/O Wrapper** - I/O-focused functions
9. **Orchestrator** - Coordination functions
10. **Pure Logic** - Default fallback
## AST-Based Detection
Semantic classification uses AST analysis to detect function roles beyond simple name matching.
### Constructor Detection (Spec 117/122)
**Source:** `src/priority/semantic_classifier/classifiers.rs:115-183`
Detects constructor functions even with non-standard names by analyzing:
- Return type: Must return `Self`, `Result<Self>`, or `Option<Self>`
- Body patterns: Contains struct initialization with `Self { ... }`
- No loops in function body
- Complexity thresholds: cyclomatic ≤ 5, nesting ≤ 2, length < 30
**Detected Patterns:**
- Standard names: `new`, `default`, `from_*`, `with_*`, `create_*`, `make_*`, `build_*`
- Non-standard names with AST analysis: `create_default_client()` returning `Self`
**Example:**
```rust
// Detected even without standard naming
pub fn create_default_client() -> Self {
Self {
timeout: Duration::from_secs(30),
retries: 3,
}
}
```
### Enum Converter Detection (Spec 124)
**Source:** `src/priority/semantic_classifier/classifiers.rs:185-211`
Detects simple enum-to-string converter functions:
- Name patterns: `name`, `as_str`, `to_*`
- Body contains exhaustive `match` statement on `self`
- All match arms return string/numeric literals only
- No function calls in match arms (e.g., no `format!()`)
- Cognitive complexity ≤ 3
**Example:**
```rust
pub fn name(&self) -> &'static str {
match self {
FrameworkType::Django => "Django",
FrameworkType::Flask => "Flask",
FrameworkType::PyQt => "PyQt",
}
}
```
### Accessor Method Detection (Spec 125)
**Source:** `src/priority/semantic_classifier/mod.rs:121-177`
Detects simple accessor and getter methods:
**Single-word patterns:**
- `id`, `name`, `value`, `kind`, `type`, `status`, `code`, `key`, `index`
**Prefix patterns:**
- `get_*`, `is_*`, `has_*`, `can_*`, `should_*`, `as_*`, `to_*`, `into_*`
**Complexity thresholds:**
- Cyclomatic complexity ≤ 2
- Cognitive complexity ≤ 1
- Length < 10 lines
- Nesting ≤ 1 level
- If AST available, verifies body is simple accessor pattern
**Example:**
```rust
pub fn id(&self) -> u32 {
self.id
}
```
### Data Flow Classification (Spec 126)
**Source:** `src/priority/semantic_classifier/mod.rs:81-96`
Analyzes data flow patterns to identify orchestration functions based on transformation chains.
**Detection Criteria:**
- Enabled via configuration (opt-in by default)
- High confidence (≥ 0.8)
- High transformation ratio (≥ 0.7)
- Low business logic ratio (< 0.3)
### Debug Function Detection (Spec 119)
**Source:** `src/priority/semantic_classifier/pattern_matchers.rs:7-20`
Detects debug/diagnostic functions using:
**Name patterns:**
- Prefixes: `debug_`, `print_`, `dump_`, `trace_`
- Suffixes: `_diagnostics`, `_debug`, `_stats`
- Contains: `diagnostics`
**Behavioral characteristics:**
- Low cognitive complexity (< 5)
- Short length (< 20 lines)
- Output-focused I/O patterns: `print`, `display`, `show`, `log`, `trace`, `dump`
## Role-Specific Expectations
Different roles have different coverage and complexity expectations:
| Pure Logic | High | Low |
| Orchestrator | Medium | Medium |
| I/O Wrapper | Low | Low |
| Entry Point | Low | Medium |
| Pattern Match | Medium | Variable |
| Debug | Low | Low |
| Unknown | Medium | Medium |
## Scoring Adjustments
Semantic classification affects scoring through role multipliers. These values adjust the priority score for each function role (from `src/config/scoring.rs:307-333`):
```toml
[scoring.role_multipliers]
pure_logic = 1.2 # Prioritized (highest test priority)
orchestrator = 0.8 # Reduced priority
io_wrapper = 0.7 # Minor reduction
entry_point = 0.9 # Slight reduction
pattern_match = 0.6 # Moderate reduction
debug = 0.3 # Lowest test priority
unknown = 1.0 # No adjustment
```
**Scoring Formula:**
- Higher multipliers (> 1.0) increase function priority
- Lower multipliers (< 1.0) decrease function priority
- `pure_logic = 1.2` means pure logic functions are prioritized 20% higher
- `debug = 0.3` means debug functions are de-prioritized significantly
## Configuration
### Semantic Classification
```toml
[semantic]
enabled = true
role_detection = true
adjust_coverage_expectations = true
```
### Constructor Detection (Spec 117/122)
From `src/config/detection.rs:54-98`:
```toml
[classification.constructors]
# Name patterns for constructor functions
patterns = ["new", "default", "from_", "with_", "create_", "make_", "build_", "of_", "empty", "zero", "any"]
# Complexity thresholds
max_cyclomatic = 2 # Maximum cyclomatic complexity
max_cognitive = 3 # Maximum cognitive complexity
max_length = 15 # Maximum function length
max_nesting = 1 # Maximum nesting depth
# Enable AST-based detection for non-standard constructor names
ast_detection = true # Analyzes return types and body patterns
```
### Accessor Detection (Spec 125)
From `src/config/detection.rs:135-226`:
```toml
[classification.accessors]
enabled = true
# Single-word accessor names
single_word_patterns = ["id", "name", "value", "kind", "type", "status", "code", "key", "index"]
# Prefix patterns for accessors
prefix_patterns = ["get_", "is_", "has_", "can_", "should_", "as_", "to_", "into_"]
# Complexity thresholds
max_cyclomatic = 2 # Maximum cyclomatic complexity
max_cognitive = 1 # Maximum cognitive complexity (stricter than constructors)
max_length = 10 # Maximum function length
max_nesting = 1 # Maximum nesting depth
```
### Data Flow Classification (Spec 126)
From `src/config/detection.rs:228-273`:
```toml
[classification.data_flow]
enabled = false # Opt-in feature
min_confidence = 0.8 # Minimum confidence required
min_transformation_ratio = 0.7 # Minimum transformation ratio for orchestrator
max_business_logic_ratio = 0.3 # Maximum business logic for orchestrator
```
### Debug Function Detection
Debug function detection is controlled by name patterns in `src/priority/semantic_classifier/pattern_matchers.rs:7-20`. The detection thresholds are:
- Cognitive complexity ≤ 10 for name-matched functions
- Cognitive complexity < 5 and length < 20 for behavior-matched functions
## Troubleshooting
### Function Classified Incorrectly
If a function is classified with the wrong role:
1. **Check classification precedence** - Entry points take highest precedence
2. **Review complexity thresholds** - High complexity can disqualify certain roles
3. **Examine name patterns** - Some roles require specific naming conventions
4. **Enable AST detection** - Set `classification.constructors.ast_detection = true` for better constructor detection
### Constructor Not Detected
If a simple constructor is classified as PureLogic:
1. Ensure function name matches patterns or returns `Self`
2. Check complexity thresholds: cyclomatic ≤ 2, cognitive ≤ 3, length < 15
3. Enable AST detection for non-standard names
4. Verify no loops in function body
### Debug Function Not Detected
If a diagnostic function has high priority:
1. Ensure name matches debug patterns (`debug_*`, `print_*`, `*_diagnostics`, etc.)
2. Check cognitive complexity is ≤ 10
3. Functions with high complexity are intentionally excluded to prevent misclassification
## See Also
- [Role-Based Adjustments](scoring-strategies/role-based.md)
- [Functional Composition Analysis](functional-analysis.md)