# Deep Context Analysis
## Overview
Deep Context Analysis provides comprehensive codebase understanding by combining multiple analysis dimensions into a unified report. It leverages AST parsing, complexity metrics, git history, and semantic analysis to create rich context for AI agents and developers.
## Features
- **Multi-Language AST Analysis**: Full parsing for Rust, TypeScript, Python, C/C++
- **Holistic Metrics**: Combines complexity, churn, dependencies, and quality
- **Incremental Updates**: Efficient re-analysis of changed files only
- **AI-Optimized Output**: Structured for LLM consumption
- **Caching**: Sub-second analysis for unchanged code
## Usage
### Command Line
```bash
# Basic deep context analysis (terse by default)
pmat analyze deep-context
# Full detailed analysis
pmat analyze deep-context --full
# Specific output format
pmat analyze deep-context --format json
pmat analyze deep-context --format markdown
pmat analyze deep-context --format sarif # SARIF format for CI/CD
# Analyze specific path
pmat analyze deep-context -p ./src
# Control analysis period
pmat analyze deep-context --period-days 90
# Select specific analyses to include
pmat analyze deep-context --include "complexity,churn,dependencies"
# Cache control
pmat analyze deep-context --cache-strategy normal # Default caching
pmat analyze deep-context --cache-strategy force-refresh # Force fresh analysis
pmat analyze deep-context --cache-strategy offline # Use cache only
# Save to file
pmat analyze deep-context --output deep-context.md
# Include/exclude patterns
pmat analyze deep-context --include-pattern "*.rs" --exclude-pattern "**/tests/**"
# Show more files in summary
pmat analyze deep-context --top-files 20
# Enable verbose output
pmat analyze deep-context --verbose
```
### Configuration
The deep context analysis can be configured through command-line options:
- **Output Format**: `--format` (markdown, json, sarif)
- **Report Detail**: `--full` for comprehensive analysis
- **Analysis Period**: `--period-days` for git history window
- **Include/Exclude Analyses**: `--include` and `--exclude` for specific analysis types
- **DAG Type**: `--dag-type` (call-graph, import-graph, inheritance, full-dependency)
- **Cache Strategy**: `--cache-strategy` (normal, force-refresh, offline)
- **File Patterns**: `--include-pattern` and `--exclude-pattern`
- **Parallelism**: `--parallel` for concurrent analysis
- **Top Files**: `--top-files` to control how many files to show in summary
Example with multiple options:
```bash
pmat analyze deep-context \
--full \
--format json \
--period-days 60 \
--include "complexity,churn,dependencies" \
--cache-strategy normal \
--parallel \
--top-files 15 \
--include-pattern "*.rs" \
--output analysis.json
```
## Output Structure
### Markdown Format
```markdown
# Deep Context Analysis
Generated: 2024-06-09 10:30:45 UTC
Project: paiml-mcp-agent-toolkit
Language: Rust (primary), TypeScript (secondary)
## Project Overview
### Structure
- Total Files: 298
- Lines of Code: 45,231
- Test Coverage: 92.3%
- Primary Language: Rust (78%)
### Key Components
1. **Core Engine** (`src/services/`)
- AST parsing and analysis
- Complexity calculation
- Cache management
2. **CLI Interface** (`src/cli/`)
- Command parsing
- Output formatting
- Interactive modes
### Complexity Analysis
#### Hotspots
| cli/mod.rs | handle_analyze_graph | 75 | 125 |
| services/analyzer.rs | process_ast | 45 | 67 |
### Recent Changes
#### High Churn Files (Last 30 days)
1. `src/services/refactor_engine.rs` - 23 commits, 3 authors
2. `src/cli/handlers/mod.rs` - 18 commits, 2 authors
### Dependencies
#### Direct Dependencies (Critical)
- tokio (1.35): Async runtime
- clap (4.4): CLI parsing
- serde (1.0): Serialization
### Quality Indicators
- Technical Debt Gradient: 1.45 (Moderate)
- Code Smells: 23 (15 high priority)
- Security Issues: 0
- Outdated Dependencies: 3
```
### JSON Format
```json
{
"metadata": {
"generated": "2024-06-09T10:30:45Z",
"version": "0.21.5",
"project_path": "/path/to/project"
},
"overview": {
"total_files": 298,
"total_lines": 45231,
"languages": {
"rust": { "files": 234, "percentage": 78.5 },
"typescript": { "files": 64, "percentage": 21.5 }
}
},
"structure": {
"components": [
{
"name": "Core Engine",
"path": "src/services",
"purpose": "Main analysis engine",
"complexity": "high",
"dependencies": ["tokio", "rayon", "dashmap"]
}
],
"architecture_style": "modular",
"entry_points": ["src/main.rs", "src/lib.rs"]
},
"complexity": {
"summary": {
"median_cyclomatic": 5,
"p90_cyclomatic": 20,
"max_cyclomatic": 75
},
"hotspots": [
{
"file": "cli/mod.rs",
"function": "handle_analyze_graph",
"cyclomatic": 75,
"cognitive": 125,
"recommendation": "Extract sub-functions"
}
]
},
"quality": {
"tdg_score": 1.45,
"test_coverage": 92.3,
"code_smells": 23,
"security_issues": 0
}
}
```
## Analysis Components
### 1. AST Analysis
Extracts structural information from source code:
- **Function signatures**: Parameters, return types, generics
- **Type definitions**: Structs, enums, traits/interfaces
- **Import graph**: Dependency relationships
- **Complexity metrics**: Per-function measurements
- **Documentation**: Extracted comments and docstrings
### 2. Complexity Analysis
Measures code complexity at multiple levels:
- **Function-level**: Cyclomatic and cognitive complexity
- **File-level**: Average and maximum complexity
- **Module-level**: Aggregate complexity scores
- **Trends**: Complexity changes over time
### 3. Git Churn Analysis
Analyzes version control history:
- **Change frequency**: Files modified most often
- **Author distribution**: Who modifies what
- **Coupled files**: Files that change together
- **Stability indicators**: Mature vs volatile code
### 4. Dependency Analysis
Maps internal and external dependencies:
- **Import graph**: Who depends on whom
- **Circular dependencies**: Architectural issues
- **External dependencies**: Version and security status
- **Unused dependencies**: Potential for cleanup
### 5. Quality Metrics
Comprehensive quality indicators:
- **Test coverage**: Line, branch, function coverage
- **Documentation coverage**: Public API documentation
- **Code smells**: Duplication, long methods, etc.
- **Technical debt**: Accumulated issues and TODOs
## AI Integration
### Optimized for LLMs
The output is structured for optimal LLM consumption:
1. **Hierarchical Structure**: Information organized from general to specific
2. **Context Windows**: Chunked for standard LLM context sizes
3. **Semantic Markers**: Clear section boundaries and relationships
4. **Actionable Insights**: Specific recommendations included
### Example AI Workflow
```python
import subprocess
import json
def get_project_context(project_path):
"""Generate deep context for AI analysis"""
# Run deep context analysis
result = subprocess.run([
'pmat', 'analyze', 'deep-context',
'--format', 'json',
'--path', project_path
], capture_output=True, text=True)
context = json.loads(result.stdout)
# Create AI prompt
prompt = f"""
Project Analysis:
- Language: {context['overview']['languages']}
- Complexity Hotspots: {len(context['complexity']['hotspots'])}
- Technical Debt: {context['quality']['tdg_score']}
Key Areas of Concern:
{format_hotspots(context['complexity']['hotspots'])}
Please suggest refactoring strategies for the top 3 complexity hotspots.
"""
return prompt
```
## Performance
### Benchmarks
| 10K LOC | 2.3s | 0.4s | 0.05s |
| 100K LOC | 18.5s | 2.1s | 0.08s |
| 1M LOC | 3m 45s | 15.2s | 0.12s |
### Optimization Strategies
1. **Parallel Processing**: Uses all CPU cores for AST parsing
2. **Incremental Analysis**: Only re-analyzes changed files
3. **Smart Caching**: Content-based cache invalidation
4. **Streaming Output**: Results available as computed
## Integration Examples
### CI/CD Pipeline
```yaml
name: Deep Context Analysis
on: [push, pull_request]
jobs:
analyze:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install PMAT
run: |
curl -sSfL https://raw.githubusercontent.com/paiml/paiml-mcp-agent-toolkit/master/scripts/install.sh | sh
- name: Run Deep Context Analysis
run: |
pmat analyze deep-context \
--format markdown \
--output DEEP_CONTEXT.md
- name: Upload Context
uses: actions/upload-artifact@v3
with:
name: deep-context
path: DEEP_CONTEXT.md
- name: Comment on PR
if: github.event_name == 'pull_request'
uses: actions/github-script@v6
with:
script: |
const fs = require('fs');
const context = fs.readFileSync('DEEP_CONTEXT.md', 'utf8');
// Extract summary section
const summary = context.match(/## Summary[\s\S]*?(?=##|$)/)[0];
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `### Deep Context Analysis\n${summary}`
});
```
### IDE Integration
```typescript
// vscode extension
import * as vscode from 'vscode';
import { execSync } from 'child_process';
export function activate(context: vscode.ExtensionContext) {
let disposable = vscode.commands.registerCommand('pmat.deepContext', () => {
const workspacePath = vscode.workspace.rootPath;
vscode.window.withProgress({
location: vscode.ProgressLocation.Notification,
title: "Generating Deep Context",
cancellable: false
}, async (progress) => {
progress.report({ increment: 0 });
const result = execSync(
`pmat analyze deep-context --format json`,
{ cwd: workspacePath, encoding: 'utf8' }
);
const context = JSON.parse(result);
// Create webview with results
const panel = vscode.window.createWebviewPanel(
'deepContext',
'Deep Context Analysis',
vscode.ViewColumn.One,
{}
);
panel.webview.html = generateHtml(context);
});
});
context.subscriptions.push(disposable);
}
```
## Best Practices
1. **Regular Updates**: Run analysis daily to track trends
2. **Baseline Comparison**: Compare against known good state
3. **Focus on Changes**: Use incremental mode for PRs
4. **Cache Wisely**: Configure cache based on team size
5. **Customize Output**: Tailor format to consumer needs
## Troubleshooting
### Common Issues
**Q: Analysis is taking too long**
A: Enable caching and use incremental mode. Exclude large generated files.
**Q: Memory usage is high**
A: Reduce `max_file_size` and enable streaming output.
**Q: Cache seems stale**
A: Clear cache with `rm -rf .pmat-cache` or reduce TTL.
### Debug Mode
```bash
# Enable detailed logging
RUST_LOG=debug pmat analyze deep-context
# Profile performance
pmat analyze deep-context --profile
# Validate cache
pmat analyze deep-context --validate-cache
```
## Future Roadmap
- **Semantic Code Search**: Natural language queries
- **Change Impact Analysis**: Predict effects of changes
- **Architecture Visualization**: Auto-generated diagrams
- **Team Analytics**: Developer productivity metrics
- **Real-time Monitoring**: Continuous analysis daemon