# Valknut Architecture Documentation
This document describes the architecture, design patterns, and implementation details of the Valknut code analysis system.
## Table of Contents
- [Overview](#overview)
- [System Architecture](#system-architecture)
- [Core Components](#core-components)
- [Analysis Pipeline](#analysis-pipeline)
- [Language Support](#language-support)
- [Data Flow](#data-flow)
- [Performance Considerations](#performance-considerations)
- [Extension Points](#extension-points)
- [Design Decisions](#design-decisions)
## Overview
Valknut is a high-performance code analysis tool implemented in Rust that provides comprehensive analysis capabilities including:
- **Structure Analysis**: Directory organization and file distribution assessment
- **Complexity Analysis**: AST-based complexity metrics using Tree-sitter parsers
- **Semantic Naming**: AI-powered function and variable name quality evaluation
- **Technical Debt Assessment**: Quantitative debt scoring and prioritization
- **Refactoring Recommendations**: Actionable improvement suggestions with impact analysis
- **Quality Gates**: CI/CD integration with configurable failure conditions
## System Architecture
### High-Level Architecture
```
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ CLI Layer │ │ API Layer │ │ Configuration │
│ (bin/valknut.rs) │◄──►│ (api/*.rs) │◄──►│ (valknut.yml) │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Core Analysis Pipeline │
│ (core/pipeline.rs) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Detector Modules │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Structure │ │ Complexity │ │ Refactoring │ │ Names │ │
│ │ Analysis │ │ Analysis │ │ Analysis │ │ Analysis │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Language Adapters │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Python │ │ TypeScript │ │ Rust │ │ Go │ │
│ │ Parser │ │ Parser │ │ Parser │ │ Parser │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ I/O Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Reports │ │ Cache │ │ Persistence │ │ Config │ │
│ │ Generation │ │ System │ │ Layer │ │ Management │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Core Components
### 1. Analysis Pipeline (`core/pipeline.rs`)
The central orchestrator that coordinates all analysis activities:
```rust
pub struct AnalysisPipeline {
config: AnalysisConfig,
complexity_analyzer: ComplexityAnalyzer,
structure_extractor: StructureExtractor,
refactoring_analyzer: RefactoringAnalyzer,
}
impl AnalysisPipeline {
pub async fn analyze_paths(
&self,
paths: &[PathBuf],
progress_callback: Option<ProgressCallback>,
) -> Result<ComprehensiveAnalysisResult>
}
```
**Key Responsibilities:**
- File discovery and filtering
- Coordinating detector execution
- Progress tracking and reporting
- Result aggregation and health metrics calculation
- Quality gate evaluation
### 2. Detector Modules (`detectors/`)
#### Structure Analyzer (`detectors/structure.rs`)
- Directory organization analysis
- File size and distribution assessment
- Reorganization recommendations
#### Complexity Analyzer (`detectors/complexity.rs`)
- AST-based complexity metrics
- Cyclomatic and cognitive complexity
- Maintainability index calculation
#### Semantic Naming Analyzer (`detectors/names/`)
- AI-powered name quality assessment
- Function and variable naming evaluation
- Renaming suggestions with context
#### Refactoring Analyzer (`detectors/refactoring.rs`)
- Code smell detection
- Improvement opportunity identification
- Impact assessment and prioritization
### 3. Language Adapters (`lang/`)
Language-specific parsers and AST analyzers using Tree-sitter:
```rust
pub trait LanguageAdapter {
fn parse_file(&self, content: &str) -> Result<Tree>;
fn extract_entities(&self, tree: &Tree) -> Vec<CodeEntity>;
fn calculate_complexity(&self, node: &Node) -> ComplexityMetrics;
fn analyze_names(&self, entities: &[CodeEntity]) -> Vec<NamingIssue>;
}
```
**Supported Languages:**
- **Python** (`lang/python.rs`) - Full support
- **TypeScript** (`lang/typescript.rs`) - Full support
- **JavaScript** (`lang/javascript.rs`) - Full support
- **Rust** (`lang/rust_lang.rs`) - Full support
- **Go** (`lang/go.rs`) - Experimental
### 4. I/O Layer (`io/`)
#### Report Generation (`io/reports.rs`)
- Multiple output format support (JSON, HTML, Markdown, CSV)
- Template-based report generation
- Interactive dashboard creation
#### Caching System (`io/cache.rs`)
- File-based analysis result caching
- Incremental analysis support
- Cache invalidation strategies
#### Configuration Management (`core/config.rs`)
- YAML configuration parsing and validation
- Default value management
- CLI option integration
## Analysis Pipeline
### 1. File Discovery Phase
```rust
async fn discover_files(&self, paths: &[PathBuf]) -> Result<Vec<PathBuf>> {
// 1. Traverse input paths
// 2. Filter by file extensions
// 3. Exclude specified directories
// 4. Apply file limits if configured
}
```
**Filtering Rules:**
- Include files matching configured extensions
- Exclude directories like `node_modules`, `target`, `.git`
- Respect max file limits for large codebases
- Handle both files and directories as input
### 2. Structure Analysis Phase
```rust
async fn run_structure_analysis(&self, paths: &[PathBuf]) -> Result<StructureAnalysisResults> {
// 1. Analyze directory organization
// 2. Identify overcrowded directories
// 3. Detect large files needing splitting
// 4. Generate reorganization recommendations
}
```
**Analysis Types:**
- **Directory Pressure**: Files/LOC per directory thresholds
- **File Size Analysis**: Large file identification and splitting suggestions
- **Balance Analysis**: Code distribution assessment
- **Partitioning Recommendations**: Structural improvement suggestions
### 3. Complexity Analysis Phase
```rust
async fn run_complexity_analysis(&self, files: &[PathBuf]) -> Result<ComplexityAnalysisResults> {
// 1. Parse files using Tree-sitter
// 2. Extract code entities (functions, classes)
// 3. Calculate complexity metrics
// 4. Identify complexity hotspots
}
```
**Metrics Calculated:**
- **Cyclomatic Complexity**: Decision point counting
- **Cognitive Complexity**: Human perception-based complexity
- **Technical Debt Score**: Quantified maintainability assessment
- **Maintainability Index**: Composite maintainability score
### 4. Semantic Analysis Phase
```rust
async fn run_semantic_analysis(&self, files: &[PathBuf]) -> Result<SemanticAnalysisResults> {
// 1. Extract function and variable names
// 2. Analyze naming quality using embedding models
// 3. Detect naming inconsistencies
// 4. Generate renaming suggestions
}
```
**AI Integration:**
- **Embedding Models**: Qwen3-Embedding-0.6B for semantic understanding
- **Mismatch Detection**: Contextual name quality assessment
- **Suggestion Generation**: Contextual renaming recommendations
- **API Protection**: Preserve public API naming conventions
### 5. Refactoring Analysis Phase
```rust
async fn run_refactoring_analysis(&self, files: &[PathBuf]) -> Result<RefactoringAnalysisResults> {
// 1. Detect code smells and anti-patterns
// 2. Identify improvement opportunities
// 3. Calculate impact and effort estimates
// 4. Prioritize recommendations
}
```
**Opportunities Identified:**
- Extract Method opportunities
- Extract Class candidates
- Reduce complexity suggestions
- Remove duplication recommendations
### 6. Health Metrics Calculation
```rust
fn calculate_health_metrics(&self, ...) -> HealthMetrics {
// 1. Aggregate analysis results
// 2. Calculate composite scores
// 3. Normalize metrics to 0-100 scale
// 4. Weighted health score calculation
}
```
**Health Score Components:**
- **Maintainability Score** (30% weight): Based on maintainability index
- **Structure Quality Score** (30% weight): Based on structural issues
- **Complexity Score** (20% weight): Inverse of complexity metrics
- **Technical Debt Score** (20% weight): Inverse of debt ratio
## Language Support
### Tree-sitter Integration
Valknut uses Tree-sitter for robust, language-agnostic parsing:
```rust
use tree_sitter::{Language, Parser, Tree};
pub struct LanguageParser {
language: Language,
parser: Parser,
}
impl LanguageParser {
pub fn parse(&mut self, source: &str) -> Option<Tree> {
self.parser.parse(source, None)
}
}
```
**Benefits:**
- **Error Recovery**: Robust parsing of incomplete/malformed code
- **Language Agnostic**: Uniform AST structure across languages
- **Performance**: Fast parsing with minimal memory usage
- **Incremental**: Support for incremental parsing (future)
### Language-Specific Implementations
#### Python Support (`lang/python.rs`)
- Function, class, and method extraction
- Import and module analysis
- Python-specific complexity patterns
- Django/Flask framework awareness
#### TypeScript/JavaScript Support (`lang/typescript.rs`, `lang/javascript.rs`)
- Modern syntax support (ES2024, TypeScript 5.x)
- React component analysis
- Module system understanding
- Node.js and browser pattern recognition
#### Rust Support (`lang/rust_lang.rs`)
- Trait and implementation analysis
- Ownership and borrowing pattern recognition
- Cargo project structure awareness
- Async/await pattern analysis
## Data Flow
### Input Processing
```
CLI Args → Configuration → Path Discovery → File Filtering
```
### Analysis Flow
```
Files → Language Detection → AST Parsing → Entity Extraction →
Metric Calculation → Issue Detection → Recommendation Generation
```
### Output Generation
```
Analysis Results → Format Selection → Template Processing →
Report Generation → File Output
```
### Caching Flow
```
Input Hash → Cache Check → [Cache Hit: Return | Cache Miss: Analyze → Cache Store]
```
## Performance Considerations
### Rust Performance Features
1. **Zero-Cost Abstractions**: Traits and generics compile to efficient code
2. **Memory Safety**: No garbage collection overhead
3. **SIMD Optimizations**: Vectorized operations for large datasets
4. **Parallel Processing**: Rayon for data parallelism
### Optimization Strategies
#### Parallel Analysis
```rust
use rayon::prelude::*;
files.par_iter()
.map(|file| analyze_file(file))
.collect::<Result<Vec<_>, _>>()
```
#### Memory Efficiency
```rust
// Stream processing for large files
use std::io::{BufReader, BufRead};
let reader = BufReader::new(file);
for line in reader.lines() {
// Process line by line
}
```
#### Caching Strategy
- **Content-based hashing**: SHA-256 of file content + config
- **Hierarchical caching**: File, directory, and project level
- **TTL-based expiration**: Configurable cache lifetime
- **Selective invalidation**: Only invalidate affected cache entries
## Extension Points
### Adding New Languages
1. **Language Definition**:
```rust
pub struct NewLanguageAdapter {
parser: LanguageParser,
config: LanguageConfig,
}
impl LanguageAdapter for NewLanguageAdapter {
}
```
2. **Register Language**:
```rust
pub fn register_languages() -> HashMap<String, Box<dyn LanguageAdapter>> {
let mut languages = HashMap::new();
languages.insert("newlang".to_string(), Box::new(NewLanguageAdapter::new()));
languages
}
```
### Adding New Detectors
1. **Detector Implementation**:
```rust
pub struct CustomDetector {
config: CustomConfig,
}
impl Detector for CustomDetector {
async fn analyze(&self, entities: &[CodeEntity]) -> Result<DetectorResult> {
}
}
```
2. **Pipeline Integration**:
```rust
impl AnalysisPipeline {
async fn run_custom_analysis(&self, files: &[PathBuf]) -> Result<CustomAnalysisResults> {
let detector = CustomDetector::new(self.config.custom.clone());
detector.analyze(&entities).await
}
}
```
### Adding New Output Formats
1. **Format Implementation**:
```rust
pub struct CustomFormatter;
impl ReportFormatter for CustomFormatter {
fn format(&self, results: &ComprehensiveAnalysisResult) -> Result<String> {
}
}
```
2. **Registration**:
```rust
pub fn get_formatter(format: &OutputFormat) -> Box<dyn ReportFormatter> {
match format {
OutputFormat::Custom => Box::new(CustomFormatter),
}
}
```
## Design Decisions
### ADR-001: Rust Implementation Choice
**Status**: Accepted
**Context**: Need for high-performance code analysis tool capable of handling large codebases.
**Decision**: Implement in Rust for performance, memory safety, and ecosystem benefits.
**Consequences**:
- ✅ Excellent performance and memory efficiency
- ✅ Memory safety without garbage collection overhead
- ✅ Rich ecosystem of parsing and analysis crates
- ❌ Higher learning curve for contributors
- ❌ Longer compilation times during development
### ADR-002: Tree-sitter for Parsing
**Status**: Accepted
**Context**: Need robust, language-agnostic parsing for multiple programming languages.
**Decision**: Use Tree-sitter for all language parsing.
**Consequences**:
- ✅ Uniform AST structure across languages
- ✅ Error-tolerant parsing of incomplete code
- ✅ High performance with incremental parsing
- ✅ Large ecosystem of language grammars
- ❌ Additional dependency on Tree-sitter binaries
- ❌ Learning curve for Tree-sitter query syntax
### ADR-003: Multi-Stage Analysis Pipeline
**Status**: Accepted
**Context**: Need to coordinate multiple types of analysis with progress tracking.
**Decision**: Implement pipeline pattern with distinct analysis stages.
**Consequences**:
- ✅ Clear separation of concerns
- ✅ Easy to add new analysis types
- ✅ Progress tracking and error isolation
- ✅ Configurable analysis stages
- ❌ Potential for redundant file processing
- ❌ More complex coordination logic
### ADR-004: Configuration-First Design
**Status**: Accepted
**Context**: Need flexible configuration for different project types and team standards.
**Decision**: Make all analysis behavior configurable through YAML configuration.
**Consequences**:
- ✅ Flexible adaptation to different codebases
- ✅ Team-specific standard enforcement
- ✅ CLI and config file integration
- ✅ Validation and documentation
- ❌ Configuration complexity for users
- ❌ Default configuration maintenance
### ADR-005: Quality Gates for CI/CD
**Status**: Accepted
**Context**: Need automated quality control integration with development workflows.
**Decision**: Implement configurable quality gates with exit codes for CI/CD.
**Consequences**:
- ✅ Automated quality enforcement
- ✅ Configurable failure conditions
- ✅ Integration with existing CI/CD systems
- ✅ Gradual quality improvement enforcement
- ❌ Potential for overly strict gates blocking development
- ❌ Configuration complexity for optimal thresholds
## Future Considerations
### Planned Enhancements
1. **Incremental Analysis**: Only analyze changed files
2. **Language Server Protocol**: IDE integration for real-time analysis
3. **Machine Learning**: Improved pattern recognition and recommendations
4. **Distributed Analysis**: Support for analyzing very large codebases
5. **Custom Rules**: User-defined analysis rules and patterns
6. **Integration APIs**: REST API for third-party tool integration
### Scalability Plans
1. **Database Backend**: Store analysis results in database for large projects
2. **Caching Strategy**: Distributed caching for team environments
3. **Parallel Execution**: Multi-machine analysis coordination
4. **Memory Management**: Streaming analysis for memory-constrained environments
This architecture provides a solid foundation for comprehensive code analysis while maintaining flexibility for future enhancements and extensions.