# Architecture
## Overview
CodeSearch is a Rust CLI tool for code search and analysis. The architecture prioritizes performance, maintainability, and extensibility through a modular layout, trait abstractions, and custom error types.
## System Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                            CLI Layer                            │
├─────────────────────────────────────────────────────────────────┤
│  main.rs → cli.rs → commands/ (search, analysis, graph, util)   │
│                                                                 │
│  Command routing, argument parsing, output formatting           │
└────────────────┬────────────────────────────────────────────────┘
                 │
       ┌─────────┴──────────┐
       │                    │
       ▼                    ▼
┌──────────────┐   ┌──────────────────┐
│    Search    │   │     Analysis     │
│    Engine    │   │      Engine      │
└──────┬───────┘   └────────┬─────────┘
       │                    │
       ▼                    ▼
┌─────────────────────────────────────┐
│             Core Layer              │
│     (traits, types, errors, fs)     │
└─────────────────────────────────────┘
```
## Module Structure
### Core Modules
```
src/
├── lib.rs # Library exports and public API
├── main.rs # CLI entry point
├── cli.rs # CLI definitions (extracted from main.rs)
├── types.rs # Shared data structures (SearchOptions, etc.)
├── traits.rs # Core trait abstractions (SearchEngine, Analyzer, GraphBuilder)
├── errors.rs # Custom error types (SearchError, AnalysisError, etc.)
├── fs.rs # FileSystem trait (RealFileSystem, MockFileSystem)
│
├── commands/ # Command handlers (extracted from main.rs)
│ ├── mod.rs
│ ├── search.rs # Search command handler
│ ├── analysis.rs # Analysis command handler
│ ├── graph.rs # Graph command handler
│ └── util.rs # Utility command handlers
│
├── search/ # Search functionality
│ ├── mod.rs
│ ├── core.rs # Core search logic
│ ├── fuzzy.rs # Fuzzy matching
│ ├── semantic.rs # Semantic search
│ ├── utilities.rs # Search utilities
│ ├── engine.rs # DefaultSearchEngine implementation
│ └── pure.rs # Pure functions for testing
│
├── analysis.rs # Codebase analysis
├── complexity.rs # Complexity metrics
├── codemetrics/ # Comprehensive code metrics
│ ├── mod.rs
│ ├── complexity.rs
│ ├── size.rs
│ ├── maintainability.rs
│ └── helpers.rs
│
├── deadcode/ # Dead code detection
│ ├── mod.rs
│ ├── detectors.rs # Detection implementations
│ ├── helpers.rs # Helper functions
│ └── types.rs # Data structures
│
├── duplicates/ # Code duplication detection
│ ├── mod.rs
│ ├── detector.rs
│ ├── similarity.rs
│ ├── normalize.rs
│ └── types.rs
│
├── circular/ # Circular dependency detection
│ ├── mod.rs
│ ├── detector.rs
│ └── types.rs
│
├── designmetrics/ # Design metrics
│ ├── mod.rs
│ ├── types.rs
│ ├── analysis.rs
│ ├── extractors.rs
│ └── reporting.rs
│
├── graphs/ # Graph analysis (unified interface)
├── ast.rs # Abstract Syntax Tree
├── cfg.rs # Control Flow Graph
├── dfg.rs # Data Flow Graph
├── pdg.rs # Program Dependency Graph
├── callgraph.rs # Call Graph
├── depgraph.rs # Dependency Graph
├── unified.rs # Unified graph (AST+CFG+DFG)
│
├── language/ # Language support
│ ├── mod.rs
│ ├── types.rs
│ ├── definitions.rs # 48 language definitions
│ └── utilities.rs
│
├── parser/ # Code parsers
│ ├── mod.rs
│ ├── traits.rs
│ ├── tokenizer.rs
│ ├── rust.rs
│ ├── python.rs
│ ├── javascript.rs
│ ├── java.rs
│ └── go.rs
│
├── extract/ # Code extraction utilities
│ ├── mod.rs
│ └── regex_extractor.rs
│
├── find.rs # Symbol finding (definition, references, callers)
├── health.rs # Health scoring (deadcode + duplicates + complexity)
├── security.rs # Security pattern scanning
│
├── cache.rs # Simple cache
├── cache_lru.rs # LRU cache wrapper
├── index.rs # Code indexing
├── watcher.rs # File watching
│
├── githistory.rs # Git history search
├── remote.rs # Remote repository search
│
├── export.rs # Export functionality (CSV, Markdown, JSON)
├── interactive.rs # Interactive REPL mode
│
├── mcp/ # MCP server integration
│ ├── mod.rs
│ ├── tools.rs # Tool definitions
│ ├── schemas.rs # JSON schemas
│ └── params.rs # Parameter types
│
└── memopt.rs # Memory optimization utilities
```
### Key Design Patterns
#### 1. Trait Abstractions
```rust
use std::path::Path;

// `SearchOptions` and `SearchResult` are the shared types from types.rs.

// SearchEngine trait for pluggable search strategies
pub trait SearchEngine: Send + Sync {
    fn search(
        &self,
        query: &str,
        path: &Path,
        options: &SearchOptions,
    ) -> Result<Vec<SearchResult>, Box<dyn std::error::Error>>;
}

// Analyzer trait for different analysis types
pub trait Analyzer: Send + Sync {
    type Output;

    fn analyze(
        &self,
        path: &Path,
        extensions: Option<&[String]>,
    ) -> Result<Self::Output, Box<dyn std::error::Error>>;
}

// GraphBuilder trait for graph construction
pub trait GraphBuilder: Send + Sync {
    type Graph;

    fn build(
        &self,
        source: &str,
        name: Option<&str>,
    ) -> Result<Self::Graph, Box<dyn std::error::Error>>;
}
```
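As a minimal sketch of how a strategy might plug into `SearchEngine`, the block below implements the trait for a hypothetical `SubstringEngine` that searches in-memory file contents. `SearchOptions` and `SearchResult` are trimmed-down stand-ins here, not the real definitions in `types.rs`; their fields and the "0 means no limit" rule are assumptions made for illustration.

```rust
use std::error::Error;
use std::path::{Path, PathBuf};

// Simplified stand-ins for the real types in types.rs (fields are assumptions).
#[derive(Default)]
pub struct SearchOptions {
    pub ignore_case: bool,
    pub max_results: usize, // 0 is treated as "no limit" in this sketch
}

#[derive(Debug, PartialEq)]
pub struct SearchResult {
    pub file: PathBuf,
    pub line_number: usize,
    pub line: String,
}

pub trait SearchEngine: Send + Sync {
    fn search(
        &self,
        query: &str,
        path: &Path,
        options: &SearchOptions,
    ) -> Result<Vec<SearchResult>, Box<dyn Error>>;
}

// Hypothetical engine: plain substring matching over in-memory file contents.
pub struct SubstringEngine {
    pub files: Vec<(PathBuf, String)>,
}

impl SearchEngine for SubstringEngine {
    fn search(
        &self,
        query: &str,
        path: &Path,
        options: &SearchOptions,
    ) -> Result<Vec<SearchResult>, Box<dyn Error>> {
        let fold = |s: &str| if options.ignore_case { s.to_lowercase() } else { s.to_string() };
        let needle = fold(query);
        let mut results = Vec::new();

        // Only consider files located under the requested path.
        for (file, content) in self.files.iter().filter(|(f, _)| f.starts_with(path)) {
            for (idx, line) in content.lines().enumerate() {
                if fold(line).contains(&needle) {
                    results.push(SearchResult {
                        file: file.clone(),
                        line_number: idx + 1, // 1-based, as editors display it
                        line: line.to_string(),
                    });
                    if options.max_results != 0 && results.len() >= options.max_results {
                        return Ok(results);
                    }
                }
            }
        }
        Ok(results)
    }
}
```

Because callers hold the engine behind the trait, a fuzzy or semantic strategy could be swapped in without touching the command handlers.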
#### 2. Parameter Object Pattern
```rust
// SearchOptions bundles related parameters
pub struct SearchOptions {
    pub extensions: Option<Vec<String>>,
    pub ignore_case: bool,
    pub fuzzy: bool,
    pub fuzzy_threshold: f64,
    pub max_results: usize,
    pub exclude: Option<Vec<String>>,
    pub rank: bool,
    pub cache: bool,
    pub semantic: bool,
    pub benchmark: bool,
    pub vs_grep: bool,
}

// Builder pattern for easy configuration
impl SearchOptions {
    pub fn with_extensions(mut self, extensions: Vec<String>) -> Self { ... }
    pub fn with_fuzzy(mut self, fuzzy: bool) -> Self { ... }
    // ... other with_* methods
}
```
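The elided `with_*` bodies could plausibly look like the following. This is a sketch on a reduced `SearchOptions` with only five of the fields; the `Default` values (`0.6`, `10`) and the helper `rust_fuzzy_options` are assumptions, not taken from the codebase.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct SearchOptions {
    pub extensions: Option<Vec<String>>,
    pub ignore_case: bool,
    pub fuzzy: bool,
    pub fuzzy_threshold: f64,
    pub max_results: usize,
}

impl Default for SearchOptions {
    fn default() -> Self {
        Self {
            extensions: None,
            ignore_case: false,
            fuzzy: false,
            fuzzy_threshold: 0.6, // assumed default
            max_results: 10,      // assumed default
        }
    }
}

impl SearchOptions {
    // Each builder method takes `self` by value, updates one field,
    // and returns `self` so calls can be chained.
    pub fn with_extensions(mut self, extensions: Vec<String>) -> Self {
        self.extensions = Some(extensions);
        self
    }

    pub fn with_fuzzy(mut self, fuzzy: bool) -> Self {
        self.fuzzy = fuzzy;
        self
    }
}

// Hypothetical call site: start from defaults and override only what differs.
pub fn rust_fuzzy_options() -> SearchOptions {
    SearchOptions::default()
        .with_extensions(vec!["rs".to_string()])
        .with_fuzzy(true)
}
```

Passing one `&SearchOptions` instead of a dozen positional arguments keeps function signatures stable when a new option is added.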
#### 3. Dependency Injection
```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

// `Result<T>` is assumed to be a single-parameter alias such as `anyhow::Result<T>`.

// FileSystem trait for testability
pub trait FileSystem: Send + Sync {
    fn read_file(&self, path: &Path) -> Result<String>;
    fn write_file(&self, path: &Path, content: &str) -> Result<()>;
    fn list_files(&self, path: &Path, options: &WalkOptions) -> Result<Vec<PathBuf>>;
}

// Production implementation
pub struct RealFileSystem;
impl FileSystem for RealFileSystem { ... }

// Test implementation
pub struct MockFileSystem {
    files: HashMap<PathBuf, String>,
}
impl FileSystem for MockFileSystem { ... }
```
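To illustrate why the injection matters, here is a reduced sketch in which the code under test depends only on the trait. The trait is cut down to `read_file`, `std::io::Result` stands in for the crate's own `Result` alias, and `with_file` and `count_lines` are hypothetical helpers.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

// Reduced to one method; io::Result stands in for the crate's Result alias.
pub trait FileSystem: Send + Sync {
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

pub struct MockFileSystem {
    files: HashMap<PathBuf, String>,
}

impl MockFileSystem {
    pub fn new() -> Self {
        Self { files: HashMap::new() }
    }

    // Hypothetical helper for seeding test fixtures.
    pub fn with_file(mut self, path: &str, content: &str) -> Self {
        self.files.insert(PathBuf::from(path), content.to_string());
        self
    }
}

impl FileSystem for MockFileSystem {
    fn read_file(&self, path: &Path) -> io::Result<String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.display().to_string()))
    }
}

// Code under test takes `&dyn FileSystem`, so tests never touch the disk.
pub fn count_lines(fs: &dyn FileSystem, path: &Path) -> io::Result<usize> {
    Ok(fs.read_file(path)?.lines().count())
}
```

In production the same `count_lines` call would receive a `RealFileSystem`; in tests it receives a `MockFileSystem` seeded with the exact contents the case needs.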
## Data Flow
### Search Flow
```
User Query
    │
    ▼
CLI (main.rs)
    │
    ▼
Commands (commands/search.rs)
    │
    ▼
SearchEngine (DefaultSearchEngine)
    │
    ├─────────────────┐
    │                 │
    ▼                 ▼
Cache (LruCache)   FileSystem (RealFileSystem)
    │                 │
    │                 ▼
    │      Parallel Processing (rayon)
    │                 │
    │                 ▼
    │          Search in Files
    │                 │
    │                 ▼
    │         Relevance Scoring
    │                 │
    └─────────────────┤
                      ▼
               Results to User
```
### Analysis Flow
```
Command (analyze/deadcode/complexity)
│
▼
Analyzer implementation
│
▼
FileSystem → File List
│
▼
Parallel Processing (rayon)
│
▼
Analysis per file
│
▼
Aggregate Results
│
▼
Format & Export
```
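The per-file and aggregate stages of this flow can be sketched as a parallel map followed by a fold. To keep the block dependency-free it uses `std::thread::scope` in place of `rayon`, which is a deliberate substitution; `Summary`, `analyze_file`, and the blank-line metric are hypothetical.

```rust
use std::thread;

#[derive(Debug, Default, PartialEq)]
pub struct Summary {
    pub files: usize,
    pub lines: usize,
    pub blank_lines: usize,
}

// Per-file step: a toy analysis that counts total and blank lines.
fn analyze_file(content: &str) -> Summary {
    Summary {
        files: 1,
        lines: content.lines().count(),
        blank_lines: content.lines().filter(|l| l.trim().is_empty()).count(),
    }
}

// Aggregate step: combine two partial summaries.
fn merge(mut acc: Summary, part: Summary) -> Summary {
    acc.files += part.files;
    acc.lines += part.lines;
    acc.blank_lines += part.blank_lines;
    acc
}

// Splits the inputs into one chunk per worker, analyzes chunks on scoped
// threads, then merges the partial results on the calling thread.
pub fn analyze_all(contents: &[String], workers: usize) -> Summary {
    let chunk_size = contents.len().div_ceil(workers.max(1)).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = contents
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|c| analyze_file(c))
                        .fold(Summary::default(), merge)
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(Summary::default(), merge)
    })
}
```

With `rayon`, the chunking and joining would typically collapse into a parallel iterator with a reduce step, but the shape (independent per-file work, then an associative merge) stays the same.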
## Core Components
### 1. CLI Layer (`main.rs`, `cli.rs`, `commands/`)
**Responsibilities:**
- Command-line argument parsing using `clap`
- Command routing to appropriate handlers
- Output formatting (text, JSON, CSV, Markdown)
- User interaction (interactive mode)
**Key Files:**
- `main.rs`: Entry point, reduced to ~400 LOC
- `cli.rs`: CLI definitions, extracted from main.rs
- `commands/mod.rs`: Command handler organization
- `commands/search.rs`: Search command handler
- `commands/analysis.rs`: Analysis command handler
- `commands/graph.rs`: Graph command handler
- `commands/util.rs`: Utility command handlers
### 2. Search Engine (`search/`)
**Responsibilities:**
- Pattern matching (regex, fuzzy, exact)
- Parallel file processing with `rayon`
- Relevance scoring and ranking
- Query enhancement and caching
**Key Modules:**
- `search/core.rs`: Core search logic
- `search/fuzzy.rs`: Fuzzy matching algorithms
- `search/semantic.rs`: Semantic search enhancement
- `search/engine.rs`: DefaultSearchEngine implementation
- `search/pure.rs`: Pure functions for testing
### 3. Analysis Engines
**Dead Code Detection (`deadcode/`):**
- Modular structure with 4 sub-modules
- Detects 6+ types of dead code
- Multi-language support
**Complexity Analysis (`complexity.rs`, `codemetrics/`):**
- Cyclomatic and cognitive complexity
- Comprehensive code metrics (Halstead, maintainability, etc.)
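
For orientation, cyclomatic complexity is conventionally one plus the number of decision points in a function. The function below is a rough token-based approximation of that idea, not the algorithm used in `complexity.rs`; the set of keywords and operators it counts is an assumption.

```rust
// Approximates cyclomatic complexity as 1 + branch keywords + short-circuit
// operators. It ignores comments, string literals, and `match` arms, so it
// is only a teaching sketch.
pub fn approx_cyclomatic(source: &str) -> usize {
    let keywords = source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| matches!(*t, "if" | "while" | "for"))
        .count();
    let operators = source.matches("&&").count() + source.matches("||").count();
    1 + keywords + operators
}
```

Straight-line code scores 1, and each additional branch or short-circuit condition adds one independent path.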
**Duplication Detection (`duplicates/`):**
- Similarity-based detection
- Configurable thresholds
**Circular Dependencies (`circular/`):**
- Detects circular function calls
- Analyzes module dependencies
### 4. Graph Analysis (`graphs/`, `ast.rs`, `cfg.rs`, etc.)
**Supported Graph Types:**
- AST (Abstract Syntax Tree)
- CFG (Control Flow Graph)
- DFG (Data Flow Graph)
- PDG (Program Dependency Graph)
- Call Graph
- Dependency Graph
- Unified Graph (AST + CFG + DFG)
### 5. MCP Integration (`mcp/`)
**Exposed Tools (9 tools):**
1. `search` - Pattern search
2. `list` - File enumeration
3. `analyze` - Codebase analysis
4. `complexity` - Complexity metrics
5. `duplicates` - Duplication detection
6. `deadcode` - Dead code detection
7. `circular` - Circular dependencies
8. `find_symbol` - Symbol finding (definition, references, callers)
9. `get_health` - Health scoring
## Error Handling
### Custom Error Types (`errors.rs`)
```rust
use std::path::PathBuf;
use thiserror::Error;

// The `#[error(...)]` attributes require the `thiserror::Error` derive.
#[derive(Debug, Error)]
pub enum SearchError {
    #[error("File not found: {0}")]
    FileNotFound(PathBuf),
    #[error("Invalid regex pattern: {0}")]
    InvalidPattern(String),
    #[error("Pattern compilation failed: {0}")]
    PatternCompilationError(String),
}

#[derive(Debug, Error)]
pub enum AnalysisError {
    #[error("No files found for analysis")]
    NoFilesFound,
    #[error("Analysis failed for {0}: {1}")]
    AnalysisFailed(PathBuf, String),
}

#[derive(Debug, Error)]
pub enum GraphError {
    #[error("Source parsing failed: {0}")]
    ParseError(String),
    #[error("Graph construction failed: {0}")]
    ConstructionError(String),
}
```
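The sketch below shows how such a typed error can cross a `Box<dyn std::error::Error>` boundary (the return type used by the core traits) and still be recovered by the caller. To stay dependency-free it writes the `Display` and `Error` impls by hand, which is what the `thiserror` derive generates; `validate_pattern` and `run` are hypothetical functions.

```rust
use std::error::Error;
use std::fmt;
use std::path::PathBuf;

#[derive(Debug)]
pub enum SearchError {
    FileNotFound(PathBuf),
    InvalidPattern(String),
}

// Hand-written equivalent of the `#[error("...")]` attributes.
impl fmt::Display for SearchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SearchError::FileNotFound(p) => write!(f, "File not found: {}", p.display()),
            SearchError::InvalidPattern(msg) => write!(f, "Invalid regex pattern: {msg}"),
        }
    }
}

impl Error for SearchError {}

// Hypothetical validation step that fails with a typed error.
pub fn validate_pattern(pattern: &str) -> Result<(), SearchError> {
    if pattern.is_empty() {
        return Err(SearchError::InvalidPattern("empty pattern".to_string()));
    }
    Ok(())
}

// `?` converts SearchError into Box<dyn Error> automatically.
pub fn run(pattern: &str) -> Result<usize, Box<dyn Error>> {
    validate_pattern(pattern)?;
    Ok(pattern.len())
}
```

A caller that needs the specific variant can use `downcast_ref::<SearchError>()` on the boxed error; callers that only report failures can print it via `Display`.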
## Performance Optimizations
### 1. Parallel Processing
- Uses `rayon` for parallel file processing
- Automatically scales to available CPU cores
- Thread-safe operations throughout
### 2. Caching
- LRU cache wrapper with automatic eviction
- Query-based caching with file modification tracking
- Thread-safe with DashMap
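
A single-threaded sketch of the two ideas above, LRU eviction and invalidation by modification time, might look like this. It uses only `std` collections rather than the `lru` and `dashmap` crates, and `QueryCache`, its key format, and its value type are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::SystemTime;

pub struct QueryCache {
    capacity: usize,
    // key -> (file modification time when cached, cached result lines)
    entries: HashMap<String, (SystemTime, Vec<String>)>,
    // Recency order: front is least recently used.
    order: VecDeque<String>,
}

impl QueryCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity: capacity.max(1), entries: HashMap::new(), order: VecDeque::new() }
    }

    // Marks `key` as the most recently used entry.
    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_string());
    }

    // Returns cached results only if the file is unchanged since caching;
    // a stale entry is dropped and reported as a miss.
    pub fn get(&mut self, key: &str, modified: SystemTime) -> Option<Vec<String>> {
        let is_fresh = self.entries.get(key).map(|(stamp, _)| *stamp == modified)?;
        if is_fresh {
            self.touch(key);
            self.entries.get(key).map(|(_, results)| results.clone())
        } else {
            self.entries.remove(key);
            self.order.retain(|k| k != key);
            None
        }
    }

    // Inserts or refreshes an entry, evicting the least recently used one
    // when a new key would exceed the capacity.
    pub fn put(&mut self, key: String, modified: SystemTime, results: Vec<String>) {
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.touch(&key);
        self.entries.insert(key, (modified, results));
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The `VecDeque` scan makes `touch` linear in the number of entries, which is fine for a sketch; a dedicated LRU structure does the same bookkeeping in constant time.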
### 3. Memory Management
- Streaming file reading (doesn't load entire files)
- Efficient data structures (DashMap, ahash)
- Lazy evaluation where possible
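
As an illustration of streaming reads, the following sketch scans any `BufRead` source line by line while reusing a single `String` buffer, so memory use is bounded by the longest line rather than the file size. The function name `matching_lines` and the plain substring match are assumptions.

```rust
use std::io::{self, BufRead};

// Returns the 1-based numbers of lines that contain `needle`.
pub fn matching_lines<R: BufRead>(mut reader: R, needle: &str) -> io::Result<Vec<usize>> {
    let mut buf = String::new(); // allocated once, reused for every line
    let mut hits = Vec::new();
    let mut line_no = 0;

    loop {
        buf.clear(); // keeps the capacity, drops the previous line
        if reader.read_line(&mut buf)? == 0 {
            break; // end of input
        }
        line_no += 1;
        if buf.contains(needle) {
            hits.push(line_no);
        }
    }
    Ok(hits)
}
```

For real files the reader would be a `std::io::BufReader` wrapped around a `std::fs::File`; a byte slice works equally well in tests because `&[u8]` implements `BufRead`.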
### 4. Hot Path Optimizations
- Regex compilation outside loops
- String interning for repeated strings
- Buffer reuse in hot loops
## Testing Architecture
### Test Coverage (173 unit + 36 integration + 23 MCP tests)
**Unit Tests:**
- Co-located with implementation
- Test individual functions in isolation
- Use temporary directories for file operations
- Pure function testing
**Integration Tests:**
- End-to-end CLI command testing
- Output format validation
- Error handling verification
**Property-Based Tests:**
- `proptest` for fuzzing
- Test invariants
- Generate random inputs
**MCP Tests:**
- Tool invocation testing
- Parameter validation
- Response format verification
## Quality Standards
### Code Quality
- ✅ **100% test pass rate** (173 unit + 36 integration + 23 MCP tests)
- ✅ **Zero clippy warnings** (clean code)
- ✅ **Modular architecture** (40+ focused modules)
- ✅ **Thread-safe** parallel processing with rayon
- ✅ **Comprehensive error handling**
### Maintainability
- Trait abstractions for extensibility
- Parameter object pattern to reduce parameter counts
- Dependency injection for testability
- Clear separation of concerns
### Performance
- **Fast**: 3-50ms for typical searches
- **Parallel**: Auto-scales to available CPU cores
- **Smart caching**: LRU cache with automatic eviction
- **Memory efficient**: Streaming file reading
## Dependencies
### Core Dependencies
- `clap` 4.4 - CLI parsing
- `regex` 1.10 - Pattern matching
- `rayon` 1.8 - Parallel processing
- `serde`/`serde_json` 1.0 - Serialization
- `thiserror` 1.0 - Custom error types
- `anyhow` 1.0 - Error propagation
### Performance Dependencies
- `dashmap` 5.5 - Thread-safe maps
- `ahash` 0.8 - Fast hashing
- `fuzzy-matcher` 0.3 - Fuzzy search
- `lru` 0.12 - LRU cache
### Optional Dependencies (MCP)
- `rmcp` 0.12 - MCP server
- `tokio` 1.0 - Async runtime
- `schemars` 1.2 - JSON schemas
## Build Configuration
- **Rust Edition**: 2024
- **Version**: 0.1.8
- **Default Features**: None (minimal dependencies)
- **Optional Features**: `mcp` (MCP server support)
- **Target**: Native binary (CLI-only)
## Extension Points
### Adding New Commands
1. Add variant to CLI enum
2. Implement handler function in `commands/`
3. Add CLI argument parsing
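
The three steps can be sketched without `clap` as follows. In the real CLI the enum would carry `clap` derive attributes and parsing would be generated; here `parse` is a hand-rolled stand-in, and the `Stats` command and both handlers are hypothetical.

```rust
// Step 1: add a variant to the command enum (hypothetical `Stats`).
pub enum Command {
    Search { query: String },
    Stats { path: String },
}

// Step 2: implement the handler, as it might appear under commands/.
fn handle_stats(path: &str) -> String {
    format!("stats for {path}")
}

fn handle_search(query: &str) -> String {
    format!("searching for {query}")
}

// Step 3: map raw arguments onto the enum (clap does this in the real CLI).
pub fn parse(args: &[&str]) -> Option<Command> {
    match args {
        ["search", query] => Some(Command::Search { query: query.to_string() }),
        ["stats", path] => Some(Command::Stats { path: path.to_string() }),
        _ => None,
    }
}

// Routing: the exhaustive `match` makes the compiler flag any variant
// that was added without a handler.
pub fn dispatch(cmd: Command) -> String {
    match cmd {
        Command::Search { query } => handle_search(&query),
        Command::Stats { path } => handle_stats(&path),
    }
}
```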
### Adding New Search Features
1. Extend `SearchEngine` trait
2. Implement new search strategy
3. Register with command handler
### Adding New Analyzers
1. Extend `Analyzer` trait
2. Implement analysis logic
3. Add command handler
### Adding New Graph Types
1. Extend `GraphBuilder` trait
2. Implement graph construction
3. Add to the `graphs/` unified interface
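
One way to read these steps is sketched below: a new graph type is introduced by implementing `GraphBuilder` with its own associated `Graph` type. `LineGraph` and `LineGraphBuilder` are hypothetical and deliberately trivial (one node per non-blank line, edges in source order); the trait is repeated so the block stands alone.

```rust
use std::error::Error;

pub trait GraphBuilder: Send + Sync {
    type Graph;

    fn build(&self, source: &str, name: Option<&str>) -> Result<Self::Graph, Box<dyn Error>>;
}

// Hypothetical graph: one node per non-blank line, edges in source order.
#[derive(Debug, PartialEq)]
pub struct LineGraph {
    pub name: String,
    pub nodes: Vec<String>,
    pub edges: Vec<(usize, usize)>,
}

pub struct LineGraphBuilder;

impl GraphBuilder for LineGraphBuilder {
    type Graph = LineGraph;

    fn build(&self, source: &str, name: Option<&str>) -> Result<Self::Graph, Box<dyn Error>> {
        let nodes: Vec<String> = source
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect();

        if nodes.is_empty() {
            return Err("source contains no statements".into());
        }

        // Connect each node to its successor: (0,1), (1,2), ...
        let edges = (1..nodes.len()).map(|i| (i - 1, i)).collect();

        Ok(LineGraph {
            name: name.unwrap_or("anonymous").to_string(),
            nodes,
            edges,
        })
    }
}
```

The associated type lets each builder return its own graph structure while callers stay generic over `GraphBuilder`.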
---
**Built with Rust** • Fast • Precise • 48+ Languages