# Code Search - Technical Specification
## Overview
Code Search is a fast, intelligent CLI tool for searching and analyzing codebases, built in Rust. It provides precise structural understanding that complements semantic search and RAG systems for AI agents.
**Version**: 0.1.4
**Language**: Rust (Edition 2024)
**License**: Apache-2.0
## Core Capabilities
### 1. Pattern Search Engine
**Supported Search Modes:**
- **Exact Match**: Direct string matching with line-level precision
- **Regex**: Full regex pattern support with compiled pattern caching
- **Fuzzy**: Typo-tolerant search using Levenshtein distance
**Features:**
- Parallel file processing with rayon
- Thread-safe result caching with DashMap
- Relevance scoring and ranking
- Context extraction (surrounding lines)
- Multi-extension filtering
### 2. Language Support
**48 Languages Supported:**
- Systems: Rust, C, C++, Go, Zig, V, Nim
- Web: JavaScript, TypeScript, HTML, CSS, SCSS
- Backend: Python, Java, Kotlin, C#, PHP, Ruby, Scala, Perl
- Functional: Haskell, Elixir, Erlang, Clojure, OCaml, F#
- Mobile: Swift, Dart, Objective-C
- Scripting: Shell, PowerShell, Lua, R, Julia
- Data: SQL, YAML, TOML, JSON, XML
- Infrastructure: Dockerfile, Terraform, Makefile
- Others: GraphQL, Protobuf, Solidity, WebAssembly, Assembly
**Language-Specific Patterns:**
- Function definitions (e.g., `fn`, `def`, `function`, `func`, `proc`)
- Class/struct definitions
- Import/use statements
- Comment patterns (single-line, multi-line, doc comments)
### 3. Enhanced Dead Code Detection
**Detection Types (6+ categories):**
#### 3.1 Unused Variables
- Detects variables declared but never referenced
- Patterns: `let`, `const`, `var`, `:=`, `<-`
- Excludes: Variables starting with `_`, single-letter vars, `err`
- **Output**: `[var]` marker with line number and reason
#### 3.2 Unreachable Code
- Identifies code after return statements
- Tracks brace depth and control flow
- Detects statements that will never execute
- **Output**: `[!]` marker with truncated code preview
#### 3.3 Empty Functions
- Finds functions with no implementation
- **Brace-based languages**: Detects `{}`
- **Indentation-based languages**: Detects Python `:` with only `pass`
- Excludes special functions (main, test_, constructors, trait implementations)
- **Output**: `[∅]` marker with function name
#### 3.4 TODO/FIXME Markers
- Flags incomplete or problematic code markers
- Markers: TODO, FIXME, HACK, XXX, BUG
- Only detects in comments (not in strings)
- **Output**: `[?]` marker with truncated comment
#### 3.5 Commented-Out Code
- Detects code that has been commented out
- Identifies function/variable declarations in comments
- Excludes documentation comments and standard notes
- **Output**: `[commented code]` with truncated line
#### 3.6 Unused Imports
- Tracks import/use statements
- Counts references across entire file
- Reports imports with ≤1 occurrence
- **Output**: `[imp]` marker with import name
**Special Function Exclusions:**
- Entry points: `main`, `init`, `__init__`
- Test functions: `test_*`, `Test*`
- Lifecycle: `setup`, `teardown`, `drop`, `finalize`
- Trait implementations: `clone`, `fmt`, `eq`, `hash`, `serialize`
- Event handlers: `on*`, `handle*`
- Private functions: `_*`
### 4. Code Complexity Analysis
**Metrics Calculated:**
- **Cyclomatic Complexity**: Number of linearly independent paths
- **Cognitive Complexity**: Measure of code understandability
- **Nesting Depth**: Maximum nesting level
- **Function Count**: Total functions per file
- **Line Count**: Total lines of code
**Thresholds:**
- Low: < 10
- Medium: 10-20
- High: > 20
### 5. Code Duplication Detection
**Algorithm:**
- Extracts code blocks (minimum configurable lines)
- Calculates string similarity using normalized edit distance
- Configurable similarity threshold (default: 0.9)
- Reports file pairs with similar blocks
### 6. MCP Server Integration
**Protocol**: Model Context Protocol (MCP)
**Transport**: stdio
**Exposed Tools (7 total):**
1. `search_code`: Pattern search with filters
2. `list_files`: Directory enumeration
3. `analyze_codebase`: Metrics and statistics
4. `detect_complexity`: Complexity analysis
5. `detect_duplicates`: Duplication detection
6. `detect_deadcode`: Dead code analysis
7. `detect_circular`: Circular dependency detection
## Architecture
### Module Organization
```
codesearch/
├── src/
│ ├── main.rs (699 LOC) # CLI entry point
│ ├── lib.rs # Library exports
│ ├── deadcode.rs (685 LOC) # Dead code detection ⭐
│ ├── search.rs (645 LOC) # Search engine
│ ├── language.rs (510 LOC) # Language definitions
│ ├── analysis.rs (418 LOC) # Codebase analysis
│ ├── mcp_server.rs (375 LOC) # MCP server
│ ├── complexity.rs (308 LOC) # Complexity metrics
│ ├── duplicates.rs (196 LOC) # Duplication detection
│ ├── circular.rs (196 LOC) # Circular dependencies
│ ├── export.rs (185 LOC) # Export functionality
│ ├── parser.rs (176 LOC) # Code parsing utilities
│ ├── cache.rs (125 LOC) # Result caching
│ ├── types.rs (112 LOC) # Data structures
│ └── interactive.rs (350 LOC) # REPL interface
└── tests/
└── integration_tests.rs (26 tests)
```
**Total**: ~4,500 lines of Rust code across 14 modules
### Data Structures
```rust
// Dead code detection result
pub struct DeadCodeItem {
pub file: String,
pub line_number: usize,
pub item_type: String, // "variable", "unreachable", "empty", etc.
pub name: String,
pub reason: String,
}
// Search result
pub struct SearchResult {
pub file: String,
pub matches: Vec<Match>,
pub line_count: usize,
pub relevance_score: f64,
}
// Complexity metrics
pub struct ComplexityMetrics {
pub cyclomatic: usize,
pub cognitive: usize,
pub nesting_depth: usize,
}
```
## Performance Characteristics
### Optimization Strategies
1. **Parallel Processing**
- Uses rayon for multi-threaded file processing
- Scales to available CPU cores
- Thread-safe operations throughout
2. **Caching**
- Query-based caching with file modification tracking
- Automatic cache invalidation on file changes
- Thread-safe DashMap implementation
3. **Memory Efficiency**
- Streaming file reading (no full file loads)
- Efficient data structures (DashMap, ahash)
- Lazy evaluation where possible
4. **Regex Optimization**
- Compiled patterns cached
- Reused across file processing
- Minimal regex compilation overhead
### Performance Characteristics
- Parallel processing scales with available CPU cores
- Thread-safe caching reduces redundant work
- Streaming file reading minimizes memory usage
*Note: Actual performance depends on codebase size, hardware, and query complexity.*
## Testing Strategy
### Test Coverage (95 total tests)
1. **Unit Tests (49 tests)**
- search.rs: 7 tests
- deadcode.rs: 11 tests ⭐
- complexity.rs: 6 tests
- analysis.rs: 4 tests
- duplicates.rs: 4 tests
- parser.rs: 5 tests
- language.rs: 3 tests
- cache.rs: 3 tests
- export.rs: 3 tests
- types.rs: 2 tests
- circular.rs: 2 tests
2. **Integration Tests (26 tests)**
- CLI command execution
- Output format validation
- Error handling
3. **MCP Tests (23 tests)**
- Tool invocation
- Parameter validation
- Response format
### Test Execution
```bash
# All tests
cargo test --features mcp
# Specific module
cargo test deadcode --lib
# With output
cargo test -- --nocapture
```
## CLI Interface
### Commands
```bash
# Search
codesearch <query> [path] [options]
codesearch interactive
# Analysis
codesearch analyze [path]
codesearch complexity [path] [--threshold N] [--sort]
codesearch deadcode [path] [-e extensions] [--exclude dirs]
codesearch duplicates [path] [--min-lines N] [--similarity N]
codesearch circular [path] [-e extensions]
# MCP Server
codesearch mcp-server
```
### Options
- `-e, --extensions`: Filter by file extensions (comma-separated)
- `-x, --exclude`: Exclude directories (comma-separated)
- `-f, --fuzzy`: Enable fuzzy matching
- `-r, --regex`: Enable regex mode
- `-i, --ignore-case`: Case-insensitive search
- `--export`: Export format (csv, markdown)
- `--threshold`: Complexity threshold
- `--sort`: Sort results
## Output Formats
### Dead Code Detection Output
```
🔍 Dead Code Detection
──────────────────────────────
Found 12 potential dead code items:
[src/example.rs]
[var] L 10: variable 'unused_var' - Variable declared but never used
[!] L 25: unreachable - Code after return statement is unreachable
[∅] L 42: empty_helper - Empty function with no implementation
[?] L 58: // TODO: implement this - TODO marker
[imp] L 72: import 'HashMap' - Imported but never used
📊 Summary:
• variable: 3
• unreachable: 2
• empty: 2
• todo: 3
• import: 2
```
## Dependencies
### Production Dependencies
- clap 4.4 - CLI parsing
- regex 1.10 - Pattern matching
- walkdir 2.4 - Directory traversal
- serde 1.0 - Serialization
- colored 2.1 - Terminal colors
- rayon 1.8 - Parallel processing
- dashmap 5.5 - Thread-safe maps
- fuzzy-matcher 0.3 - Fuzzy search
### Optional Dependencies (MCP)
- rmcp 0.10 - MCP protocol
- tokio 1.0 - Async runtime
- schemars 1.2 - JSON schema
### Development Dependencies
- tempfile 3.8 - Temporary files for tests
## Future Enhancements
### Planned Features
- AST-based code analysis (beyond regex)
- Incremental indexing for large codebases
- Git history search integration
- Remote repository support
- Plugin system for custom analyzers
- Web UI for visualization
### Performance Improvements
- File watching for real-time updates
- Optimized memory usage for very large files
- Incremental cache updates
## Version History
### 0.1.4 (Current)
- Enhanced dead code detection with 6+ detection types
- Unused variables detection
- Unreachable code detection
- Empty function detection (multi-language)
- TODO/FIXME marker detection
- 11 new unit tests for dead code detection
- Updated documentation
### 0.1.3
- Added MCP server support
- 48 language support
- Complexity metrics
- Code duplication detection
### 0.1.2
- Interactive mode
- Fuzzy search
- Export functionality
### 0.1.1
- Basic search functionality
- Regex support
- Multi-extension filtering
## License
Apache-2.0 License
---