codesearch 0.1.6

A fast, intelligent CLI tool with multiple search modes (regex, fuzzy, semantic), code analysis, and dead code detection for 48+ languages
# Code Search - Technical Specification

## Overview

Code Search is a fast, intelligent CLI tool for searching and analyzing codebases, built in Rust. It provides precise structural understanding that complements semantic search and RAG systems for AI agents.

**Version**: 0.1.4  
**Language**: Rust (Edition 2024)  
**License**: Apache-2.0

## Core Capabilities

### 1. Pattern Search Engine

**Supported Search Modes:**
- **Exact Match**: Direct string matching with line-level precision
- **Regex**: Full regex pattern support with compiled pattern caching
- **Fuzzy**: Typo-tolerant search using Levenshtein distance
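
To illustrate the fuzzy mode, here is a minimal sketch of Levenshtein-based matching using only the standard library. The names `levenshtein` and `fuzzy_line_match` are hypothetical and are not taken from the codesearch source; the tool itself lists the `fuzzy-matcher` crate as a dependency, so its real scoring may differ.

```rust
/// Levenshtein distance between two strings, counted in Unicode scalar values.
/// Uses a single rolling row, so memory is proportional to the length of `b`.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // row[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut row: Vec<usize> = (0..=b_chars.len()).collect();

    for (i, ca) in a.chars().enumerate() {
        let mut prev_diag = row[0]; // value from the previous row, one column left
        row[0] = i + 1;
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev_diag + usize::from(ca != *cb);
            let insertion = row[j] + 1;
            let deletion = row[j + 1] + 1;
            prev_diag = row[j + 1];
            row[j + 1] = substitution.min(insertion).min(deletion);
        }
    }
    row[b_chars.len()]
}

/// Hypothetical helper: true when any whitespace-separated word in `line`
/// is within `max_distance` edits of `query` (case-insensitive).
pub fn fuzzy_line_match(line: &str, query: &str, max_distance: usize) -> bool {
    let query = query.to_lowercase();
    line.split_whitespace()
        .any(|word| levenshtein(&word.to_lowercase(), &query) <= max_distance)
}
```

With a maximum distance of 1, the query `result` would match a line containing the typo `reslt`.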

**Features:**
- Parallel file processing with rayon
- Thread-safe result caching with DashMap
- Relevance scoring and ranking
- Context extraction (surrounding lines)
- Multi-extension filtering

### 2. Language Support

**48 Languages Supported:**
- Systems: Rust, C, C++, Go, Zig, V, Nim
- Web: JavaScript, TypeScript, HTML, CSS, SCSS
- Backend: Python, Java, Kotlin, C#, PHP, Ruby, Scala, Perl
- Functional: Haskell, Elixir, Erlang, Clojure, OCaml, F#
- Mobile: Swift, Dart, Objective-C
- Scripting: Shell, PowerShell, Lua, R, Julia
- Data: SQL, YAML, TOML, JSON, XML
- Infrastructure: Dockerfile, Terraform, Makefile
- Others: GraphQL, Protobuf, Solidity, WebAssembly, Assembly

**Language-Specific Patterns:**
- Function definitions (e.g., `fn`, `def`, `function`, `func`, `proc`)
- Class/struct definitions
- Import/use statements
- Comment patterns (single-line, multi-line, doc comments)
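
As a rough sketch of how keyword-based function detection could work, the snippet below maps a few file extensions to their definition keyword and extracts the name that follows it. Both functions and the extension table are assumptions made for illustration; the actual patterns in `language.rs` are not shown in this document. Being purely textual, it would also match a definition that appears inside a comment.

```rust
/// Function-definition keyword for a file extension (illustrative subset only).
pub fn function_keyword(extension: &str) -> Option<&'static str> {
    match extension {
        "rs" => Some("fn"),
        "py" | "rb" => Some("def"),
        "js" | "ts" => Some("function"),
        "go" | "swift" => Some("func"),
        "nim" => Some("proc"),
        _ => None,
    }
}

/// Returns the function name if `line` contains `keyword` followed by an identifier.
pub fn function_name<'a>(line: &'a str, keyword: &str) -> Option<&'a str> {
    let mut tokens = line.split_whitespace();
    // Skip leading modifiers such as `pub` or `async` until the keyword appears.
    tokens.find(|token| *token == keyword)?;
    let rest = tokens.next()?;
    // The name ends at the first character that cannot be part of an identifier.
    let end = rest
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some(&rest[..end])
    }
}
```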

### 3. Enhanced Dead Code Detection

**Detection Types (6+ categories):**

#### 3.1 Unused Variables
- Detects variables declared but never referenced
- Patterns: `let`, `const`, `var`, `:=`, `<-`
- Excludes: Variables starting with `_`, single-letter vars, `err`
- **Output**: `[var]` marker with line number and reason
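
The rule above can be sketched for Rust-style `let` bindings as follows. This is an assumption-laden illustration, not the code in `deadcode.rs`: `unused_variables` and `count_identifier` are hypothetical names, only the `let` pattern is handled, and occurrences inside strings or comments still count as references.

```rust
/// Counts whole-identifier occurrences of `name` in `source`.
fn count_identifier(source: &str, name: &str) -> usize {
    source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|token| *token == name)
        .count()
}

/// Returns (line_number, name) for `let` bindings that are never referenced again.
pub fn unused_variables(source: &str) -> Vec<(usize, String)> {
    let mut unused = Vec::new();
    for (index, line) in source.lines().enumerate() {
        let Some(rest) = line.trim_start().strip_prefix("let ") else {
            continue;
        };
        let rest = rest.trim_start();
        let rest = rest.strip_prefix("mut ").unwrap_or(rest);
        let name: String = rest
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .collect();
        // Exclusions listed above: `_` prefix, single-letter names, and `err`.
        let excluded = name.is_empty()
            || name.starts_with('_')
            || name.chars().count() == 1
            || name == "err";
        // A count of 1 means the declaration is the only occurrence.
        if !excluded && count_identifier(source, &name) == 1 {
            unused.push((index + 1, name));
        }
    }
    unused
}
```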

#### 3.2 Unreachable Code
- Identifies code after return statements
- Tracks brace depth and control flow
- Detects statements that will never execute
- **Output**: `[!]` marker with truncated code preview
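
A minimal sketch of the brace-depth idea is shown below, assuming a brace-delimited language and single-line `return ...;` statements. The function name `unreachable_lines` is hypothetical, and braces inside string literals would confuse this simplified counter.

```rust
/// Returns 1-based line numbers of statements that follow a `return`
/// within the same block.
pub fn unreachable_lines(source: &str) -> Vec<usize> {
    let mut flagged = Vec::new();
    let mut depth: i32 = 0;
    // Depth at which a `return` was seen; later lines at that depth or deeper are dead.
    let mut dead_depth: Option<i32> = None;

    for (index, line) in source.lines().enumerate() {
        let trimmed = line.trim();
        let opens = trimmed.matches('{').count() as i32;
        let closes = trimmed.matches('}').count() as i32;
        // A leading `}` belongs to the enclosing block, one level up.
        let line_depth = if trimmed.starts_with('}') { depth - 1 } else { depth };

        if let Some(d) = dead_depth {
            if line_depth < d {
                // The block containing the `return` has been closed.
                dead_depth = None;
            } else if !trimmed.is_empty() && !trimmed.starts_with("//") {
                flagged.push(index + 1);
            }
        }

        let is_return = trimmed == "return;"
            || (trimmed.starts_with("return ") && trimmed.ends_with(';'));
        if dead_depth.is_none() && is_return {
            dead_depth = Some(line_depth);
        }
        depth += opens - closes;
    }
    flagged
}
```

An early `return` inside an `if` block does not mark the code after the closing brace as dead, because the dead region ends as soon as the depth drops below the depth of the `return`.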

#### 3.3 Empty Functions
- Finds functions with no implementation
- **Brace-based languages**: Detects `{}`
- **Indentation-based languages**: Detects a Python `def ...:` whose body contains only `pass`
- Excludes special functions (main, test_, constructors, trait implementations)
- **Output**: `[∅]` marker with function name
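
For the brace-based case, one possible approach is to look at the text between the first `{` and the first `}` after a function declaration. The sketch below assumes Rust-style `fn` declarations and uses the hypothetical name `empty_functions`; it does not cover the indentation-based case and does not apply the special-function exclusions.

```rust
/// Returns (line_number, name) for `fn` items whose body contains only whitespace.
pub fn empty_functions(source: &str) -> Vec<(usize, String)> {
    let lines: Vec<&str> = source.lines().collect();
    let mut found = Vec::new();
    for (index, line) in lines.iter().enumerate() {
        let decl = line.trim_start();
        let decl = decl.strip_prefix("pub ").unwrap_or(decl);
        let Some(after_kw) = decl.strip_prefix("fn ") else {
            continue;
        };
        let name: String = after_kw
            .chars()
            .take_while(|c| c.is_alphanumeric() || *c == '_')
            .collect();
        // Join this line with the rest so a body split across lines is still seen.
        let tail = lines[index..].join("\n");
        let (Some(open), Some(close)) = (tail.find('{'), tail.find('}')) else {
            continue;
        };
        // A `;` before the first `{` means a bodiless declaration (e.g. in a trait).
        let has_body = !tail[..open].contains(';');
        if has_body && open < close && tail[open + 1..close].trim().is_empty() {
            found.push((index + 1, name));
        }
    }
    found
}
```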

#### 3.4 TODO/FIXME Markers
- Flags incomplete or problematic code markers
- Markers: TODO, FIXME, HACK, XXX, BUG
- Only detects in comments (not in strings)
- **Output**: `[?]` marker with truncated comment
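
The "comments only, not strings" rule might be implemented roughly as below. This sketch assumes `//` line comments and double-quoted string literals; `comment_start` and `find_marker` are hypothetical names, and block comments and character literals are not handled.

```rust
const MARKERS: [&str; 5] = ["TODO", "FIXME", "HACK", "XXX", "BUG"];

/// Byte offset of a `//` comment start that is not inside a double-quoted string.
pub fn comment_start(line: &str) -> Option<usize> {
    let bytes = line.as_bytes();
    let mut in_string = false;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' if in_string => i += 1, // skip the escaped character
            b'"' => in_string = !in_string,
            b'/' if !in_string && bytes.get(i + 1) == Some(&b'/') => return Some(i),
            _ => {}
        }
        i += 1;
    }
    None
}

/// Returns the first marker that appears as a whole word in the line's comment.
pub fn find_marker(line: &str) -> Option<&'static str> {
    let comment = &line[comment_start(line)?..];
    MARKERS.iter().copied().find(|marker| {
        comment
            .split(|c: char| !c.is_alphanumeric())
            .any(|word| word == *marker)
    })
}
```

Matching whole words keeps a comment such as `// DEBUG output` from being reported as a `BUG` marker.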

#### 3.5 Commented-Out Code
- Detects code that has been commented out
- Identifies function/variable declarations in comments
- Excludes documentation comments and standard notes
- **Output**: `[commented code]` with truncated line

#### 3.6 Unused Imports
- Tracks import/use statements
- Counts references across entire file
- Reports imports whose name occurs at most once in the file (that is, only in the import statement itself)
- **Output**: `[imp]` marker with import name
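
A simplified version of this occurrence counting, assuming single-line Rust `use` statements, could look like the sketch below. `imported_names` and `unused_imports` are hypothetical names; nested brace groups and `as` renames are not handled.

```rust
/// Names brought into scope by a single-line `use` statement.
pub fn imported_names(line: &str) -> Vec<String> {
    let Some(path) = line
        .trim()
        .strip_prefix("use ")
        .and_then(|p| p.strip_suffix(';'))
    else {
        return Vec::new();
    };
    // Either a brace group such as `{A, B}` or the final path segment.
    let tail = path.rsplit("::").next().unwrap_or(path);
    tail.trim_matches(|c: char| c == '{' || c == '}')
        .split(',')
        .map(|name| name.trim().to_string())
        .filter(|name| !name.is_empty() && name.as_str() != "*" && name.as_str() != "self")
        .collect()
}

/// Returns (line_number, name) for imports whose name occurs at most once in the file.
pub fn unused_imports(source: &str) -> Vec<(usize, String)> {
    let mut unused = Vec::new();
    for (index, line) in source.lines().enumerate() {
        for name in imported_names(line) {
            let occurrences = source
                .split(|c: char| !(c.is_alphanumeric() || c == '_'))
                .filter(|token| *token == name)
                .count();
            if occurrences <= 1 {
                unused.push((index + 1, name));
            }
        }
    }
    unused
}
```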

**Special Function Exclusions:**
- Entry points: `main`, `init`, `__init__`
- Test functions: `test_*`, `Test*`
- Lifecycle: `setup`, `teardown`, `drop`, `finalize`
- Trait implementations: `clone`, `fmt`, `eq`, `hash`, `serialize`
- Event handlers: `on*`, `handle*`
- Private functions: `_*`
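
The exclusion list can be expressed as a single predicate. The sketch below is an assumption about how such a check might look (the name `is_special_function` is hypothetical). Note that plain prefix tests like `on*` also match unrelated names such as `once`.

```rust
/// True when `name` matches one of the special-function exclusions listed above.
pub fn is_special_function(name: &str) -> bool {
    // Entry points, lifecycle hooks, and common trait method names.
    const EXACT: [&str; 12] = [
        "main", "init", "__init__", "setup", "teardown", "drop", "finalize",
        "clone", "fmt", "eq", "hash", "serialize",
    ];
    EXACT.contains(&name)
        || name.starts_with("test_")
        || name.starts_with("Test")
        || name.starts_with("on")
        || name.starts_with("handle")
        || name.starts_with('_')
}
```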

### 4. Code Complexity Analysis

**Metrics Calculated:**
- **Cyclomatic Complexity**: Number of linearly independent paths
- **Cognitive Complexity**: Measure of code understandability
- **Nesting Depth**: Maximum nesting level
- **Function Count**: Total functions per file
- **Line Count**: Total lines of code

**Thresholds:**
- Low: < 10
- Medium: 10-20
- High: > 20
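
As a rough illustration, cyclomatic complexity can be approximated textually as one plus the number of decision points, and nesting depth as the maximum brace depth. The functions below are a sketch under that assumption, not the implementation in `complexity.rs`; the names are hypothetical. In Rust source, the `for` in `impl Trait for Type` and the `||` of an argument-less closure would be miscounted by this approach.

```rust
/// Approximates cyclomatic complexity as 1 + the number of decision points.
pub fn cyclomatic_complexity(source: &str) -> usize {
    let keywords = source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|token| matches!(*token, "if" | "while" | "for"))
        .count();
    let operators = source.matches("&&").count() + source.matches("||").count();
    1 + keywords + operators
}

/// Maximum brace nesting depth reached anywhere in `source`.
pub fn max_nesting_depth(source: &str) -> usize {
    let mut depth = 0usize;
    let mut max = 0usize;
    for c in source.chars() {
        match c {
            '{' => {
                depth += 1;
                max = max.max(depth);
            }
            '}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max
}

/// Maps a score onto the thresholds listed above.
pub fn rating(score: usize) -> &'static str {
    match score {
        0..=9 => "low",
        10..=20 => "medium",
        _ => "high",
    }
}
```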

### 5. Code Duplication Detection

**Algorithm:**
- Extracts code blocks (minimum configurable lines)
- Calculates string similarity using normalized edit distance
- Configurable similarity threshold (default: 0.9)
- Reports file pairs with similar blocks
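
The steps above can be sketched as follows, assuming similarity is defined as `1 - distance / max(len_a, len_b)` and blocks are overlapping windows of trimmed, non-blank lines. All names here are hypothetical, and the exact normalization used in `duplicates.rs` is not specified in this document.

```rust
/// Edit distance between two sequences, using a single rolling row.
fn edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        let mut diagonal = row[0];
        row[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let substitution = diagonal + usize::from(x != y);
            diagonal = row[j + 1];
            row[j + 1] = substitution.min(row[j] + 1).min(diagonal + 1);
        }
    }
    row[b.len()]
}

/// Similarity in [0.0, 1.0]: 1.0 means identical, 0.0 means nothing in common.
pub fn similarity(a: &str, b: &str) -> f64 {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let longest = a.len().max(b.len());
    if longest == 0 {
        return 1.0;
    }
    1.0 - edit_distance(&a, &b) as f64 / longest as f64
}

/// Overlapping windows of `min_lines` trimmed, non-blank lines.
pub fn blocks(source: &str, min_lines: usize) -> Vec<String> {
    let lines: Vec<&str> = source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect();
    lines
        .windows(min_lines.max(1))
        .map(|window| window.join("\n"))
        .collect()
}

/// True when any block of `a` is at least `threshold` similar to a block of `b`.
pub fn has_duplicate(a: &str, b: &str, min_lines: usize, threshold: f64) -> bool {
    let blocks_b = blocks(b, min_lines);
    blocks(a, min_lines)
        .iter()
        .any(|x| blocks_b.iter().any(|y| similarity(x, y) >= threshold))
}
```

Comparing every block pair is quadratic in the number of blocks, so a real implementation would likely need hashing or other pruning for large codebases.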

### 6. MCP Server Integration

**Protocol**: Model Context Protocol (MCP)  
**Transport**: stdio

**Exposed Tools (7 total):**
1. `search_code`: Pattern search with filters
2. `list_files`: Directory enumeration
3. `analyze_codebase`: Metrics and statistics
4. `detect_complexity`: Complexity analysis
5. `detect_duplicates`: Duplication detection
6. `detect_deadcode`: Dead code analysis
7. `detect_circular`: Circular dependency detection

## Architecture

### Module Organization

```
codesearch/
├── src/
│   ├── main.rs (699 LOC)           # CLI entry point
│   ├── lib.rs                       # Library exports
│   ├── deadcode.rs (685 LOC)       # Dead code detection ⭐
│   ├── search.rs (645 LOC)         # Search engine
│   ├── language.rs (510 LOC)       # Language definitions
│   ├── analysis.rs (418 LOC)       # Codebase analysis
│   ├── mcp_server.rs (375 LOC)     # MCP server
│   ├── complexity.rs (308 LOC)     # Complexity metrics
│   ├── duplicates.rs (196 LOC)     # Duplication detection
│   ├── circular.rs (196 LOC)       # Circular dependencies
│   ├── export.rs (185 LOC)         # Export functionality
│   ├── parser.rs (176 LOC)         # Code parsing utilities
│   ├── cache.rs (125 LOC)          # Result caching
│   ├── types.rs (112 LOC)          # Data structures
│   └── interactive.rs (350 LOC)    # REPL interface
└── tests/
    └── integration_tests.rs (26 tests)
```

**Total**: ~5,000 lines of Rust code across 15 source files (the 14 files with listed counts sum to 4,980 LOC; `lib.rs` is not counted)

### Data Structures

```rust
// Dead code detection result
pub struct DeadCodeItem {
    pub file: String,
    pub line_number: usize,
    pub item_type: String,      // "variable", "unreachable", "empty", etc.
    pub name: String,
    pub reason: String,
}

// Search result
pub struct SearchResult {
    pub file: String,
    pub matches: Vec<Match>,
    pub line_count: usize,
    pub relevance_score: f64,
}

// Complexity metrics
pub struct ComplexityMetrics {
    pub cyclomatic: usize,
    pub cognitive: usize,
    pub nesting_depth: usize,
}
```

## Performance Characteristics

### Optimization Strategies

1. **Parallel Processing**
   - Uses rayon for multi-threaded file processing
   - Scales to available CPU cores
   - Thread-safe operations throughout

2. **Caching**
   - Query-based caching with file modification tracking
   - Automatic cache invalidation on file changes
   - Thread-safe DashMap implementation

3. **Memory Efficiency**
   - Streaming file reading (no full file loads)
   - Efficient data structures (DashMap, ahash)
   - Lazy evaluation where possible

4. **Regex Optimization**
   - Compiled patterns cached
   - Reused across file processing
   - Minimal regex compilation overhead
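
To make the caching strategy concrete, here is a minimal sketch of a query cache that is invalidated by file modification time. It uses a plain `std::collections::HashMap` instead of DashMap to stay dependency-free, so it is not thread-safe as written. `ResultCache` and its methods are hypothetical names, and the cached value (a list of matching line numbers) is an assumption.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Cache keyed by (query, file path); each entry remembers the file's
/// modification time at the moment the result was stored.
#[derive(Default)]
pub struct ResultCache {
    entries: HashMap<(String, PathBuf), (SystemTime, Vec<usize>)>,
}

impl ResultCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores the matching line numbers for `query` in `path`.
    pub fn insert(&mut self, query: &str, path: &Path, modified: SystemTime, hits: Vec<usize>) {
        self.entries
            .insert((query.to_string(), path.to_path_buf()), (modified, hits));
    }

    /// Returns the cached hits only if the file has not changed since they were stored.
    pub fn get(&self, query: &str, path: &Path, modified: SystemTime) -> Option<&Vec<usize>> {
        let (cached_at, hits) = self
            .entries
            .get(&(query.to_string(), path.to_path_buf()))?;
        (*cached_at == modified).then_some(hits)
    }
}
```

In practice the `modified` argument would come from `std::fs::metadata(path)?.modified()?`; passing it in explicitly keeps the cache logic independent of the file system.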

### Summary

- Parallel processing scales with available CPU cores
- Thread-safe caching reduces redundant work
- Streaming file reading minimizes memory usage

*Note: Actual performance depends on codebase size, hardware, and query complexity.*

## Testing Strategy

### Test Coverage (99 total tests)

1. **Unit Tests (50 tests)**
   - search.rs: 7 tests
   - deadcode.rs: 11 tests ⭐
   - complexity.rs: 6 tests
   - analysis.rs: 4 tests
   - duplicates.rs: 4 tests
   - parser.rs: 5 tests
   - language.rs: 3 tests
   - cache.rs: 3 tests
   - export.rs: 3 tests
   - types.rs: 2 tests
   - circular.rs: 2 tests

2. **Integration Tests (26 tests)**
   - CLI command execution
   - Output format validation
   - Error handling

3. **MCP Tests (23 tests)**
   - Tool invocation
   - Parameter validation
   - Response format

### Test Execution

```bash
# All tests
cargo test --features mcp

# Specific module
cargo test deadcode --lib

# With output
cargo test -- --nocapture
```

## CLI Interface

### Commands

```bash
# Search
codesearch <query> [path] [options]
codesearch interactive

# Analysis
codesearch analyze [path]
codesearch complexity [path] [--threshold N] [--sort]
codesearch deadcode [path] [-e extensions] [--exclude dirs]
codesearch duplicates [path] [--min-lines N] [--similarity N]
codesearch circular [path] [-e extensions]

# MCP Server
codesearch mcp-server
```

### Options

- `-e, --extensions`: Filter by file extensions (comma-separated)
- `-x, --exclude`: Exclude directories (comma-separated)
- `-f, --fuzzy`: Enable fuzzy matching
- `-r, --regex`: Enable regex mode
- `-i, --ignore-case`: Case-insensitive search
- `--export`: Export format (csv, markdown)
- `--threshold`: Complexity threshold
- `--sort`: Sort results

## Output Formats

### Dead Code Detection Output

```
🔍 Dead Code Detection
──────────────────────────────

Found 12 potential dead code items:

[src/example.rs]
   [var] L  10: variable 'unused_var' - Variable declared but never used
   [!]   L  25: unreachable - Code after return statement is unreachable
   [∅]   L  42: empty_helper - Empty function with no implementation
   [?]   L  58: // TODO: implement this - TODO marker
   [imp] L  72: import 'HashMap' - Imported but never used

📊 Summary:
   • variable: 3
   • unreachable: 2
   • empty: 2
   • todo: 3
   • import: 2
```
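
The report layout above could be produced from `DeadCodeItem` values along these lines. This is a sketch only: `marker`, `format_item`, and `summarize` are hypothetical names, the struct is repeated so the example is self-contained, and the `[ ]` fallback marker is a placeholder invented for unknown types.

```rust
use std::collections::BTreeMap;

pub struct DeadCodeItem {
    pub file: String,
    pub line_number: usize,
    pub item_type: String,
    pub name: String,
    pub reason: String,
}

/// Marker shown in the report for each item type.
fn marker(item_type: &str) -> &'static str {
    match item_type {
        "variable" => "[var]",
        "unreachable" => "[!]",
        "empty" => "[∅]",
        "todo" => "[?]",
        "import" => "[imp]",
        _ => "[ ]", // placeholder for types not shown in the sample output
    }
}

/// Formats one report line: marker padded to 5 columns, line number to 4.
pub fn format_item(item: &DeadCodeItem) -> String {
    format!(
        "   {:<5} L{:>4}: {} - {}",
        marker(&item.item_type),
        item.line_number,
        item.name,
        item.reason
    )
}

/// Counts items per type, sorted by type name, for the summary section.
pub fn summarize(items: &[DeadCodeItem]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for item in items {
        *counts.entry(item.item_type.as_str()).or_insert(0) += 1;
    }
    counts
}
```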

## Dependencies

### Production Dependencies
- clap 4.4 - CLI parsing
- regex 1.10 - Pattern matching
- walkdir 2.4 - Directory traversal
- serde 1.0 - Serialization
- colored 2.1 - Terminal colors
- rayon 1.8 - Parallel processing
- dashmap 5.5 - Thread-safe maps
- fuzzy-matcher 0.3 - Fuzzy search

### Optional Dependencies (MCP)
- rmcp 0.10 - MCP protocol
- tokio 1.0 - Async runtime
- schemars 1.2 - JSON schema

### Development Dependencies
- tempfile 3.8 - Temporary files for tests

## Future Enhancements

### Planned Features
- AST-based code analysis (beyond regex)
- Incremental indexing for large codebases
- Git history search integration
- Remote repository support
- Plugin system for custom analyzers
- Web UI for visualization

### Performance Improvements
- File watching for real-time updates
- Optimized memory usage for very large files
- Incremental cache updates

## Version History

### 0.1.4 (Current)
- Enhanced dead code detection with 6+ detection types
- Unused variables detection
- Unreachable code detection
- Empty function detection (multi-language)
- TODO/FIXME marker detection
- 11 new unit tests for dead code detection
- Updated documentation

### 0.1.3
- Added MCP server support
- 48 language support
- Complexity metrics
- Code duplication detection

### 0.1.2
- Interactive mode
- Fuzzy search
- Export functionality

### 0.1.1
- Basic search functionality
- Regex support
- Multi-extension filtering

## License

Apache-2.0 License

---

**Built with ❤️ in Rust** | **Precise** | **Fast** | **Agent-Ready**