# Code Search
A fast, intelligent CLI tool for searching and analyzing codebases, built in Rust. Designed as a **code-aware supplement to AI agents and LLMs**, providing precise structural understanding that semantic search cannot deliver.
## 🎯 Why Code Search Matters for AI Agents
**The Problem**: LLMs and RAG systems treat code as plain text, so they lose structural information (exact definitions, line locations, call relationships) that search and static analysis of the code can provide.
**The Solution**: Code Search provides **structured, precise code intelligence** that agents can trust:
```
┌─────────────────────────────────────────────────────────────────┐
│                         AI AGENT / LLM                          │
│        "I need to understand the authentication module"         │
└────────────────────────────┬────────────────────────────────────┘
                             │
           ┌─────────────────┴─────────────────┐
           │                                   │
      ┌────▼─────┐                       ┌─────▼──────┐
      │   RAG    │                       │ CodeSearch │
      │ Semantic │                       │ Structural │
      └────┬─────┘                       └─────┬──────┘
           │                                   │
    "Files about                         "auth.rs: L15-45
     authentication"                      fn authenticate()
    (fuzzy, chunked)                      fn verify_token()
                                          3 callers, 2 deps"
                                         (precise, complete)
```
## Key Capabilities
### 1. **Precise Pattern Matching** (Not Semantic Guessing)
| Task | Semantic Search | Code Search |
|------|-----------------|-------------|
| Find `fn authenticate` | Returns similar functions | Returns exact function + line number |
| Find all TODO comments | Misses non-standard formats | Regex: `TODO\|FIXME\|HACK` catches all |
| Find unused imports | Cannot detect | Analyzes actual usage |
| Rename `oldFunc` → `newFunc` | Suggests similar names | Finds every exact occurrence |
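The exact, line-numbered matching in this table can be illustrated with a few lines of standard-library Rust. This is a sketch, not codesearch's code: `find_markers` and `Hit` are hypothetical names, and plain substring checks stand in for the regex `TODO|FIXME|HACK` so the example needs no crates.

```rust
/// One match: 1-based line number plus the trimmed line text.
#[derive(Debug, PartialEq)]
struct Hit {
    line: usize,
    text: String,
}

/// Hypothetical helper: report every line that contains any of `markers`.
/// Substring matching stands in for a regex alternation like `TODO|FIXME|HACK`.
fn find_markers(source: &str, markers: &[&str]) -> Vec<Hit> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| markers.iter().any(|m| line.contains(*m)))
        .map(|(i, line)| Hit {
            line: i + 1,
            text: line.trim().to_string(),
        })
        .collect()
}

fn main() {
    let src = "fn a() {}\n// TODO: handle errors\nlet x = 1; // HACK\n/* FIXME later */\n";
    for hit in find_markers(src, &["TODO", "FIXME", "HACK"]) {
        println!("{}: {}", hit.line, hit.text);
    }
}
```

Unlike a similarity ranking, the result is exhaustive: every line containing one of the markers comes back with its line number.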
### 2. **Language-Aware Intelligence** (48 Languages)
```bash
# Each language has tailored patterns
codesearch "fn\\s+\\w+" -e rs # Rust functions
codesearch "def\\s+\\w+" -e py # Python functions
codesearch "async\\s+function" -e js # JS async functions
```
**Understands:**
- Function definitions, class structures, imports
- Comment patterns (single-line, multi-line, doc comments)
- Language-specific syntax (traits, interfaces, decorators)
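A common way to model this kind of per-language awareness is a lookup table keyed by file extension. The sketch below is hypothetical: the `Language` struct, its fields and the three sample entries are assumptions for illustration, not the contents of `language.rs`.

```rust
/// Hypothetical per-language metadata (not the real language.rs definitions).
#[derive(Debug)]
struct Language {
    name: &'static str,
    extensions: &'static [&'static str],
    line_comment: &'static str,
    function_pattern: &'static str, // regex text for function definitions
}

const LANGUAGES: &[Language] = &[
    Language { name: "Rust", extensions: &["rs"], line_comment: "//", function_pattern: r"fn\s+\w+" },
    Language { name: "Python", extensions: &["py"], line_comment: "#", function_pattern: r"def\s+\w+" },
    Language { name: "JavaScript", extensions: &["js", "mjs"], line_comment: "//", function_pattern: r"function\s+\w+" },
];

/// Look up a language by extension, ignoring case and an optional leading dot.
fn language_for(ext: &str) -> Option<&'static Language> {
    let ext = ext.trim_start_matches('.').to_ascii_lowercase();
    LANGUAGES.iter().find(|lang| lang.extensions.contains(&ext.as_str()))
}

/// True if the line (after indentation) is a single-line comment in `lang`.
fn is_line_comment(lang: &Language, line: &str) -> bool {
    line.trim_start().starts_with(lang.line_comment)
}

fn main() {
    if let Some(lang) = language_for(".RS") {
        println!("{} functions match {}", lang.name, lang.function_pattern);
        println!("comment? {}", is_line_comment(lang, "    // a note"));
    }
}
```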
### 3. **Code Quality Analysis**
| Analysis | What It Finds |
|----------|---------------|
| **Complexity** | Cyclomatic & cognitive complexity scores |
| **Dead Code** | Unused imports, functions, classes |
| **Duplicates** | Similar code blocks (DRY violations) |
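Cyclomatic complexity is conventionally one plus the number of decision points in a piece of code. The sketch below approximates that with a token scan over Rust-like source; the counted keywords (`if`, `while`, `for`) and operators (`&&`, `||`) are assumptions, `match` arms are ignored for brevity, and `complexity.rs` may count differently. Cognitive complexity, which also weights nesting, is not shown.

```rust
/// Approximate cyclomatic complexity as 1 + the number of decision points.
/// The counted tokens are an illustrative assumption, not complexity.rs's rules.
fn cyclomatic(source: &str) -> usize {
    const BRANCH_KEYWORDS: [&str; 3] = ["if", "while", "for"];
    let mut decisions = 0;
    for line in source.lines() {
        let code = line.trim_start();
        // Skip full-line comments so commented-out code is not counted.
        if code.starts_with("//") {
            continue;
        }
        // Count whole-word keywords only, so `gift` does not match `if`.
        decisions += code
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|tok| BRANCH_KEYWORDS.contains(tok))
            .count();
        // Each short-circuit operator adds another execution path.
        decisions += code.matches("&&").count() + code.matches("||").count();
    }
    1 + decisions
}

fn main() {
    let src = "fn sign(a: i32, strict: bool) -> i32 {\n    if a > 0 && strict {\n        1\n    } else if a < 0 {\n        -1\n    } else {\n        0\n    }\n}";
    println!("cyclomatic complexity: {}", cyclomatic(src));
}
```

A text scan like this is fast but can be fooled by keywords inside string literals; a parser-based count avoids that at the cost of per-language grammars.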
### 4. **MCP Server for Agent Integration**
Exposes code intelligence as tools that AI agents can call:
```bash
cargo run --features mcp -- mcp-server
```
**Available Tools:**
- `search_code` - Find patterns with fuzzy/regex support
- `list_files` - Enumerate codebase with filters
- `analyze_codebase` - Get metrics and statistics
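Under the Model Context Protocol, an agent invokes a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below builds such a request for `search_code` by hand (no serde) to stay dependency-free. The argument names `query`, `path` and `fuzzy` follow the `search_code` signature in this README, but their JSON types (string, string, boolean) are assumptions, and `extensions`/`regex` are omitted for brevity.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request for the `search_code` tool.
/// The argument schema is an assumption about the server, not its documented API.
fn search_code_request(id: u64, query: &str, path: &str, fuzzy: bool) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"search_code","arguments":{{"query":"{}","path":"{}","fuzzy":{}}}}}}}"#,
        id,
        json_escape(query),
        json_escape(path),
        fuzzy
    )
}

fn main() {
    println!("{}", search_code_request(1, "authenticate", "./src", false));
}
```

A real client must first complete the MCP `initialize` handshake and pick a transport (for example stdio); neither is shown here.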
## How It Complements RAG & LLMs
| Task | RAG / LLM | Code Search |
|------|-----------|-------------|
| **Find function** | "Similar to auth..." | Exact: `auth.rs:L42` |
| **Count usages** | "Mentioned several times" | Precise: "Called 7 times in 3 files" |
| **Find all usages** | Suggests changes | Validates all occurrences found |
| **Dead code** | Cannot detect | Lists unused with line numbers |
| **Complexity** | No metrics | Cyclomatic score: 15 |
### The Hybrid Approach
```
┌────────────────────────────────────────────────────────┐
│ User: "Help me understand and improve the auth module" │
└───────────────────────────┬────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
      ┌────────────┐  ┌────────────┐  ┌────────────┐
      │    RAG     │  │ CodeSearch │  │    LLM     │
      │  Semantic  │  │ Structural │  │ Reasoning  │
      └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
            │               │               │
    "Auth handles     "auth.rs:       "Based on the
     user login,       fn login()  L12 structure, I
     sessions..."      fn verify() L45 recommend..."
                       complexity: 18
                       dead code: 2"
```
## Quick Start
```bash
# Simple search: codesearch <query> [path]
codesearch "function" # Search current directory
codesearch "TODO" ./src # Search specific path
codesearch "class" ./src -e py # Filter by extension
# Fuzzy search (handles typos)
codesearch "usrmngr" . --fuzzy
# Interactive mode
codesearch interactive
# Analysis commands
codesearch analyze # Codebase metrics
codesearch complexity # Complexity scores
codesearch duplicates # Find similar code
codesearch deadcode # Find unused code
```
## ✨ Features
- **Fast regex search** with exact line-level precision
- **Fuzzy matching** for typo tolerance
- **48 language support** with syntax awareness
- **Interactive REPL** for exploratory analysis
- **Code metrics** - complexity, duplication, dead code
- **Export** results to CSV or Markdown
- **MCP server** for AI agent integration
- **Parallel processing** for large codebases
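Exporting results mostly comes down to escaping. The sketch below shows one way to render search hits as CSV (RFC 4180-style quoting) and as a Markdown table; the `Match` struct and the file/line/text columns are assumptions, not the schema `export.rs` actually writes.

```rust
/// Hypothetical search hit; export.rs may use different fields.
struct Match<'a> {
    file: &'a str,
    line: usize,
    text: &'a str,
}

/// Quote a CSV field if it contains a comma, quote or line break (RFC 4180 style).
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

fn to_csv(matches: &[Match]) -> String {
    let mut out = String::from("file,line,text\n");
    for m in matches {
        out.push_str(&format!("{},{},{}\n", csv_field(m.file), m.line, csv_field(m.text)));
    }
    out
}

fn to_markdown(matches: &[Match]) -> String {
    let mut out = String::from("| File | Line | Text |\n|------|------|------|\n");
    for m in matches {
        // A bare `|` would end the table cell, so escape it.
        out.push_str(&format!("| {} | {} | `{}` |\n", m.file, m.line, m.text.replace('|', "\\|")));
    }
    out
}

fn main() {
    let hits = [
        Match { file: "src/main.rs", line: 12, text: "// TODO: handle \"quoted, text\"" },
        Match { file: "src/lib.rs", line: 3, text: "a || b" },
    ];
    print!("{}", to_csv(&hits));
    print!("{}", to_markdown(&hits));
}
```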
## Installation
```bash
git clone https://github.com/yingkitw/codesearch.git
cd codesearch
cargo build --release
# With MCP server support for AI agents
cargo build --release --features mcp
```
## Usage Examples
### Search Patterns
```bash
# codesearch <query> [path] [options]
codesearch "TODO" # Search current directory
codesearch "class" ./src # Search specific folder
codesearch "error" . -e py,js,ts # Filter by extensions
# Regex patterns
codesearch "fn\\s+\\w+" ./src -e rs # Rust functions
codesearch "import.*from" . -e ts # TypeScript imports
# Fuzzy search (handles typos)
codesearch "authetication" . --fuzzy # Finds "authentication"
```
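Typo tolerance is often implemented with edit distance. The sketch below uses Levenshtein distance with a small threshold; that choice is an assumption for illustration, since this README does not say which fuzzy algorithm `search.rs` uses (a subsequence or scored matcher would cope better with abbreviations such as `usrmngr`). `fuzzy_find` is a hypothetical name.

```rust
/// Levenshtein edit distance (insertions, deletions, substitutions), two-row DP.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Return words in `text` within `max_edits` of `query`, compared case-insensitively.
fn fuzzy_find<'a>(query: &str, text: &'a str, max_edits: usize) -> Vec<&'a str> {
    let q = query.to_lowercase();
    text.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|word| !word.is_empty() && levenshtein(&q, &word.to_lowercase()) <= max_edits)
        .collect()
}

fn main() {
    let code = "// Authentication helpers\nfn login(user: &str) -> bool { !user.is_empty() }";
    println!("{:?}", fuzzy_find("authetication", code, 2));
}
```

Here `authetication` is one deletion away from `Authentication`, so it is found with `max_edits = 2`.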
### Code Analysis
```bash
# Codebase overview
codesearch analyze
# Output: Files, lines, languages, function count, class count
# Complexity analysis
codesearch complexity --threshold 15 --sort
# Output: Files ranked by cyclomatic/cognitive complexity
# Dead code detection
codesearch deadcode -e rs,py,js
# Output: Unused imports, functions, classes
# Duplicate detection
codesearch duplicates --similarity 0.8
# Output: Similar code blocks that violate DRY
```
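A simple way to score how alike two code blocks are is Jaccard similarity over their token sets, flagging pairs at or above a threshold such as the `0.8` passed to `--similarity`. This is an illustrative assumption: `duplicates.rs` may tokenize, window or score blocks differently, and `duplicate_pairs` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Split code into identifier/number tokens; punctuation and whitespace are dropped.
fn tokens(code: &str) -> HashSet<&str> {
    code.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| of the token sets (1.0 for two empty blocks).
fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Return index pairs (i, j), i < j, whose similarity meets `threshold`.
fn duplicate_pairs(blocks: &[&str], threshold: f64) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..blocks.len() {
        for j in i + 1..blocks.len() {
            if similarity(blocks[i], blocks[j]) >= threshold {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

fn main() {
    let blocks = [
        "let total = price * qty; println!(\"{}\", total);",
        "fn unrelated() -> u8 { 42 }",
        "let total = price * qty; println!(\"{}\", total);",
    ];
    for (i, j) in duplicate_pairs(&blocks, 0.8) {
        println!("blocks {} and {} look duplicated", i, j);
    }
}
```

The pairwise loop is quadratic in the number of blocks; larger codebases usually hash or bucket blocks first so only likely candidates are compared.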
### Interactive Mode
```bash
codesearch interactive
```
**Commands:**
- Type any pattern to search
- `/f` - Toggle fuzzy mode
- `/i` - Toggle case insensitivity
- `analyze` - Codebase metrics
- `complexity` - Complexity analysis
- `deadcode` - Dead code detection
- `duplicates` - Find duplicates
- `help` - All commands
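The interactive commands map naturally onto a small command enum plus per-session state. The sketch below is hypothetical: the `Command` and `Session` names, and treating any unrecognized input as a search pattern, are assumptions rather than the code in `interactive.rs`.

```rust
/// Hypothetical parsed REPL input.
#[derive(Debug, PartialEq)]
enum Command {
    ToggleFuzzy,
    ToggleCaseInsensitive,
    Analyze,
    Complexity,
    DeadCode,
    Duplicates,
    Help,
    Search(String),
}

/// Parse one line of input; blank lines yield `None`.
fn parse(input: &str) -> Option<Command> {
    let input = input.trim();
    if input.is_empty() {
        return None;
    }
    Some(match input {
        "/f" => Command::ToggleFuzzy,
        "/i" => Command::ToggleCaseInsensitive,
        "analyze" => Command::Analyze,
        "complexity" => Command::Complexity,
        "deadcode" => Command::DeadCode,
        "duplicates" => Command::Duplicates,
        "help" => Command::Help,
        pattern => Command::Search(pattern.to_string()),
    })
}

/// Search flags that persist across REPL lines.
#[derive(Debug, Default)]
struct Session {
    fuzzy: bool,
    case_insensitive: bool,
}

impl Session {
    /// Apply toggles to the session; hand any other command back to the caller.
    fn apply(&mut self, cmd: Command) -> Option<Command> {
        match cmd {
            Command::ToggleFuzzy => {
                self.fuzzy = !self.fuzzy;
                None
            }
            Command::ToggleCaseInsensitive => {
                self.case_insensitive = !self.case_insensitive;
                None
            }
            other => Some(other),
        }
    }
}

fn main() {
    let mut session = Session::default();
    for line in ["/f", "TODO", "/i", "help"] {
        if let Some(cmd) = parse(line).and_then(|c| session.apply(c)) {
            println!("run {:?} (fuzzy={}, case_insensitive={})", cmd, session.fuzzy, session.case_insensitive);
        }
    }
}
```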
### MCP Server (AI Integration)
```bash
# Start MCP server
cargo run --features mcp -- mcp-server
# Agents can call:
# - search_code(query, path, extensions, fuzzy, regex)
# - list_files(path, extensions, exclude)
# - analyze_codebase(path, extensions)
```
## Output Examples
### Search Results
```
Search Results for "fn main"
──────────────────────────────
./src/main.rs (1 match)
358: fn main() -> Result<(), Box<dyn std::error::Error>> {
Statistics:
Files searched: 12
Matches found: 1
Time: 0.003s
```
### Dead Code Detection
```
Dead Code Detection
──────────────────────────────
⚠️ Found 5 potential dead code items:
examples/deadcode_demo.rs
L 4: import 'HashMap' - Imported but never used
L 6: import 'Write' - Imported but never used
Summary:
• import: 5
```
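The "Imported but never used" findings can be approximated by collecting the names each `use` line brings in and checking whether each name appears as a whole word anywhere else in the file. This is a rough, Rust-only sketch under stated assumptions (single-line `use` items, no nested braces, text matching rather than name resolution); `imported_names` and `unused_imports` are hypothetical, and this is not how `deadcode.rs` is necessarily implemented. Trait imports used only through method calls would be reported as false positives.

```rust
/// Names bound by a single-line `use a::b::C;` or `use a::b::{C, D as E};`.
fn imported_names(line: &str) -> Vec<String> {
    let Some(rest) = line.trim().strip_prefix("use ") else {
        return Vec::new();
    };
    let path = rest.trim_end_matches(';');
    // Either the braced list `{C, D}` or the last path segment.
    let names = match (path.find('{'), path.rfind('}')) {
        (Some(open), Some(close)) if open < close => &path[open + 1..close],
        _ => path.rsplit("::").next().unwrap_or(path),
    };
    names
        .split(',')
        .map(|n| n.trim())
        .map(|n| n.rsplit(" as ").next().unwrap_or(n)) // `X as Y` binds `Y`
        .filter(|n| !n.is_empty() && *n != "self" && *n != "*")
        .map(str::to_string)
        .collect()
}

/// Report (1-based line, name) for imports whose name never appears on another line.
fn unused_imports(source: &str) -> Vec<(usize, String)> {
    let lines: Vec<&str> = source.lines().collect();
    let mut unused = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        for name in imported_names(line) {
            let used = lines.iter().enumerate().any(|(j, other)| {
                j != i
                    && other
                        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
                        .any(|word| word == name)
            });
            if !used {
                unused.push((i + 1, name));
            }
        }
    }
    unused
}

fn main() {
    let src = "use std::collections::HashMap;\nuse std::fmt::{Display, Write};\n\nfn show(x: &dyn Display) -> String {\n    format!(\"{}\", x)\n}\n";
    for (line, name) in unused_imports(src) {
        println!("L{:>3}: import '{}' - Imported but never used", line, name);
    }
}
```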
### Complexity Analysis
```
Code Complexity Analysis
──────────────────────────────
Files by Complexity (highest first):
src/search.rs
Cyclomatic: 45 Cognitive: 38 Lines: 645
src/analysis.rs
Cyclomatic: 28 Cognitive: 22 Lines: 378
```
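This ranked report corresponds to the `--threshold 15 --sort` flags shown under Code Analysis. A hypothetical sketch of that filter-and-sort step follows; the `FileScore` struct and `rank` function are assumptions, and this README does not say whether the threshold applies to the cyclomatic or the cognitive score, or whether it is inclusive (cyclomatic and `>=` are assumed here).

```rust
/// Hypothetical per-file scores, mirroring the columns in the report.
#[derive(Debug, Clone, PartialEq)]
struct FileScore {
    path: String,
    cyclomatic: u32,
    cognitive: u32,
    lines: u32,
}

/// Keep files whose cyclomatic score is at least `threshold`, highest first.
/// Ties fall back to cognitive score, then path, so the order is deterministic.
fn rank(mut files: Vec<FileScore>, threshold: u32) -> Vec<FileScore> {
    files.retain(|f| f.cyclomatic >= threshold);
    files.sort_by(|a, b| {
        b.cyclomatic
            .cmp(&a.cyclomatic)
            .then(b.cognitive.cmp(&a.cognitive))
            .then_with(|| a.path.cmp(&b.path))
    });
    files
}

fn main() {
    let files = vec![
        FileScore { path: "src/analysis.rs".into(), cyclomatic: 28, cognitive: 22, lines: 378 },
        // Hypothetical entry below the threshold, to show filtering.
        FileScore { path: "src/small.rs".into(), cyclomatic: 3, cognitive: 1, lines: 20 },
        FileScore { path: "src/search.rs".into(), cyclomatic: 45, cognitive: 38, lines: 645 },
    ];
    for f in rank(files, 15) {
        println!("{}\n  Cyclomatic: {} Cognitive: {} Lines: {}", f.path, f.cyclomatic, f.cognitive, f.lines);
    }
}
```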
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│  CLI Layer                                                  │
│  main.rs (358 LOC) - Argument parsing, command routing      │
│  interactive.rs (350 LOC) - REPL interface                  │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│  Core Engine                                                │
│  search.rs (645 LOC) - Pattern matching, fuzzy, ranking     │
│  language.rs (509 LOC) - 48 language definitions            │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│  Analysis Layer                                             │
│  analysis.rs (378 LOC) - Codebase metrics                   │
│  complexity.rs (308 LOC) - Cyclomatic/cognitive complexity  │
│  deadcode.rs (373 LOC) - Unused code detection              │
│  duplicates.rs (196 LOC) - Similar code detection           │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│  Integration Layer                                          │
│  mcp_server.rs (295 LOC) - MCP protocol for AI agents       │
│  export.rs (185 LOC) - CSV/Markdown output                  │
│  cache.rs (127 LOC) - Result caching                        │
└─────────────────────────────────────────────────────────────┘
```
**11 modules, ~3,800 lines of Rust code**
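The CLI layer's job is to turn command-line arguments into a call into one of the lower layers. A std-only sketch of that routing follows; the `Subcommand` enum, the `route` function and the fall-through to a plain search are assumptions based on the commands documented in this README, and the real `main.rs` may use an argument-parsing crate instead.

```rust
use std::env;

/// Hypothetical subcommands, named after the commands documented in this README.
#[derive(Debug, PartialEq)]
enum Subcommand {
    Search { query: String, path: String },
    Interactive,
    Analyze,
    Complexity,
    Duplicates,
    DeadCode,
    McpServer,
}

/// Route raw arguments (without the program name) to a subcommand.
fn route(args: &[String]) -> Result<Subcommand, String> {
    let first = args.first().ok_or("usage: codesearch <query> [path]")?;
    Ok(match first.as_str() {
        "interactive" => Subcommand::Interactive,
        "analyze" => Subcommand::Analyze,
        "complexity" => Subcommand::Complexity,
        "duplicates" => Subcommand::Duplicates,
        "deadcode" => Subcommand::DeadCode,
        "mcp-server" => Subcommand::McpServer,
        // Anything else is a search query. The path defaults to ".", and flags
        // such as `-e rs` are not parsed in this sketch.
        query => Subcommand::Search {
            query: query.to_string(),
            path: args
                .get(1)
                .filter(|p| !p.starts_with('-'))
                .cloned()
                .unwrap_or_else(|| ".".to_string()),
        },
    })
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match route(&args) {
        Ok(cmd) => println!("would dispatch {:?}", cmd),
        Err(usage) => eprintln!("{}", usage),
    }
}
```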
## 🧪 Testing
```bash
# Run all tests (84 total)
cargo test --features mcp
# Unit tests: 35 (core functionality)
# Integration tests: 26 (CLI commands)
# MCP tests: 23 (server tools)
```
## ⚡ Performance
- **10x faster** than grep for complex patterns
- **Parallel processing** with rayon
- **Memory efficient** streaming for large files
- **Compiled regex** patterns cached
- **Smart defaults** exclude build directories
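The parallel search and default exclusions can be illustrated with the standard library. The project uses rayon; this sketch substitutes `std::thread::scope` so it has no dependencies, searches an in-memory list of (path, contents) pairs instead of walking the disk, and assumes `target`, `node_modules` and `.git` as the excluded directories, since this README does not list the real defaults. `parallel_search` and `is_excluded` are hypothetical names.

```rust
use std::thread;

/// Directory names skipped by default in this sketch (an assumption).
const EXCLUDED_DIRS: [&str; 3] = ["target", "node_modules", ".git"];

/// True if any path component is an excluded directory name.
fn is_excluded(path: &str) -> bool {
    path.split('/').any(|part| EXCLUDED_DIRS.contains(&part))
}

/// Search `files` on up to `workers` scoped threads, one chunk of files per thread.
/// Returns (path, 1-based line) for every line containing `needle`, sorted.
fn parallel_search(files: &[(&str, &str)], needle: &str, workers: usize) -> Vec<(String, usize)> {
    let workers = workers.max(1);
    let candidates: Vec<&(&str, &str)> = files.iter().filter(|(p, _)| !is_excluded(p)).collect();
    // Ceiling division so every file lands in some chunk; chunks(0) would panic.
    let chunk = ((candidates.len() + workers - 1) / workers).max(1);

    let mut hits: Vec<(String, usize)> = thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk)
            .map(|group| {
                s.spawn(move || {
                    let mut local = Vec::new();
                    for (path, contents) in group {
                        for (i, line) in contents.lines().enumerate() {
                            if line.contains(needle) {
                                local.push((path.to_string(), i + 1));
                            }
                        }
                    }
                    local
                })
            })
            .collect();
        // Join every worker and merge their local result vectors.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    hits.sort(); // thread completion order is nondeterministic
    hits
}

fn main() {
    let files = [
        ("src/main.rs", "fn main() {}\n// TODO tidy"),
        ("target/debug/gen.rs", "// TODO generated"),
        ("src/lib.rs", "// TODO docs\nfn f() {}"),
    ];
    for (path, line) in parallel_search(&files, "TODO", 2) {
        println!("{}:{}", path, line);
    }
}
```

Each worker collects into its own vector, so no locking is needed; the results are merged and sorted once at the end.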
## Documentation
- [README.md](README.md) - This guide
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [TODO.md](TODO.md) - Roadmap
- [examples/](examples/) - Code samples with dead code demos
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Test: `cargo test --features mcp`
4. Submit a pull request
## License
Apache-2.0 License
---
*"RAG tells you about code. Code Search shows you the code."*