Code Search

A fast, intelligent CLI tool for searching and analyzing codebases, built in Rust. Designed as a code-aware supplement to AI agents and LLMs, providing precise structural understanding that semantic search cannot deliver.

🎯 Why Code Search Matters for AI Agents

The Problem: LLMs and RAG systems treat code as text, losing critical structural information which can be provided by search, static analysis of the code.

The Solution: Code Search provides structured, precise code intelligence that agents can trust:

┌─────────────────────────────────────────────────────────────────┐
│                     AI AGENT / LLM                              │
│  "I need to understand the authentication module"               │
└──────────────────────────┬──────────────────────────────────────┘
                           │
         ┌─────────────────┴─────────────────┐
         │                                   │
    ┌────▼────┐                        ┌─────▼─────┐
    │   RAG   │                        │ CodeSearch│
    │ Semantic│                        │ Structural│
    └────┬────┘                        └─────┬─────┘
         │                                   │
    "Files about                      "auth.rs: L15-45
     authentication"                   fn authenticate()
     (fuzzy, chunked)                  fn verify_token()
                                       3 callers, 2 deps"
                                       (precise, complete)

🧠 Key Capabilities

1. Precise Pattern Matching (Not Semantic Guessing)

What You Need	RAG/Embeddings	Code Search
Find `fn authenticate`	Returns similar functions	Returns exact function + line number
Find all TODO comments	Misses non-standard formats	Regex: `TODO\|FIXME\|HACK` catches all
Find unused imports	Cannot detect	Analyzes actual usage
Rename `oldFunc` → `newFunc`	Suggests similar names	Finds every exact occurrence

2. Language-Aware Intelligence (48 Languages)

# Each language has tailored patterns
codesearch "fn\\s+\\w+" -e rs      # Rust functions
codesearch "def\\s+\\w+" -e py      # Python functions  
codesearch "async\\s+function" -e js # JS async functions

Understands:

Function definitions, class structures, imports
Comment patterns (single-line, multi-line, doc comments)
Language-specific syntax (traits, interfaces, decorators)

3. Code Quality Analysis

Analysis	What It Finds
Complexity	Cyclomatic & cognitive complexity scores
Dead Code	Unused imports, functions, classes
Duplicates	Similar code blocks (DRY violations)

4. MCP Server for Agent Integration

Exposes code intelligence as tools that AI agents can call:

cargo run --features mcp -- mcp-server

Available Tools:

search_code - Find patterns with fuzzy/regex support
list_files - Enumerate codebase with filters
analyze_codebase - Get metrics and statistics

🔄 How It Complements RAG & LLMs

Aspect	RAG Alone	+ Code Search
Find function	"Similar to auth..."	Exact: `auth.rs:L42`
Count usages	"Mentioned several times"	Precise: "Called 7 times in 3 files"
Find all usages	Suggests changes	Validates all occurrences found
Dead code	Cannot detect	Lists unused with line numbers
Complexity	No metrics	Cyclomatic score: 15

The Hybrid Approach

┌────────────────────────────────────────────────────────────────────┐
│  User: "Help me understand and improve the auth module"           │
└────────────────────────────────────────────────────────────────────┘
                                │
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
         ┌──────────┐    ┌──────────┐    ┌──────────┐
         │   RAG    │    │ CodeSearch│   │  LLM     │
         │ Semantic │    │ Structural│   │ Reasoning│
         └────┬─────┘    └─────┬────┘    └────┬─────┘
              │                │              │
      "Auth handles       "auth.rs:         "Based on the
       user login,         fn login() L12    structure, I
       sessions..."        fn verify() L45   recommend..."
                           complexity: 18
                           dead code: 2

🚀 Quick Start

# Simple search: codesearch <query> [path]
codesearch "function"           # Search current directory
codesearch "TODO" ./src         # Search specific path
codesearch "class" ./src -e py  # Filter by extension

# Fuzzy search (handles typos)
codesearch "usrmngr" . --fuzzy

# Interactive mode
codesearch interactive

# Analysis commands
codesearch analyze              # Codebase metrics
codesearch complexity           # Complexity scores
codesearch duplicates           # Find similar code
codesearch deadcode             # Find unused code

✨ Features

Fast regex search with exact line-level precision
Fuzzy matching for typo tolerance
48 language support with syntax awareness
Interactive REPL for exploratory analysis
Code metrics - complexity, duplication, dead code
Export results to CSV or Markdown
MCP server for AI agent integration
Parallel processing for large codebases

🏗️ Installation

git clone https://github.com/yingkitw/codesearch.git
cd codesearch
cargo build --release

# With MCP server support for AI agents
cargo build --release --features mcp

📖 Usage Examples

Search Patterns

# codesearch <query> [path] [options]
codesearch "TODO"                       # Search current directory
codesearch "class" ./src                # Search specific folder
codesearch "error" . -e py,js,ts        # Filter by extensions

# Regex patterns
codesearch "fn\\s+\\w+" ./src -e rs     # Rust functions
codesearch "import.*from" . -e ts       # TypeScript imports

# Fuzzy search (handles typos)
codesearch "authetication" . --fuzzy    # Finds "authentication"

Code Analysis

# Codebase overview
codesearch analyze
# Output: Files, lines, languages, function count, class count

# Complexity analysis
codesearch complexity --threshold 15 --sort
# Output: Files ranked by cyclomatic/cognitive complexity

# Dead code detection
codesearch deadcode -e rs,py,js
# Output: Unused imports, functions, classes

# Duplicate detection
codesearch duplicates --similarity 0.8
# Output: Similar code blocks that violate DRY

Interactive Mode

codesearch interactive

Commands:

Type any pattern to search
/f - Toggle fuzzy mode
/i - Toggle case insensitivity
analyze - Codebase metrics
complexity - Complexity analysis
deadcode - Dead code detection
duplicates - Find duplicates
help - All commands

MCP Server (AI Integration)

# Start MCP server
cargo run --features mcp -- mcp-server

# Agents can call:
# - search_code(query, path, extensions, fuzzy, regex)
# - list_files(path, extensions, exclude)
# - analyze_codebase(path, extensions)

📊 Output Examples

Search Results

🔍 Search Results for "fn main"
──────────────────────────────

📁 ./src/main.rs (1 match)
  358: fn main() -> Result<(), Box<dyn std::error::Error>> {

📊 Statistics:
  Files searched: 12
  Matches found: 1
  Time: 0.003s

Dead Code Detection

🔍 Dead Code Detection
──────────────────────────────

⚠️  Found 5 potential dead code items:

📄 examples/deadcode_demo.rs
   📥 L   4: import 'HashMap' - Imported but never used
   📥 L   6: import 'Write' - Imported but never used

📊 Summary:
   • import: 5

Complexity Analysis

📊 Code Complexity Analysis
──────────────────────────────

📁 Files by Complexity (highest first):

  src/search.rs
    Cyclomatic: 45  Cognitive: 38  Lines: 645

  src/analysis.rs
    Cyclomatic: 28  Cognitive: 22  Lines: 378

🔧 Architecture

┌─────────────────────────────────────────────────────────────┐
│                         CLI Layer                            │
│  main.rs (358 LOC) - Argument parsing, command routing       │
│  interactive.rs (350 LOC) - REPL interface                   │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                      Core Engine                             │
│  search.rs (645 LOC) - Pattern matching, fuzzy, ranking      │
│  language.rs (509 LOC) - 48 language definitions             │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                     Analysis Layer                           │
│  analysis.rs (378 LOC) - Codebase metrics                   │
│  complexity.rs (308 LOC) - Cyclomatic/cognitive complexity   │
│  deadcode.rs (373 LOC) - Unused code detection              │
│  duplicates.rs (196 LOC) - Similar code detection           │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                    Integration Layer                         │
│  mcp_server.rs (295 LOC) - MCP protocol for AI agents       │
│  export.rs (185 LOC) - CSV/Markdown output                  │
│  cache.rs (127 LOC) - Result caching                        │
└─────────────────────────────────────────────────────────────┘

11 modules, ~3,800 lines of Rust code

🧪 Testing

# Run all tests (84 total)
cargo test --features mcp

# Unit tests: 35 (core functionality)
# Integration tests: 26 (CLI commands)  
# MCP tests: 23 (server tools)

⚡ Performance

10x faster than grep for complex patterns
Parallel processing with rayon
Memory efficient streaming for large files
Compiled regex patterns cached
Smart defaults exclude build directories

📚 Documentation

README.md - This guide
ARCHITECTURE.md - System design
TODO.md - Roadmap
examples/ - Code samples with dead code demos

🤝 Contributing

Fork the repository
Create a feature branch
Test: cargo test --features mcp
Submit a pull request

📄 License

Apache-2.0 License

Built with ❤️ in Rust | Precise | Fast | Agent-Ready

"RAG tells you about code. Code Search shows you the code."

codesearch 0.1.2