codesearch 0.1.1

A fast CLI tool for searching codebases
# Code Search

A fast, intelligent CLI tool for searching and analyzing codebases, built in Rust. Designed as a **code-aware supplement to AI agents and LLMs**, providing precise structural understanding that semantic search cannot deliver.

[![Rust](https://img.shields.io/badge/rust-1.70+-orange.svg)](https://www.rust-lang.org)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

## 🎯 Why Code Search Matters for AI Agents

**The Problem**: LLMs and RAG systems treat code as text, losing critical structural information that search and static analysis of the code can provide.

**The Solution**: Code Search provides **structured, precise code intelligence** that agents can trust:

```
┌─────────────────────────────────────────────────────────────────┐
│                     AI AGENT / LLM                              │
│  "I need to understand the authentication module"               │
└──────────────────────────┬──────────────────────────────────────┘
                           │
         ┌─────────────────┴─────────────────┐
         │                                   │
    ┌────▼────┐                        ┌─────▼─────┐
    │   RAG   │                        │ CodeSearch│
    │ Semantic│                        │ Structural│
    └────┬────┘                        └─────┬─────┘
         │                                   │
    "Files about                      "auth.rs: L15-45
     authentication"                   fn authenticate()
     (fuzzy, chunked)                  fn verify_token()
                                       3 callers, 2 deps"
                                       (precise, complete)
```

## 🧠 Key Capabilities

### 1. **Precise Pattern Matching** (Not Semantic Guessing)

| What You Need | RAG/Embeddings | Code Search |
|---------------|----------------|-------------|
| Find `fn authenticate` | Returns similar functions | Returns exact function + line number |
| Find all TODO comments | Misses non-standard formats | Regex: `TODO\|FIXME\|HACK` catches all |
| Find unused imports | Cannot detect | Analyzes actual usage |
| Rename `oldFunc` → `newFunc` | Suggests similar names | Finds every exact occurrence |
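
To make the "exact function + line number" row concrete, here is a minimal sketch of literal, line-oriented matching. It is not codesearch's implementation: `find_exact` and the sample source are hypothetical, and only the Rust standard library is used.

```rust
/// Return the 1-based numbers of every line that contains `needle` verbatim.
fn find_exact(source: &str, needle: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .map(|(index, _)| index + 1) // enumerate() is 0-based; reports count from 1
        .collect()
}

fn main() {
    // Hypothetical file contents, used only for illustration.
    let source = "use std::fmt;\n\nfn authenticate(user: &str) -> bool {\n    !user.is_empty()\n}\n\nfn authenticate_admin() {}\n";

    // Including the opening parenthesis excludes `authenticate_admin`.
    for line_number in find_exact(source, "fn authenticate(") {
        println!("auth.rs:L{line_number}");
    }
}
```

Unlike an embedding lookup, the result is either an exact hit with a line number or nothing at all, which is what makes it safe for an agent to act on.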

### 2. **Language-Aware Intelligence** (48 Languages)

```bash
# Each language has tailored patterns
codesearch "fn\\s+\\w+" -e rs      # Rust functions
codesearch "def\\s+\\w+" -e py      # Python functions  
codesearch "async\\s+function" -e js # JS async functions
```

**Understands:**
- Function definitions, class structures, imports
- Comment patterns (single-line, multi-line, doc comments)
- Language-specific syntax (traits, interfaces, decorators)
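
One way to picture "tailored patterns" is a lookup from file extension to a definition regex. The sketch below is an assumption about the shape of such a table, not the contents of `language.rs`; `definition_pattern` and `extension_of` are hypothetical names, and the three patterns are the ones from the commands above.

```rust
use std::path::Path;

/// Map a file extension to a function-definition regex (three of many languages).
fn definition_pattern(extension: &str) -> Option<&'static str> {
    match extension {
        "rs" => Some(r"fn\s+\w+"),
        "py" => Some(r"def\s+\w+"),
        "js" => Some(r"async\s+function"),
        _ => None, // languages not covered by this sketch
    }
}

/// Extract the extension from a path, if it has one.
fn extension_of(path: &str) -> Option<&str> {
    Path::new(path).extension()?.to_str()
}

fn main() {
    // Hypothetical paths, used only for illustration.
    for path in ["src/main.rs", "tools/build.py", "web/app.js", "README"] {
        match extension_of(path).and_then(definition_pattern) {
            Some(pattern) => println!("{path}: {pattern}"),
            None => println!("{path}: no pattern in this sketch"),
        }
    }
}
```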

### 3. **Code Quality Analysis**

| Analysis | What It Finds |
|----------|---------------|
| **Complexity** | Cyclomatic & cognitive complexity scores |
| **Dead Code** | Unused imports, functions, classes |
| **Duplicates** | Similar code blocks (DRY violations) |
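
For readers unfamiliar with the complexity metric, cyclomatic complexity is commonly computed as one plus the number of branch points. The sketch below approximates it by counting keywords and short-circuit operators. It is an illustration under that assumption, not the algorithm in `complexity.rs`, and `approx_cyclomatic` is a hypothetical name.

```rust
/// Approximate cyclomatic complexity as 1 + the number of branch points.
/// Branch points here are the keywords `if`, `while`, `for` and the operators
/// `&&` and `||`. Comments, strings, and Rust closures (`|| ...`) are not
/// filtered out, so the result is only a rough estimate.
fn approx_cyclomatic(source: &str) -> usize {
    let keywords = ["if", "while", "for"];
    let keyword_hits = source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|token| keywords.contains(token))
        .count();
    let operator_hits = source.matches("&&").count() + source.matches("||").count();
    1 + keyword_hits + operator_hits
}

fn main() {
    // Hypothetical function body, used only for illustration.
    let source = "fn sign(x: i32, y: i32) -> i32 {\n    if x > 0 && y > 0 { 1 } else if x < 0 || y < 0 { -1 } else { 0 }\n}";
    println!("approximate cyclomatic complexity: {}", approx_cyclomatic(source));
}
```

Splitting on non-identifier characters keeps words such as `notify` from being counted as `if`.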

### 4. **MCP Server for Agent Integration**

Exposes code intelligence as tools that AI agents can call:

```bash
cargo run --features mcp -- mcp-server
```

**Available Tools:**
- `search_code` - Find patterns with fuzzy/regex support
- `list_files` - Enumerate codebase with filters
- `analyze_codebase` - Get metrics and statistics

## 🔄 How It Complements RAG & LLMs

| Aspect | RAG Alone | + Code Search |
|--------|-----------|---------------|
| **Find function** | "Similar to auth..." | Exact: `auth.rs:L42` |
| **Count usages** | "Mentioned several times" | Precise: "Called 7 times in 3 files" |
| **Find all usages** | Suggests changes | Validates all occurrences found |
| **Dead code** | Cannot detect | Lists unused with line numbers |
| **Complexity** | No metrics | Cyclomatic score: 15 |

### The Hybrid Approach

```
┌───────────────────────────────────────────────────────────────────┐
│  User: "Help me understand and improve the auth module"           │
└───────────────────────────────────────────────────────────────────┘
                                │
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
         ┌──────────┐    ┌───────────┐   ┌──────────┐
         │   RAG    │    │ CodeSearch│   │  LLM     │
         │ Semantic │    │ Structural│   │ Reasoning│
         └────┬─────┘    └─────┬─────┘   └────┬─────┘
              │                │              │
      "Auth handles       "auth.rs:         "Based on the
       user login,         fn login() L12    structure, I
       sessions..."        fn verify() L45   recommend..."
                           complexity: 18
                           dead code: 2
```

## 🚀 Quick Start

```bash
# Simple search: codesearch <query> [path]
codesearch "function"           # Search current directory
codesearch "TODO" ./src         # Search specific path
codesearch "class" ./src -e py  # Filter by extension

# Fuzzy search (handles typos)
codesearch "usrmngr" . --fuzzy

# Interactive mode
codesearch interactive

# Analysis commands
codesearch analyze              # Codebase metrics
codesearch complexity           # Complexity scores
codesearch duplicates           # Find similar code
codesearch deadcode             # Find unused code
```

## ✨ Features

- **Fast regex search** with exact line-level precision
- **Fuzzy matching** for typo tolerance
- **Support for 48 languages** with syntax awareness
- **Interactive REPL** for exploratory analysis
- **Code metrics** - complexity, duplication, dead code
- **Export** results to CSV or Markdown
- **MCP server** for AI agent integration
- **Parallel processing** for large codebases
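
As an illustration of the export feature, the sketch below renders a list of matches as CSV and as a Markdown table. The `Match` struct, the column names, and the functions are assumptions made for this example. The actual layout produced by `export.rs` may differ.

```rust
/// One search hit; the field set is an assumption for this sketch.
struct Match {
    file: String,
    line: usize,
    text: String,
}

/// Quote a CSV field in the RFC 4180 style: wrap it in quotes when it contains
/// a comma, quote, or line break, and double any inner quotes.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

fn to_csv(matches: &[Match]) -> String {
    let mut out = String::from("file,line,text\n");
    for m in matches {
        out.push_str(&format!("{},{},{}\n", csv_field(&m.file), m.line, csv_field(&m.text)));
    }
    out
}

fn to_markdown(matches: &[Match]) -> String {
    let mut out = String::from("| File | Line | Text |\n|------|------|------|\n");
    for m in matches {
        // Escape pipes so code such as `a | b` does not split the table cell.
        let text = m.text.replace('|', "\\|");
        out.push_str(&format!("| {} | {} | `{}` |\n", m.file, m.line, text));
    }
    out
}

fn main() {
    let matches = vec![Match {
        file: "src/main.rs".to_string(),
        line: 358,
        text: "fn main() -> Result<(), Box<dyn std::error::Error>> {".to_string(),
    }];
    print!("{}", to_csv(&matches));
    print!("{}", to_markdown(&matches));
}
```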

## 🏗️ Installation

```bash
git clone https://github.com/yingkitw/codesearch.git
cd codesearch
cargo build --release

# With MCP server support for AI agents
cargo build --release --features mcp
```

## 📖 Usage Examples

### Search Patterns

```bash
# codesearch <query> [path] [options]
codesearch "TODO"                       # Search current directory
codesearch "class" ./src                # Search specific folder
codesearch "error" . -e py,js,ts        # Filter by extensions

# Regex patterns
codesearch "fn\\s+\\w+" ./src -e rs     # Rust functions
codesearch "import.*from" . -e ts       # TypeScript imports

# Fuzzy search (handles typos)
codesearch "authetication" . --fuzzy    # Finds "authentication"
```
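
The README does not say which algorithm backs `--fuzzy`. A common choice for typo tolerance is Levenshtein edit distance, sketched below as an assumption. `levenshtein`, `fuzzy_hits`, and the `max_edits` cutoff are hypothetical names for this example.

```rust
/// Levenshtein distance via dynamic programming, keeping only two rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let insertion = curr[j] + 1;
            let deletion = prev[j + 1] + 1;
            curr.push(substitution.min(insertion).min(deletion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Keep the words that are within `max_edits` edits of the query.
fn fuzzy_hits<'a>(query: &str, words: &[&'a str], max_edits: usize) -> Vec<&'a str> {
    words
        .iter()
        .copied()
        .filter(|word| levenshtein(query, word) <= max_edits)
        .collect()
}

fn main() {
    // Hypothetical identifiers, used only for illustration.
    let words = ["authentication", "session", "logger"];
    println!("{:?}", fuzzy_hits("authetication", &words, 2));
}
```

The misspelled query differs from "authentication" by a single missing letter, so it falls well inside a cutoff of two edits while unrelated words do not.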

### Code Analysis

```bash
# Codebase overview
codesearch analyze
# Output: Files, lines, languages, function count, class count

# Complexity analysis
codesearch complexity --threshold 15 --sort
# Output: Files ranked by cyclomatic/cognitive complexity

# Dead code detection
codesearch deadcode -e rs,py,js
# Output: Unused imports, functions, classes

# Duplicate detection
codesearch duplicates --similarity 0.8
# Output: Similar code blocks that violate DRY
```
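
To give a feel for what a `--similarity 0.8` threshold could mean, the sketch below scores two code blocks with Jaccard similarity over their token sets. This is an assumed metric chosen for illustration. The measure used by `duplicates.rs` is not documented here, and `jaccard` and `is_duplicate` are hypothetical names.

```rust
use std::collections::HashSet;

/// Jaccard similarity of the whitespace-separated token sets of two blocks:
/// |A ∩ B| / |A ∪ B|, ranging from 0.0 (disjoint) to 1.0 (identical sets).
fn jaccard(a: &str, b: &str) -> f64 {
    let set_a: HashSet<&str> = a.split_whitespace().collect();
    let set_b: HashSet<&str> = b.split_whitespace().collect();
    if set_a.is_empty() && set_b.is_empty() {
        return 1.0; // two empty blocks are trivially identical
    }
    let shared = set_a.intersection(&set_b).count() as f64;
    let total = set_a.union(&set_b).count() as f64;
    shared / total
}

/// Flag a pair as a duplicate when its similarity reaches the threshold.
fn is_duplicate(a: &str, b: &str, threshold: f64) -> bool {
    jaccard(a, b) >= threshold
}

fn main() {
    // Hypothetical snippets, used only for illustration.
    let first = "let total = price * quantity ;";
    let second = "let total = price * count ;";
    println!("similarity: {:.2}", jaccard(first, second));
    println!("duplicate at 0.8: {}", is_duplicate(first, second, 0.8));
}
```

A set-based score ignores token order, so a production detector would likely compare token sequences or normalized syntax instead. The threshold idea is the same either way.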

### Interactive Mode

```bash
codesearch interactive
```

**Commands:**
- Type any pattern to search
- `/f` - Toggle fuzzy mode
- `/i` - Toggle case insensitivity
- `analyze` - Codebase metrics
- `complexity` - Complexity analysis
- `deadcode` - Dead code detection
- `duplicates` - Find duplicates
- `help` - All commands

### MCP Server (AI Integration)

```bash
# Start MCP server
cargo run --features mcp -- mcp-server

# Agents can call:
# - search_code(query, path, extensions, fuzzy, regex)
# - list_files(path, extensions, exclude)
# - analyze_codebase(path, extensions)
```
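
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `search_code` by hand so it stays dependency-free. The argument names come from the list above, but their JSON types (string, array of strings, boolean) are assumptions, and `json_string` and `search_code_request` are hypothetical helpers.

```rust
/// Encode a string as a JSON string literal (quotes, backslashes, control chars).
fn json_string(value: &str) -> String {
    let mut out = String::from("\"");
    for c in value.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request for the `search_code` tool.
/// The argument types are assumed; check the server's tool schema before use.
fn search_code_request(id: u64, query: &str, path: &str, extensions: &[&str], fuzzy: bool) -> String {
    let exts: Vec<String> = extensions.iter().map(|e| json_string(e)).collect();
    let arguments = format!(
        r#"{{"query":{},"path":{},"extensions":[{}],"fuzzy":{}}}"#,
        json_string(query),
        json_string(path),
        exts.join(","),
        fuzzy
    );
    let params = format!(r#"{{"name":"search_code","arguments":{arguments}}}"#);
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{params}}}"#)
}

fn main() {
    let request = search_code_request(1, r"fn\s+\w+", "./src", &["rs"], false);
    println!("{request}");
}
```

In a real client a JSON library would replace the manual string building. The point here is only the shape of the message an agent sends.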

## 📊 Output Examples

### Search Results
```
🔍 Search Results for "fn main"
──────────────────────────────

📁 ./src/main.rs (1 match)
  358: fn main() -> Result<(), Box<dyn std::error::Error>> {

📊 Statistics:
  Files searched: 12
  Matches found: 1
  Time: 0.003s
```

### Dead Code Detection
```
🔍 Dead Code Detection
──────────────────────────────

⚠️  Found 5 potential dead code items:

📄 examples/deadcode_demo.rs
   📥 L   4: import 'HashMap' - Imported but never used
   📥 L   6: import 'Write' - Imported but never used

📊 Summary:
   • import: 5
```
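
A report like the one above can be approximated with a simple textual check: take the last segment of each `use` path and see whether it appears anywhere else in the file. The sketch below does that for plain `use a::b::Name;` lines only. It is an illustration, not the logic in `deadcode.rs`, and `unused_imports` is a hypothetical name.

```rust
/// Report `(line_number, name)` for simple `use path::Name;` imports whose
/// final segment never appears as a token on any other line of the file.
fn unused_imports(source: &str) -> Vec<(usize, String)> {
    let mut unused = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let trimmed = line.trim();
        let Some(path) = trimmed
            .strip_prefix("use ")
            .and_then(|rest| rest.strip_suffix(';'))
        else {
            continue;
        };
        // Grouped, glob, and renamed imports are out of scope for this sketch.
        if path.contains('{') || path.contains('*') || path.contains(" as ") {
            continue;
        }
        let name = path.rsplit("::").next().unwrap_or(path).trim();
        let used = source
            .lines()
            .enumerate()
            .filter(|(other, _)| *other != idx)
            .flat_map(|(_, l)| l.split(|c: char| !(c.is_alphanumeric() || c == '_')))
            .any(|token| token == name);
        if !used {
            unused.push((idx + 1, name.to_string()));
        }
    }
    unused
}

fn main() {
    // Hypothetical file contents, used only for illustration.
    let source = "use std::collections::HashMap;\nuse std::fmt;\nuse std::io::Write;\n\nfn show(x: &dyn fmt::Debug) {}\n";
    for (line, name) in unused_imports(source) {
        println!("L{line:>4}: import '{name}' - Imported but never used");
    }
}
```

Because the check is textual, a name that appears only in a comment still counts as used, which is one reason such findings are best labelled "potential" dead code.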

### Complexity Analysis
```
📊 Code Complexity Analysis
──────────────────────────────

📁 Files by Complexity (highest first):

  src/search.rs
    Cyclomatic: 45  Cognitive: 38  Lines: 645

  src/analysis.rs
    Cyclomatic: 28  Cognitive: 22  Lines: 378
```

## 🔧 Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                         CLI Layer                            │
│  main.rs (358 LOC) - Argument parsing, command routing       │
│  interactive.rs (350 LOC) - REPL interface                   │
└──────────────────────────────────────────────────────────────┘
                              │
┌──────────────────────────────────────────────────────────────┐
│                      Core Engine                             │
│  search.rs (645 LOC) - Pattern matching, fuzzy, ranking      │
│  language.rs (509 LOC) - 48 language definitions             │
└──────────────────────────────────────────────────────────────┘
                              │
┌──────────────────────────────────────────────────────────────┐
│                     Analysis Layer                           │
│  analysis.rs (378 LOC) - Codebase metrics                    │
│  complexity.rs (308 LOC) - Cyclomatic/cognitive complexity   │
│  deadcode.rs (373 LOC) - Unused code detection               │
│  duplicates.rs (196 LOC) - Similar code detection            │
└──────────────────────────────────────────────────────────────┘
                              │
┌──────────────────────────────────────────────────────────────┐
│                    Integration Layer                         │
│  mcp_server.rs (295 LOC) - MCP protocol for AI agents        │
│  export.rs (185 LOC) - CSV/Markdown output                   │
│  cache.rs (127 LOC) - Result caching                         │
└──────────────────────────────────────────────────────────────┘
```

**11 modules, ~3,700 lines of Rust code**

## 🧪 Testing

```bash
# Run all tests (84 total)
cargo test --features mcp

# Unit tests: 35 (core functionality)
# Integration tests: 26 (CLI commands)  
# MCP tests: 23 (server tools)
```

## ⚑ Performance

- **10x faster** than grep for complex patterns
- **Parallel processing** with rayon
- **Memory efficient** streaming for large files
- **Compiled regex** patterns cached
- **Smart defaults** exclude build directories
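
The idea behind parallel search can be sketched with the standard library alone. The example below uses `std::thread::scope` as a stand-in for rayon and spawns one thread per file, which is simpler but less efficient than rayon's work-stealing pool. `parallel_match_counts` and the in-memory "files" are hypothetical.

```rust
use std::thread;

/// Count the lines containing `needle` in each `(name, contents)` pair,
/// searching every file on its own scoped thread.
fn parallel_match_counts(files: &[(&str, &str)], needle: &str) -> Vec<(String, usize)> {
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = files
            .iter()
            .map(|&(name, contents)| {
                scope.spawn(move || {
                    let hits = contents.lines().filter(|l| l.contains(needle)).count();
                    (name.to_string(), hits)
                })
            })
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // In-memory stand-ins for files on disk, used only for illustration.
    let files = [
        ("a.rs", "// TODO: one\nfn a() {}\n// TODO: two"),
        ("b.rs", "fn b() {}"),
    ];
    for (name, hits) in parallel_match_counts(&files, "TODO") {
        println!("{name}: {hits} matching lines");
    }
}
```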

## 📚 Documentation

- [README.md](README.md) - This guide
- [ARCHITECTURE.md](ARCHITECTURE.md) - System design
- [TODO.md](TODO.md) - Roadmap
- [examples/](examples/) - Code samples with dead code demos

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Test: `cargo test --features mcp`
4. Submit a pull request

## 📄 License

Apache-2.0 License

---

**Built with ❤️ in Rust** | **Precise** | **Fast** | **Agent-Ready**

*"RAG tells you about code. Code Search shows you the code."*