CodeSearch
Fast, intelligent code search and analysis for 48+ programming languages.
Find what you need in seconds: functions, classes, duplicates, dead code, complexity issues.
Why CodeSearch?
Stop Wasting Time Searching Code
Problem: You're working in a large codebase and need to:
- Find where authentication logic is implemented
- Identify all usages of a deprecated function before refactoring
- Track down technical debt (TODOs, FIXMEs) scattered across files
- Understand complex function relationships and dependencies
- Find duplicated code that violates DRY principles
- Spot overly complex functions that need refactoring
Traditional tools fall short:
- grep is slow and doesn't understand code structure
- IDE search is limited to single projects/languages
- Manual code review is time-consuming and error-prone
CodeSearch solves these problems:
# Find authentication logic instantly
# Result: All authentication code, ranked by relevance
# Track technical debt before sprint planning
# Result: 15 unused functions, 23 TODOs, 8 unreachable blocks
# Find duplicates before they become maintenance nightmares
# Result: 12 code clones (80%+ similarity)
What Makes CodeSearch Different?
Unlike Joern (CPG graph database, Scala queries, security research) or CodeQL (QL logic language, GitHub extractors, path queries), CodeSearch requires no up-front indexing or configuration, so the first result arrives sooner:
- Unified graph (codesearch graph unified): AST+CFG+DFG in one, like Joern's CPG but with no database
- Data-flow trace (codesearch flow <var>): path-style tracing without extractors
- Security scan (codesearch security): eval/exec/SQL patterns instantly
- First result: sub-second vs. minutes of import/build
See docs/CAPABILITY_REDESIGN.md for the full comparison.
| Feature | Benefit | Example |
|---|---|---|
| Language-Aware | Understands functions, classes, imports in 48+ languages | Find fn main in Rust, def main in Python |
| Lightning Fast | Parallel processing with Rust, typical searches in 3-50ms | Search 1000 files in < 50ms |
| Intelligent | Fuzzy matching handles typos, semantic search understands context | codesearch "authetication" finds "authentication" |
| Code Quality | Detects dead code, duplicates, complexity issues automatically | codesearch complexity flags functions needing refactoring |
| Graph Analysis | 6 types of graphs for deep code understanding | Call graphs show function relationships |
| Developer-Friendly | Interactive mode, multiple export formats, MCP for AI agents | codesearch interactive for REPL-style search |
Real-World Impact
- Save Hours per Week: Replace manual code hunting with instant searches
- Ship Better Code: Catch dead code and complexity issues before review
- Understand Faster: Visualize code relationships with graph analysis
- Reduce Technical Debt: Track and eliminate code quality issues systematically
Quick Start
Installation
# Clone and build
# The binary will be at: ./target/release/codesearch
# Optional: Add to PATH
Basic Usage
# Simple search - find anything in your codebase
# Search with file type filter
# Fuzzy search (handles typos!)
# Interactive mode
Usage Examples
1. Everyday Search Tasks
Find Function Definitions
# Find all functions named "process"
# Find class definitions
# Find async functions
Track Technical Debt
# Find all TODOs and FIXMEs
# Export to CSV for tracking
Refactor Safely
# Find all usages before refactoring
# Case-sensitive search for exact matches
2. Code Quality Analysis
Health Score
# Get overall code health score
# Output:
# 🏥 Code Health Report
# Overall Health Score: 85/100 ✅
#
# Components:
# • Dead Code: 90/100 (3 issues)
# • Duplicates: 95/100 (2 duplicates)
# • Complexity: 70/100 (5 high-complexity functions)
# CI/CD integration with fail threshold
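One way to wire the health score into a pipeline is to parse the report text itself. The sketch below assumes only the "Overall Health Score: N/100" line shown in the sample output above. The helper names (extract_health_score, check_health_threshold) are hypothetical and are not CodeSearch commands or flags. The report is read from stdin, so the exact command that produces it is left to you.

```shell
#!/usr/bin/env bash
# Sketch: gate a CI job on the "Overall Health Score" line of a health report.
# Assumption: the report arrives on stdin in the format shown in the sample output.

# Print the numeric score (e.g. 85) from a line like "Overall Health Score: 85/100".
extract_health_score() {
  sed -n 's/.*Overall Health Score: \([0-9][0-9]*\)\/100.*/\1/p' | head -n 1
}

# Hypothetical helper: returns 0 when the score meets the threshold,
# 1 when it is below, and 2 when no score line was found.
check_health_threshold() {
  local threshold="$1" score
  score="$(extract_health_score)"
  if [ -z "$score" ]; then
    echo "no health score found in report" >&2
    return 2
  fi
  if [ "$score" -lt "$threshold" ]; then
    echo "health score $score is below threshold $threshold" >&2
    return 1
  fi
  echo "health score $score meets threshold $threshold"
}
```

In a CI step, the report would be piped into check_health_threshold with the minimum acceptable score as its argument, and a non-zero exit status would fail the job.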
Detect Dead Code
# Find unused code
# Output:
# ⚠️ Found 12 potential dead code items:
# [var] L 10: variable 'unused_var'
# [∅] L 42: empty_helper()
# [?] L 58: TODO marker
# [!] L 72: unreachable code
# Export for code review
Analyze Complexity
# Find complex functions that need refactoring
# Output:
# 📊 Files by Complexity:
# src/auth.rs: Cyclomatic 45, Cognitive 38 (HIGH)
# src/parser.rs: Cyclomatic 28, Cognitive 22 (MEDIUM)
# Comprehensive code metrics
Find Code Duplicates
# Identify copy-pasted code
# Output:
# 🔍 Found 8 duplicate code blocks:
# auth.rs:120-145 vs user.rs:89-114 (85% similar)
3. Understanding Codebases
Codebase Overview
# Get high-level metrics
# Output:
# Overview
# Total files: 156
# Total lines: 45,230
# Languages: Rust (60%), Python (25%), TypeScript (15%)
# Functions: 892
# Classes: 124
Explore Function Relationships
# Generate call graph
# Output:
# Call Graph Analysis:
# Functions: 28
# Function calls: 156
# Recursive: authenticate()
# Dead (never called): legacy_auth()
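To make the "Recursive" and "Dead (never called)" lines concrete, here is a minimal sketch of how both can be derived from a list of call edges. The input format (one "caller callee" pair per line) and the function name call_graph_summary are assumptions made for illustration. This is not CodeSearch's internal algorithm or its output format.

```shell
#!/usr/bin/env bash
# Sketch: summarize a call graph given "caller callee" pairs on stdin.
# Assumption: the edge-list format is hypothetical, chosen for illustration.
call_graph_summary() {
  awk '
    NF >= 2 {
      seen[$1] = 1                       # every function that appears anywhere
      seen[$2] = 1
      called[$2] = 1                     # functions with at least one caller
      if ($1 == $2) recursive[$1] = 1    # direct self-call
    }
    END {
      for (f in seen) if (!(f in called)) print "never called: " f
      for (f in recursive) print "recursive: " f
    }
  ' | LC_ALL=C sort
}
```

Note that an entry point such as main also has no callers, so a real analysis needs a list of known roots before reporting a function as dead.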
Control Flow Analysis
# Understand execution paths
# Shows: Basic blocks, branches, loops, unreachable code
Unified Graph (CPG-style, no DB)
# AST + CFG + DFG in one—like Joern's CPG, instant
# Output: Syntax edges, execution edges, data edges
Data-Flow Trace (path-query style)
# Trace variable flow—no extractors, no indexing
Security Pattern Scan
# Instant security checks—eval, exec, SQL concat, etc.
Advanced Symbol Search
# Find symbols with rich context and relationships
# Get detailed symbol information with documentation
# Find symbol relationships and hierarchies
# Build comprehensive symbol index
4. Advanced Workflows
Interactive Mode
# Commands in interactive mode:
# authenticate - Search for "authenticate"
# /f - Toggle fuzzy matching
# /i - Toggle case sensitivity
# analyze - Show codebase metrics
# complexity - Show complexity analysis
# deadcode - Find dead code
# help - Show all commands
Search with Ranking
# Get results ranked by relevance
# Best match first:
# src/auth/mod.rs:10 - pub fn authenticate() { ... }
# src/user.rs:45 - fn check_auth() { ... }
Export Results
# CSV for spreadsheets
# Markdown for documentation
# JSON for automation
5. Special Features
Search Git History
# Search through commit history
Search Remote Repositories
# Search GitHub/GitLab without cloning
Build Index for Large Codebases
# Incremental indexing for faster searches
# Watch for changes and auto-update
Common Commands Reference
Search Commands
# Options:
Analysis Commands
Graph Commands
Utility Commands
Real-World Examples
Example 1: Pre-Code Review Checklist
#!/bin/bash
# review.sh - Automated code review checklist
Example 2: Learning a New Codebase
# Step 1: Understand the structure
# Step 2: Find entry points
# Step 3: Explore key modules
# Step 4: Understand function relationships
# Step 5: Find complex code to review
Example 3: Refactoring Workflow
# Before refactoring, find all usages
# Check for complexity issues
# Find similar code that could be consolidated
# After refactoring, verify no old code remains
# Should return: "No matches found"
Example 4: Continuous Quality Monitoring
# Add to CI/CD pipeline
# Fail if high complexity functions detected
complexity=
max_cc=
if [; then
fi
# Fail if new dead code introduced
deadcode_count=
if [; then
fi
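A quality gate of this kind can be sketched by parsing the report text, assuming the output formats shown in the Analyze Complexity and Detect Dead Code samples ("Cyclomatic N" and "Found N potential dead code items"). The function names and limits below are hypothetical. The reports are passed in as strings, so no CodeSearch flags are assumed.

```shell
#!/usr/bin/env bash
# Sketch: fail a pipeline on high complexity or too much dead code.
# Assumption: report formats match the sample outputs in this README.

# Largest "Cyclomatic N" value found on stdin, or 0 if none.
max_cyclomatic() {
  awk '
    match($0, /Cyclomatic [0-9]+/) {
      n = substr($0, RSTART + 11, RLENGTH - 11) + 0   # skip "Cyclomatic "
      if (n > max) max = n
    }
    END { print max + 0 }
  '
}

# Count from a line like "Found 12 potential dead code items:", or 0 if absent.
dead_code_count() {
  awk '
    match($0, /Found [0-9]+ potential dead code items/) {
      split(substr($0, RSTART, RLENGTH), parts, " ")
      count = parts[2]
    }
    END { print count + 0 }
  '
}

# Hypothetical gate: quality_gate <complexity_report> <deadcode_report> <max_cc> <max_dead>
quality_gate() {
  local complexity_report="$1" deadcode_report="$2" max_cc_allowed="$3" max_dead_allowed="$4"
  local max_cc dead
  max_cc="$(printf '%s\n' "$complexity_report" | max_cyclomatic)"
  dead="$(printf '%s\n' "$deadcode_report" | dead_code_count)"
  if [ "$max_cc" -gt "$max_cc_allowed" ]; then
    echo "max cyclomatic complexity $max_cc exceeds $max_cc_allowed" >&2
    return 1
  fi
  if [ "$dead" -gt "$max_dead_allowed" ]; then
    echo "dead code count $dead exceeds $max_dead_allowed" >&2
    return 1
  fi
  echo "quality gate passed (max CC $max_cc, dead code $dead)"
}
```

To catch only newly introduced dead code, the allowed count can be set to the value recorded on the main branch rather than to a fixed number.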
Demo Project
A comprehensive example project demonstrating all CodeSearch capabilities is available in the examples/demo-project/ directory.
Run the demo:
Demo includes:
- Multi-language codebase (Rust, Python, TypeScript)
- Intentional code quality issues for detection
- All analysis types demonstrated
- Real-world usage examples
Architecture & Quality
Code Quality Standards
- ✅ 100% test pass rate (232 tests total: 173 unit + 36 integration + 23 MCP)
- ✅ Zero clippy warnings (clean code quality)
- ✅ Modular architecture (45+ focused modules)
- ✅ Thread-safe parallel processing with rayon
- ✅ Comprehensive error handling with custom error types
- ✅ Trait abstractions for extensibility and testability
- ✅ Advanced symbol system for deep code understanding
Performance
- Fast: 3-50ms for typical searches (< 1000 files)
- Parallel: Auto-scales to available CPU cores
- Smart caching: 70-90% cache hit rate for repeated searches
- Memory efficient: Streaming file reading, < 100MB for 10K files
Supported Languages
Native Parsers (High Performance):
- Rust - Full AST parsing with zero-allocation tokenizer
- Python - Complete syntax support including async/await
- JavaScript/TypeScript - ES6+, JSX, TSX support
- Go - Structs, interfaces, methods with receivers
- Java - Classes, interfaces, enums, annotations
48+ Additional Languages via regex patterns including: C/C++, Ruby, PHP, Swift, Kotlin, C#, Haskell, Elixir, Erlang, Scala, Lua, Perl, Shell, SQL, YAML, TOML, JSON, and more.
See codesearch languages for the complete list.
Additional Resources
- Demo Project - Hands-on examples
- DEMO_GUIDE.md - Comprehensive usage guide
- ARCHITECTURE.md - Technical details and design
- SPEC.md - Technical specification
- CLAUDE.md - Contributor guide
- MCP_ENHANCEMENTS.md - Advanced symbol system documentation
Model Context Protocol (MCP) Integration
CodeSearch provides 15 MCP tools for AI agent integration, enabling sophisticated code analysis without LLM processing:
Core Tools (9 tools)
- search_code - Pattern search with filters and ranking
- list_files - Directory enumeration and file discovery
- analyze_codebase - Comprehensive codebase metrics and statistics
- detect_complexity - Cyclomatic and cognitive complexity analysis
- detect_duplicates - Code duplication detection
- detect_deadcode - Dead code detection (6+ types)
- detect_circular - Circular dependency detection
- find_symbol - Symbol finding (definition, references, callers)
- get_health - Overall code health scoring
Advanced Symbol Tools (6 tools)
- search_symbols - Advanced symbol search with context and relationships
- get_symbol_details - Comprehensive symbol information with documentation
- find_symbol_relationships - Symbol relationship and hierarchy queries
- build_symbol_index - Create/update symbol index with change detection
- get_index_stats - Index metadata and performance statistics
- find_symbol_hierarchy - Complete inheritance and implementation analysis
MCP Server Usage
# Start the MCP server
# Available via stdio transport
# Integrates with Claude Code, Cursor, Windsurf, and other MCP-compatible tools
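For a sense of what travels over the stdio transport, the sketch below prints the newline-delimited JSON-RPC messages that a generic MCP client sends to list a server's tools. It follows the public MCP specification rather than anything specific to CodeSearch. The client name, client version, and protocol version string are placeholder assumptions, and mcp_list_tools_messages is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: messages a generic MCP client writes to a server's stdin to list tools.
# Assumptions: placeholder client info; protocol version taken from the MCP spec.
mcp_list_tools_messages() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.0"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}
```

A conforming client waits for the initialize response before sending the remaining messages, so this output is best read as an illustration of the message shapes and not as a complete client.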
Features:
- No LLM Dependency: All search and analysis is deterministic
- Language-Aware: Understands code structure, not just text
- Relationship-First: Maintains complete symbol relationship graph
- Performance-Optimized: Concurrent processing and efficient indexing
- AI-Ready: Rich context and metadata for agent consumption
License
Apache-2.0 License
Built with Rust • Fast • Precise • 48+ Languages
Version: 0.1.10