Codanna
Semantic code search and relationship tracking via MCP and Unix CLI.
Table of Contents
- How It Works
- Installation
- Quick Start
- Claude Integration
- Configuration
- Documentation Comments for Better Search
- CLI Commands
- MCP Tools
- Performance
- Architecture Highlights
- Requirements
- Current Limitations
- Roadmap
- Feature Details
- Contributing
- License
How It Works
- Parse - Tree-sitter AST parsing for Rust, Python, and PHP (more languages coming)
- Extract - Symbols, call graphs, implementations, and type relationships
- Embed - 384-dimensional vectors from doc comments via AllMiniLML6V2
- Index - Tantivy for full-text search + memory-mapped symbol cache for <10ms lookups
- Serve - MCP protocol for AI assistants, ~300ms response time
Installation
# Install latest version
# Install with HTTP server (OAuth authentication)
# Install with HTTPS server (TLS + optional OAuth)
# Install from local path (development)
Quick Start
- Initialize:
# Initialize codanna index space and create .codanna/settings.toml
- Index your codebase:
# Index with progress display
# See what would be indexed (dry run)
# Index a specific file
- Search your code:
# Semantic search with new simplified syntax
# Find symbols with JSON output
# Analyze function relationships
|
# Legacy format still works
Claude Integration
MCP Server (Recommended)
Add to your .mcp.json:
HTTP/HTTPS Server
For persistent server with real-time file watching:
# HTTP server with OAuth authentication (requires http-server feature)
# HTTPS server with TLS encryption (requires https-server feature)
Configure in .mcp.json:
For HTTPS configuration, see the HTTPS Server Mode documentation.
Claude Sub Agent
We include a codanna-navigator sub agent at .claude/agents/codanna-navigator.md. This agent is optimized for using the codanna MCP server.
Unix-Style Integration
The Codanna CLI is Unix-friendly: positional arguments and JSON output make it easy to chain commands:
# New simplified syntax - positional arguments for simple tools
# Key:value pairs for complex tools
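# Unix piping with JSON output
codanna mcp search_symbols query:parse limit:1 --json | \
jq -r '.data[0].name' | \
xargs -I {} codanna retrieve callers {} --json | \
jq -r '.data[] | "\(.name) in \(.module_path)"'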
# Result:
#
# main in crate::main
# serve_http in crate::mcp::http_server::serve_http
# serve_http in crate::mcp::http_server::serve_http
# serve_https in crate::mcp::https_server::serve_https
# serve_https in crate::mcp::https_server::serve_https
# parse in crate::parsing::rust::parse
# parse in crate::parsing::rust::parse
# parse in crate::parsing::python::parse
#
# codanna mcp search_symbols query:parse limit:1 --json 0.10s user 0.08s system 122% cpu 0.143 total
# jq -r '.data[0].name' 0.00s user 0.00s system 3% cpu 0.142 total
# xargs -I {} codanna retrieve callers {} --json 0.11s user 0.07s system 63% cpu 0.288 total
# jq -r '.data[] | "\(.name) in \(.module_path)"' 0.00s user 0.00s system 1% cpu 0.288 total
# Legacy format still supported for backward compatibility
All MCP tools support --json flag for structured output, making integration with other tools seamless.
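The same chaining can be scripted outside the shell. The sketch below is illustrative only: it assumes the JSON envelope has a top-level `data` array whose entries carry `name` and `module_path` fields, as the jq filters above imply. The helper names `run_codanna` and `format_callers` are hypothetical, and `run_codanna` assumes a `codanna` binary on `PATH` (it is defined but not called here).

```python
import json
import subprocess


def run_codanna(*args: str) -> str:
    """Run a codanna command with --json and return its raw stdout.

    Assumes the `codanna` binary is on PATH; not invoked in this sketch.
    """
    result = subprocess.run(
        ["codanna", *args, "--json"], capture_output=True, text=True
    )
    return result.stdout


def format_callers(raw_json: str) -> list[str]:
    # Mirrors the jq filter: .data[] | "\(.name) in \(.module_path)"
    payload = json.loads(raw_json)
    return [
        f"{item['name']} in {item['module_path']}"
        for item in payload.get("data", [])
    ]


# Sample payload shaped like the assumed envelope (not real codanna output).
sample = '{"data": [{"name": "main", "module_path": "crate::main"}]}'
print(format_callers(sample))  # ['main in crate::main']
```

Keeping the formatting step separate from the process call makes the parsing logic easy to check without an index on disk.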
Configuration
Configure Codanna in .codanna/settings.toml:
[]
= true
= "AllMiniLML6V2"
= 0.6 # Similarity threshold (0-1)
[]
= 16 # Auto-detected by default
= true # Index test files
Codanna respects .gitignore and adds its own .codannaignore:
# Created automatically by codanna init
Documentation Comments for Better Search
Semantic search works by embedding your documentation comments, so it matches queries by meaning rather than by exact keywords:
/// Parse configuration from a TOML file and validate required fields
/// This handles missing files gracefully and provides helpful error messages
With good comments, semantic search can find this function when prompted for:
- "configuration validation"
- "handle missing config files"
- "TOML parsing with error handling"
This encourages better documentation → better AI understanding → more motivation to document.
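To make the role of the similarity threshold concrete, here is a minimal sketch of threshold-filtered vector search. It assumes cosine similarity as the scoring function, which this README does not state, and uses toy 3-dimensional vectors in place of the 384-dimensional embeddings. The function and symbol names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], doc_vecs: dict[str, list[float]],
           threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return (symbol, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy embeddings standing in for doc-comment vectors (illustrative values).
docs = {
    "parse_config": [0.9, 0.1, 0.0],
    "render_html": [0.0, 0.2, 0.9],
}
hits = search([1.0, 0.0, 0.0], docs)
print([name for name, _ in hits])  # ['parse_config']
```

A higher threshold trades recall for precision: fewer, closer matches are returned.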
CLI Commands
Core Commands
| Command | Description | Example |
|---|---|---|
| codanna init | Set up .codanna directory with default configuration | codanna init --force |
| codanna index <PATH> | Build searchable index from your codebase | codanna index src --progress |
| codanna config | Display active settings | codanna config |
| codanna serve | Start MCP server for AI assistants | codanna serve --watch |
Retrieval Commands
All retrieve commands support --json flag for structured output (exit code 3 when not found).
| Command | Description | Example |
|---|---|---|
| retrieve symbol <NAME> | Find a symbol by name | codanna retrieve symbol main --json |
| retrieve calls <FUNCTION> | Show what functions a given function calls | codanna retrieve calls parse_file --json |
| retrieve callers <FUNCTION> | Show what functions call a given function | codanna retrieve callers main --json |
| retrieve implementations <TRAIT> | Show what types implement a trait | codanna retrieve implementations Parser --json |
| retrieve impact <SYMBOL> | Show the impact radius of changing a symbol | codanna retrieve impact main --depth 3 --json |
| retrieve search <QUERY> | Search for symbols using full-text search | codanna retrieve search "parse" --limit 5 --json |
| retrieve describe <SYMBOL> | Show comprehensive information about a symbol | codanna retrieve describe SimpleIndexer --json |
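As an illustration of what an impact radius with a depth limit can mean, the sketch below walks a reverse call graph breadth-first. It assumes that impact is defined as the set of transitive callers within `max_depth` hops; Codanna's actual analysis may consider more relationship types. The graph and the function name `impact_radius` are hypothetical.

```python
from collections import deque


def impact_radius(callers: dict[str, list[str]], symbol: str,
                  max_depth: int) -> dict[str, int]:
    """Map each transitive caller of `symbol` to its distance in hops."""
    distances: dict[str, int] = {}
    queue = deque([(symbol, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for caller in callers.get(current, []):
            # Skip symbols already seen so cycles terminate.
            if caller != symbol and caller not in distances:
                distances[caller] = depth + 1
                queue.append((caller, depth + 1))
    return distances


# Hypothetical reverse call graph: symbol -> functions that call it.
graph = {
    "parse": ["index_file"],
    "index_file": ["main", "serve_http"],
}
print(impact_radius(graph, "parse", 1))  # {'index_file': 1}
```

Raising the depth widens the result: with `max_depth=3`, the same graph also reports `main` and `serve_http` at distance 2.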
Testing and Utilities
| Command | Description | Example |
|---|---|---|
| codanna mcp-test | Verify Claude can connect and list available tools | codanna mcp-test |
| codanna mcp <TOOL> | Execute MCP tools without spawning server | codanna mcp find_symbol main --json |
| codanna benchmark | Benchmark parser performance | codanna benchmark rust --file my_code.rs |
Common Flags
- --config, -c: Path to custom settings.toml file
- --force, -f: Force operation (overwrite, re-index, etc.)
- --progress, -p: Show progress during operations
- --threads, -t: Number of threads to use
- --dry-run: Show what would happen without executing
MCP Tools
Available tools when using the MCP server. All tools support --json flag for structured output.
Simple Tools (Positional Arguments)
| Tool | Description | Example |
|---|---|---|
| find_symbol | Find a symbol by exact name | codanna mcp find_symbol main --json |
| get_calls | Show functions called by a given function | codanna mcp get_calls process_file |
| find_callers | Show functions that call a given function | codanna mcp find_callers init |
| analyze_impact | Analyze the impact radius of symbol changes | codanna mcp analyze_impact Parser --json |
| get_index_info | Get index statistics and metadata | codanna mcp get_index_info --json |
Complex Tools (Key:Value Arguments)
| Tool | Description | Example |
|---|---|---|
| search_symbols | Search symbols with full-text fuzzy matching | codanna mcp search_symbols query:parse kind:function limit:10 |
| semantic_search_docs | Search using natural language queries | codanna mcp semantic_search_docs query:"error handling" limit:5 |
| semantic_search_with_context | Search with enhanced context | codanna mcp semantic_search_with_context query:"parse files" threshold:0.7 |
Parameters Reference
| Tool | Parameters |
|---|---|
| find_symbol | name (required) |
| search_symbols | query, limit, kind, module |
| semantic_search_docs | query, limit, threshold |
| semantic_search_with_context | query, limit, threshold |
| get_calls | function_name |
| find_callers | function_name |
| analyze_impact | symbol_name, max_depth |
| get_index_info | None |
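The key:value syntax is simple to generate or parse from scripts. The following sketch shows one way to split such an argument string into a dictionary; it is an assumption about the syntax as shown in the examples above, not a description of Codanna's own parser. `parse_tool_args` is a hypothetical helper.

```python
import shlex


def parse_tool_args(command_line: str) -> dict[str, str]:
    """Split 'key:value key:"quoted value"' into a dict of strings."""
    args: dict[str, str] = {}
    # shlex handles shell-style quoting, so query:"error handling"
    # arrives as the single token 'query:error handling'.
    for token in shlex.split(command_line):
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        args[key] = value
    return args


print(parse_tool_args('query:"error handling" limit:5'))
# {'query': 'error handling', 'limit': '5'}
```

Splitting on the first colon only means values that themselves contain colons, such as a module path like `crate::parsing`, stay intact.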
Performance
Parser benchmarks on a 750-symbol test file:
| Language | Parsing Speed | vs. Target (10k/s) | Status |
|---|---|---|---|
| Rust | 91,318 symbols/sec | 9.1x faster ✓ | Production |
| Python | 75,047 symbols/sec | 7.5x faster ✓ | Production |
| PHP | 68,432 symbols/sec | 6.8x faster ✓ | Production |
| JavaScript | - | - | v0.4.1 |
| TypeScript | - | - | v0.4.1 |
Key achievements:
- Zero-cost abstractions: All parsers use borrowed string slices with no allocations in hot paths
- Parallel processing: Multi-threaded indexing that scales with CPU cores
- Memory efficiency: Approximately 100 bytes per symbol including all metadata
- Real-time capability: Fast enough for incremental parsing during editing
- Optimized CLI startup: ~300ms for all operations (53x improvement from v0.2)
- JSON output: Negligible overhead - structured output adds <1ms to response time
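For rough capacity planning, the figures above can be turned into back-of-the-envelope estimates. This sketch simply extrapolates the quoted numbers (about 100 bytes per symbol, and the benchmark throughput from the table); real usage will vary with the codebase and hardware, and the function names are hypothetical.

```python
BYTES_PER_SYMBOL = 100  # approximate per-symbol figure quoted above


def estimate_symbol_memory_mb(symbol_count: int) -> float:
    """Estimated symbol storage in megabytes (10^6 bytes)."""
    return symbol_count * BYTES_PER_SYMBOL / 1_000_000


def estimate_parse_ms(symbol_count: int, symbols_per_sec: float) -> float:
    """Estimated parse time in milliseconds at a given throughput."""
    return symbol_count / symbols_per_sec * 1000


print(estimate_symbol_memory_mb(1_000_000))        # 100.0
print(round(estimate_parse_ms(750, 91_318), 1))    # 8.2
```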
Run performance benchmarks:
Architecture Highlights
Memory-mapped storage: Two caches for different access patterns:
- symbol_cache.bin - FNV-1a hashed symbol lookups, <10ms response time
- segment_0.vec - 384-dimensional vectors, <1μs access after OS page cache warm-up
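For readers unfamiliar with FNV-1a, here is a minimal sketch of the hash itself. It uses the standard 64-bit parameters; whether Codanna's symbol cache uses the 32-bit or 64-bit variant, and what exactly it hashes, is not stated here, so treat those choices as assumptions.

```python
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325  # standard 64-bit offset basis
FNV_PRIME_64 = 0x100000001B3              # standard 64-bit FNV prime


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return h


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c
```

The hash is cheap to compute and has no setup cost, which suits lookups keyed by short strings such as symbol names.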
Embedding lifecycle management: Old embeddings deleted when files are re-indexed to prevent accumulation.
Concurrent access: DashMap for concurrent symbol reads; writes are coordinated through a single writer lock.
Single-pass indexing: Symbols, relationships, and embeddings extracted in one AST traversal.
Hot reload: File watcher with 500ms debounce triggers re-indexing of changed files only.
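The debounce idea can be sketched in a few lines. This is an illustration of the technique only, not Codanna's watcher: a file becomes eligible for re-indexing once no new change has been seen for the quiet period (500ms here, matching the figure above). Timestamps are passed in explicitly so the behavior is deterministic; the `Debouncer` class is hypothetical.

```python
class Debouncer:
    """Collect change events and release paths after a quiet period."""

    def __init__(self, quiet_period: float = 0.5):
        self.quiet_period = quiet_period
        self._last_event: dict[str, float] = {}  # path -> latest change time

    def record(self, path: str, now: float) -> None:
        # A newer event for the same path restarts its quiet period.
        self._last_event[path] = now

    def ready(self, now: float) -> list[str]:
        """Return and clear paths that have been quiet long enough."""
        due = sorted(
            path for path, seen in self._last_event.items()
            if now - seen >= self.quiet_period
        )
        for path in due:
            del self._last_event[path]
        return due


debouncer = Debouncer()
debouncer.record("src/main.rs", now=0.0)
debouncer.record("src/main.rs", now=0.25)  # rapid second save
print(debouncer.ready(now=0.5))   # [] (only 0.25s since the last change)
print(debouncer.ready(now=1.0))   # ['src/main.rs']
```

Several rapid saves of the same file therefore collapse into a single re-index.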
Requirements
- Rust 1.75+ (for development)
- ~150MB for model storage (downloaded on first use)
- A few MB for index storage (varies by codebase size)
Current Limitations
- Supports Rust, Python, and PHP (JavaScript/TypeScript coming in v0.4.1)
- Semantic search requires English documentation/comments
- Windows support is experimental
Roadmap
Version Strategy
- 0.3.x - CLI improvements and API stability
- 0.4.x - Language expansion via modular architecture
- 0.5.x - Enterprise features and advanced analysis
v0.3.0 (Released)
| Feature | Description | Status |
|---|---|---|
| JSON Output Support | Structured output for all commands | ✓ |
| Exit Codes | Semantic exit codes for scripting | ✓ |
| Unix-Friendly CLI | Positional args and key:value syntax | ✓ |
| Incremental Index Updates | File watching with auto re-indexing | ✓ |
v0.4.0 (Released)
| Feature | Description | Status |
|---|---|---|
| Language Registry Architecture | Modular parser system for easy language additions | ✓ |
| PHP Support | Full PHP parser implementation | ✓ |
| Enhanced Symbol Extraction | Comprehensive symbol extraction for all parsers | ✓ |
v0.4.1 (Planned)
| Feature | Description | Status |
|---|---|---|
| JavaScript Support | Full JavaScript/ES6+ parser | ○ |
| TypeScript Support | TypeScript with type annotations | ○ |
v0.4.2 (Planned)
| Feature | Description | Status |
|---|---|---|
| Go Support | Go language with interfaces and goroutines | ○ |
v0.4.3 (Planned)
| Feature | Description | Status |
|---|---|---|
| C# Support | C# with .NET ecosystem support | ○ |
v0.4.4 (Planned)
| Feature | Description | Status |
|---|---|---|
| Java Support | Java with class hierarchies | ○ |
v0.4.5 (Planned)
| Feature | Description | Status |
|---|---|---|
| C/C++ Support | C and C++ with headers and templates | ○ |
v0.5.0 (Future)
| Feature | Description | Status |
|---|---|---|
| Direct Semantic Search | retrieve semantic command | ○ |
| Batch Operations | Process multiple symbols in one call | ○ |
| Output Format Control | Compact/full/json output modes | ○ |
| Query Language | Advanced search with complex filters | ○ |
| Configuration Profiles | Environment-specific settings | ○ |
| Machine-Readable Progress | JSON progress output | ○ |
| Cross-Language References | Track references across languages | ○ |
| Language Server Protocol | LSP integration for IDEs | ○ |
Legend: ✓ Complete | → In Progress | ○ Planned
Supported Languages
Currently Supported (v0.4.0)
- Rust - Full support with traits, generics, enums, type aliases, constants, and statics
- Python - Functions, classes, module-level variables, constants, and lambda functions
- PHP - Classes, functions, namespaces, traits, global constants, and variables
Coming Soon
Based on developer demand and tree-sitter support:
- JavaScript/TypeScript (v0.4.1) - Most requested for web development
- Go (v0.4.2) - Growing popularity in cloud/backend
- C# (v0.4.3) - Enterprise and game development
- Java (v0.4.4) - Enterprise applications
- C/C++ (v0.4.5) - Systems programming
Feature Details
Completed Features
json-output-support
All retrieve commands and MCP tools support --json flag for structured output with consistent format and proper exit codes (v0.3.0).
exit-codes
Semantic exit codes for scripting: 0 (success), 1 (general error), 3 (not found). Enables reliable automation (v0.3.0).
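A script can branch on these codes rather than parsing output. The sketch below maps the three documented codes to names; any other value is treated as unknown. `ExitCode` and `describe_exit` are hypothetical names for illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    GENERAL_ERROR = 1
    NOT_FOUND = 3


def describe_exit(code: int) -> str:
    """Translate a process return code into a readable label."""
    try:
        return ExitCode(code).name.lower()
    except ValueError:
        return "unknown"


print(describe_exit(3))  # not_found
```

In practice the `code` argument would come from the `returncode` of a finished subprocess, so "symbol not found" can be handled separately from real failures.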
unix-friendly-cli
Simplified syntax with positional arguments for simple tools and key:value pairs for complex tools. No JSON escaping needed (v0.3.0).
incremental-index-updates
Watch mode with automatic re-indexing of changed files. Broadcast channels coordinate updates with 500ms debouncing (v0.3.0).
language-registry-architecture
Modular parser system where languages self-register via a registry. Enables easy addition of new languages without core code changes (v0.4.0).
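The self-registration pattern can be sketched as follows. This is a generic illustration of a registry, written in Python for brevity; it does not reflect Codanna's Rust types, and `register_language`, `parser_for`, and the parser classes are all hypothetical.

```python
import os

PARSERS: dict[str, type] = {}  # file extension -> parser class


def register_language(name: str, extensions: list[str]):
    """Class decorator that adds a parser to the registry."""
    def decorator(cls: type) -> type:
        cls.language = name
        for ext in extensions:
            PARSERS[ext] = cls
        return cls
    return decorator


@register_language("rust", [".rs"])
class RustParser:
    pass


@register_language("php", [".php"])
class PhpParser:
    pass


def parser_for(path: str):
    """Look up a parser class by file extension, or None if unsupported."""
    _, ext = os.path.splitext(path)
    return PARSERS.get(ext)


print(parser_for("src/main.rs").language)  # rust
```

Adding a language then means adding one decorated class; the lookup code does not change.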
php-support
Full PHP parser with classes, functions, namespaces, and traits. Supports PHP 5 through PHP 8 syntax (v0.4.0).
enhanced-symbol-extraction
Expanded symbol extraction across all parsers. Rust now extracts enums, type aliases, constants, and statics. Python extracts module-level variables, constants by naming convention, and lambda functions. PHP extracts global constants (const and define) and global variables (v0.4.0).
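To show what classifying Python symbols by naming convention can look like, here is a small sketch using the standard library `ast` module rather than tree-sitter. It assumes that an all-uppercase name marks a constant; the exact rules Codanna applies are not spelled out here, and `module_level_symbols` is a hypothetical helper.

```python
import ast


def module_level_symbols(source: str) -> dict[str, str]:
    """Classify top-level names as function, class, lambda, constant, or variable."""
    symbols: dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols[node.name] = "function"
        elif isinstance(node, ast.ClassDef):
            symbols[node.name] = "class"
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    continue  # skip tuple unpacking, attributes, etc.
                if isinstance(node.value, ast.Lambda):
                    kind = "lambda"
                elif target.id.isupper():
                    kind = "constant"  # assumed convention: ALL_CAPS
                else:
                    kind = "variable"
                symbols[target.id] = kind
    return symbols


code = "MAX_RETRIES = 3\ncounter = 0\nsquare = lambda x: x * x\ndef run(): pass\n"
print(module_level_symbols(code))
# {'MAX_RETRIES': 'constant', 'counter': 'variable', 'square': 'lambda', 'run': 'function'}
```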
Planned Features
direct-semantic-search
Direct retrieve semantic command for natural language code search without going through MCP interface.
batch-operations
Process multiple symbols in a single command to reduce overhead and improve CI/CD performance.
output-format-control
Choose between compact (script-friendly), full (human-readable), and json output formats.
javascript-support
Full JavaScript/ES6+ parser with modules, classes, async/await, and JSX support.
typescript-support
TypeScript parser with full type annotation support, interfaces, and decorators.
go-support
Go language parser with interfaces, goroutines, channels, and struct methods.
csharp-support
C# parser with .NET ecosystem support, LINQ, async/await, and attributes.
java-support
Java parser with class hierarchies, interfaces, generics, and annotations.
c-cpp-support
C and C++ parsers with headers, templates, macros, and cross-compilation units.
query-language
Advanced search syntax with wildcards, boolean operators, and complex filters.
configuration-profiles
Environment-specific settings (dev, test, production) with profile inheritance.
machine-readable-progress
JSON-formatted progress output for better CI/CD integration and monitoring.
cross-language-references
Track and analyze references across different programming languages in polyglot codebases.
language-server-protocol
LSP implementation for IDE integration with real-time code intelligence.
Contributing
This is an early release focused on core functionality. Contributions welcome! See CONTRIBUTING for guidelines.
License
Licensed under the Apache License, Version 2.0 - See LICENSE file for details.
Attribution required when using Codanna in your project. See NOTICE file.
Built with 🦀 by developers who wanted their AI assistants to actually understand their code.