# Codanna
High-performance code intelligence that gives AI assistants deep understanding of your codebase through semantic search and relationship tracking.
## What It Does
Codanna indexes your code and provides:
- Semantic search - Find code using natural language: "authentication logic", "parse JSON data"
- Relationship tracking - Who calls what, implementation hierarchies, dependency graphs
- MCP integration - Claude can navigate and understand your codebase in real-time
- Hot-reload - Changes are automatically re-indexed
- Fast searches - Results in <10ms
Under the hood, Codanna:
- Parses your code with tree-sitter (currently Rust and Python, more languages coming)
- Extracts symbols and their relationships using type-aware analysis
- Generates embeddings from documentation comments using AllMiniLML6V2 (384 dimensions)
- Stores everything in a Tantivy full-text index with integrated vector search
- Serves it via MCP so Claude can use it naturally
## Installation

Assuming the crate is published as `codanna` (substitute your source as needed):

```sh
# Install latest version
cargo install codanna

# Install with HTTP/HTTPS server support
cargo install codanna --features http-server

# Install from git
cargo install --git <repository-url> codanna

# Install from local path (development)
cargo install --path .
```
## Quick Start

- Initialize and configure:

  ```sh
  # Initialize codanna index space and create .codanna/settings.toml
  codanna init
  ```

- Enable semantic search in `.codanna/settings.toml` (key names follow the default config; run `codanna config` to confirm yours):

  ```toml
  [semantic_search]
  enabled = true
  ```

- Index your codebase:

  ```sh
  # Index with progress display
  codanna index src --progress

  # See what would be indexed (dry run)
  codanna index src --dry-run

  # Index a specific file
  codanna index src/main.rs
  ```

- Try semantic search through the MCP tool runner:

  ```sh
  codanna mcp semantic_search_docs --args '{"query": "parse configuration"}'
  ```
## Claude Integration

### MCP Server (Recommended)

Add to your `.mcp.json`:
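A minimal stdio entry might look like this (the server name and args are illustrative; `codanna serve` is the MCP server command from the CLI reference below):

```json
{
  "mcpServers": {
    "codanna": {
      "command": "codanna",
      "args": ["serve"]
    }
  }
}
```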
### HTTP/HTTPS Server
For a persistent server with real-time file watching, run `codanna serve --watch`. Serving over HTTPS requires building with the `http-server` feature.

Then point your `.mcp.json` at the running server. For HTTPS configuration, see the HTTPS Server Mode documentation.
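An HTTP-style entry could look like this (the URL and port are illustrative; use whatever address your server is bound to):

```json
{
  "mcpServers": {
    "codanna": {
      "type": "http",
      "url": "http://localhost:8080/mcp"
    }
  }
}
```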
### Claude Sub Agent

We include a `codanna-navigator` sub agent at `.claude/agents/codanna-navigator.md`. This agent is optimized for using the codanna MCP server.
### Unix-Style Integration
Codanna CLI is unix-friendly, enabling powerful command chaining and integration with other tools:
```sh
# Illustrative chain built from the retrieval commands documented below
codanna retrieve symbol main && \
codanna retrieve callers main && \
codanna retrieve impact main --depth 2
```
This approach works well for agentic workflows and custom automation scripts.
## Configuration

Configure Codanna in `.codanna/settings.toml`. Key names below follow the default config; run `codanna config` to see your active settings:

```toml
[semantic_search]
enabled = true
model = "AllMiniLML6V2"
threshold = 0.6          # Similarity threshold (0-1)

[indexing]
parallel_threads = 16    # Auto-detected by default
include_tests = true     # Index test files
```

Codanna respects `.gitignore` and adds its own `.codannaignore`, created automatically by `codanna init`.
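The similarity threshold gates which results are returned: embeddings whose cosine similarity to the query falls below it are filtered out. A toy sketch of that gating (vector values are made up, and real embeddings have 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real ones have 384 dimensions.
query = [0.9, 0.1, 0.0]       # e.g. "parse configuration"
doc_close = [0.8, 0.2, 0.1]   # well-documented parser function
doc_far = [0.0, 0.2, 0.9]     # unrelated code

THRESHOLD = 0.6
for name, doc in [("close", doc_close), ("far", doc_far)]:
    score = cosine_similarity(query, doc)
    print(f"{name}: {score:.3f} -> {'kept' if score >= THRESHOLD else 'filtered'}")
```

Raising the threshold trades recall for precision; lowering it surfaces looser matches.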
## Documentation Comments for Better Search
Semantic search works by understanding your documentation comments:

```rust
/// Parse configuration from a TOML file and validate required fields
/// This handles missing files gracefully and provides helpful error messages
```
With good comments, semantic search can find this function when prompted for:
- "configuration validation"
- "handle missing config files"
- "TOML parsing with error handling"
This creates a virtuous cycle: better documentation → better AI understanding → more motivation to document.
## CLI Commands

### Core Commands

| Command | Description | Example |
|---|---|---|
| `codanna init` | Set up `.codanna` directory with default configuration | `codanna init --force` |
| `codanna index <PATH>` | Build searchable index from your codebase | `codanna index src --progress` |
| `codanna config` | Display active settings | `codanna config` |
| `codanna serve` | Start MCP server for AI assistants | `codanna serve --watch` |
### Retrieval Commands

| Command | Description | Example |
|---|---|---|
| `retrieve symbol <NAME>` | Find a symbol by name | `codanna retrieve symbol main` |
| `retrieve calls <FUNCTION>` | Show what functions a given function calls | `codanna retrieve calls parse_file` |
| `retrieve callers <FUNCTION>` | Show what functions call a given function | `codanna retrieve callers main` |
| `retrieve implementations <TRAIT>` | Show what types implement a trait | `codanna retrieve implementations Parser` |
| `retrieve impact <SYMBOL>` | Show the impact radius of changing a symbol | `codanna retrieve impact main --depth 3` |
| `retrieve search <QUERY>` | Search for symbols using full-text search | `codanna retrieve search "parse" --limit 5` |
| `retrieve describe <SYMBOL>` | Show comprehensive information about a symbol | `codanna retrieve describe SimpleIndexer` |
### Testing and Utilities

| Command | Description | Example |
|---|---|---|
| `codanna mcp-test` | Verify Claude can connect and list available tools | `codanna mcp-test` |
| `codanna mcp <TOOL>` | Execute MCP tools without spawning a server | `codanna mcp find_symbol --args '{"name":"main"}'` |
| `codanna benchmark` | Benchmark parser performance | `codanna benchmark rust --file my_code.rs` |
### Common Flags

- `--config`, `-c`: Path to a custom `settings.toml` file
- `--force`, `-f`: Force the operation (overwrite, re-index, etc.)
- `--progress`, `-p`: Show progress during operations
- `--threads`, `-t`: Number of threads to use
- `--dry-run`: Show what would happen without executing
## MCP Tools

Available tools when using the MCP server:

| Tool | Description | Key Parameters |
|---|---|---|
| `find_symbol` | Find a symbol by exact name | `name` (required) |
| `search_symbols` | Search symbols with full-text fuzzy matching | `query`, `limit`, `kind`, `module` |
| `semantic_search_docs` | Search using natural language queries | `query`, `limit`, `threshold` |
| `semantic_search_with_context` | Search with enhanced context and details | `query`, `limit`, `threshold` |
| `get_calls` | Show functions called by a given function | `function_name` |
| `find_callers` | Show functions that call a given function | `function_name` |
| `analyze_impact` | Analyze the impact radius of symbol changes | `symbol_name`, `max_depth` |
| `get_index_info` | Get index statistics and metadata | None |
## Performance
Parser benchmarks on a 750-symbol test file:
| Language | Parsing Speed | vs. Target (10k/s) | Status |
|---|---|---|---|
| Rust | 91,318 symbols/sec | 9.1x faster ✅ | Production |
| Python | 75,047 symbols/sec | 7.5x faster ✅ | Production |
| JavaScript | - | - | Coming soon |
| TypeScript | - | - | Coming soon |
Key achievements:
- Zero-cost abstractions: All parsers use borrowed string slices with no allocations in hot paths
- Parallel processing: Multi-threaded indexing that scales with CPU cores
- Memory efficiency: Approximately 100 bytes per symbol including all metadata
- Real-time capability: Fast enough for incremental parsing during editing
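The ~100 bytes per symbol figure makes index sizing easy to estimate (a back-of-envelope sketch; `index_size_mb` is a hypothetical helper, not part of the CLI):

```python
BYTES_PER_SYMBOL = 100  # approximate figure from the list above

def index_size_mb(num_symbols: int) -> float:
    """Approximate in-memory size of symbol metadata, in megabytes."""
    return num_symbols * BYTES_PER_SYMBOL / 1_000_000

for n in (750, 100_000, 1_000_000):
    print(f"{n:>9} symbols ~ {index_size_mb(n):.3f} MB")
```

Even a million symbols fits comfortably in memory on this budget.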
Run the performance benchmarks yourself with the `benchmark` command, for example `codanna benchmark rust --file my_code.rs`.
## Architecture Highlights
Memory-mapped vector storage: Semantic embeddings are stored in memory-mapped files for instant loading after the OS page cache warms up.
Embedding lifecycle management: Old embeddings are automatically cleaned up when files are re-indexed to prevent accumulation over time.
Lock-free concurrency: Uses DashMap for concurrent symbol access with minimal blocking for write coordination.
Single-pass indexing: Extracts symbols, relationships, and generates embeddings in one complete AST traversal.
Hot reload capability: Event-driven file watching with debouncing re-indexes only the files that changed.
## Requirements
- Rust 1.75+ (for development)
- ~150MB for model storage (downloaded on first use)
- A few MB for index storage (varies by codebase size)
## Current Limitations
- Supports Rust and Python (JavaScript, TypeScript coming soon)
- Semantic search requires English documentation/comments
- Windows support is experimental
## Roadmap

### Status Overview
| Priority | Feature | Status | Target |
|---|---|---|---|
| 1 | JSON Output Support | In-Progress | v0.2.1 |
| 2 | Exit Codes for Common Conditions | In-Progress | v0.2.2 |
| 3 | Batch Symbol Operations | Planning | v0.2.3 |
| 4 | Output Format Control | Planning | v0.2.3 |
| 5 | Direct CLI Semantic Search | Pending | -- |
| 6 | Incremental Index Updates | Completed | v2.0.0 |
| 7 | Query Language for Complex Searches | Pending | -- |
| 8 | Symbol Relationship Graph Export | Pending | -- |
| 9 | Diff-Aware Analysis | Pending | -- |
| 10 | Configuration Profiles | Pending | -- |
| 11 | Machine-Readable Progress | Pending | -- |
### 1. Direct CLI Semantic Search

Why: Currently semantic search is only available through the MCP interface.

```sh
# Current: only through MCP
codanna mcp semantic_search_docs --args '{"query": "parse config"}'

# Wishlist: direct CLI command (proposed syntax)
codanna retrieve semantic "parse config"
```
Benefits:
- Simpler command syntax
- Better Unix integration
- No JSON escaping needed
### 2. JSON Output Support

Why: Enable reliable programmatic integration without text parsing.

```sh
# Wishlist: add a --json flag to commands (proposed)
codanna retrieve symbol main --json
```

Illustrative output shape:

```json
{
  "name": "main",
  "kind": "function",
  "file": "src/main.rs"
}
```
Benefits:
- Stable API for scripts and tools
- No more awk/grep gymnastics
- Enable IDE integrations
### 3. Batch Symbol Operations

Why: Reduce overhead when analyzing multiple symbols.

```sh
# Current: multiple invocations, one index load each
for sym in main parse_file SimpleIndexer; do
    codanna retrieve describe "$sym"
done

# Wishlist: single command (proposed syntax)
codanna retrieve describe main parse_file SimpleIndexer
```
Benefits:
- One index load instead of N
- Faster CI/CD pipelines
- Better for parallel analysis
### 4. Output Format Control

Why: Different use cases need different detail levels.

```sh
# Wishlist: compact output for scripts (proposed flag)
codanna retrieve symbol main --format compact

# Full output for humans (current default)
codanna retrieve symbol main
```
### 5. Exit Codes for Common Conditions

Why: Make scripting more robust.

```sh
# Proposed exit codes:
# 0 - Success
# 1 - Error
# 2 - No results found
# 3 - Index not found
# 4 - Symbol not found

# Illustrative usage once implemented:
if codanna retrieve symbol main; then
    echo "symbol exists"
else
    echo "lookup failed with code $?"
fi
```
### 6. Query Language for Complex Searches

Why: Find symbols matching multiple criteria without multiple commands.

```sh
# Wishlist (proposed syntax):
# Find all public methods that call database functions
codanna query "kind:method visibility:public calls:db_*"

# Find unused private functions
codanna query "kind:function visibility:private callers:0"
```
### 7. Incremental Index Updates

Why: Faster re-indexing for large codebases. (The status table above already lists this as completed.)

```sh
# Only re-index changed files (re-running index skips unchanged files)
codanna index src --progress

# Watch mode for development
codanna serve --watch
```
### 8. Symbol Relationship Graph Export

Why: Visualize complex dependencies.

```sh
# Wishlist (proposed syntax):
# Export full dependency graph
codanna export graph --format dot > deps.dot

# Export focused subgraph around one symbol
codanna export graph --root main --depth 2 > main.dot
```
### 9. Diff-Aware Analysis

Why: Focus analysis on what changed.

```sh
# Wishlist (proposed syntax):
# Analyze impact of changes in a PR
git diff --name-only main | xargs codanna analyze

# Pre-commit hook helper
codanna analyze --staged
```
### 10. Configuration Profiles

Why: Different settings for different use cases.

```toml
# .codanna/profiles.toml (illustrative)
[profiles.ci]
include_tests = false

[profiles.dev]
include_tests = true
```

```sh
# Use a profile (proposed flag)
codanna index src --profile ci
```
### 11. Machine-Readable Progress

Why: Better CI/CD integration.

```sh
# Current: human-readable progress
codanna index src --progress
```

Wishlist: a machine-readable option emitting structured events, e.g. (illustrative shape):

```json
{
  "event": "progress",
  "files_indexed": 42,
  "files_total": 100
}
```
### Implementation Priority

1. JSON output - enables everything else
2. Exit codes - minimal change, big impact
3. Batch operations - performance win
4. Format control - flexibility for users
5. The rest - nice to have
## Contributing
This is an early release focused on core functionality. Contributions welcome! See CONTRIBUTING.md for guidelines.
## License
Licensed under the Apache License, Version 2.0 - See LICENSE file for details.
Attribution required when using Codanna in your project. See NOTICE file.
Built with 🦀 by developers who wanted their AI assistants to actually understand their code.