codanna 0.3.0

Code Intelligence for Large Language Models
Documentation

Codanna

Semantic code search and relationship tracking via MCP and Unix CLI.

How It Works

  1. Parse - Tree-sitter AST parsing for Rust and Python (JavaScript/TypeScript coming)
  2. Extract - Symbols, call graphs, implementations, and type relationships
  3. Embed - 384-dimensional vectors from doc comments via AllMiniLML6V2
  4. Index - Tantivy for full-text search + memory-mapped symbol cache for <10ms lookups
  5. Serve - MCP protocol for AI assistants, ~300ms response time

Installation

# Install latest version
cargo install codanna

# Install with HTTP server (OAuth authentication)
cargo install codanna --features http-server

# Install with HTTPS server (TLS + optional OAuth)
cargo install codanna --features https-server

# Install from local path (development)
cargo install --path . --all-features

Quick Start

  1. Initialize and configure:
# Initialize codanna index space and create .codanna/settings.toml
codanna init

# Enable semantic search in .codanna/settings.toml
  1. Enable semantic search in .codanna/settings.toml:
[semantic_search]
enabled = true
  1. Index your codebase:
# Index with progress display
codanna index src --progress

# See what would be indexed (dry run)
codanna index . --dry-run

# Index a specific file
codanna index src/main.rs
  1. Search your code:
# Semantic search with new simplified syntax
codanna mcp semantic_search_docs query:"parse rust files" limit:3 --json

# Find symbols with JSON output
codanna retrieve symbol Parser --json

# Analyze function relationships
codanna mcp find_callers process_file --json | jq '.data[].name'

# Legacy format still works
codanna mcp semantic_search_with_context --args '{"query": "parse rust files and extract symbols", "limit": 3}'

Claude Integration

MCP Server (Recommended)

Add to your .mcp.json:

{
  "mcpServers": {
    "codanna": {
      "command": "codanna",
      "args": ["serve", "--watch", "--watch-interval", "5"]
    }
  }
}

HTTP/HTTPS Server

For persistent server with real-time file watching:

# HTTP server with OAuth authentication (requires http-server feature)
codanna serve --http --watch

# HTTPS server with TLS encryption (requires https-server feature)
codanna serve --https --watch

Configure in .mcp.json:

{
  "mcpServers": {
    "codanna-sse": {
      "type": "sse",
      "url": "http://127.0.0.1:8080/mcp/sse"
    }
  }
}

For HTTPS configuration, see the HTTPS Server Mode documentation.

Claude Sub Agent

We include a codanna-navigator sub agent at .claude/agents/codanna-navigator.md. This agent is optimized for using the codanna MCP server.

Unix-Style Integration

Codanna CLI is unix-friendly with positional arguments and JSON output for easy command chaining:

# New simplified syntax - positional arguments for simple tools
codanna mcp find_symbol main --json
codanna mcp get_calls process_file
codanna mcp find_callers init

# Key:value pairs for complex tools  
codanna mcp semantic_search_docs query:"error handling" limit:3 --json
codanna mcp search_symbols query:parse kind:function --json

# Powerful Unix piping with JSON output
echo "error handling" | codanna mcp semantic_search_docs --json | jq '.data[].name'
codanna mcp find_symbol Parser --json | jq -r '.data[].callers[].name' | \
  xargs -I {} codanna mcp find_symbol {} --json

# Legacy format still supported for backward compatibility
codanna mcp find_symbol --args '{"name": "main"}'

All MCP tools support --json flag for structured output, making integration with other tools seamless.

Configuration

Configure Codanna in .codanna/settings.toml:

[semantic_search]
enabled = true
model = "AllMiniLML6V2"
threshold = 0.6  # Similarity threshold (0-1)

[indexing]
parallel_threads = 16  # Auto-detected by default
include_tests = true   # Index test files

Codanna respects .gitignore and adds its own .codannaignore:

# Created automatically by codanna init
.codanna/       # Don't index own data
target/         # Skip build artifacts
node_modules/   # Skip dependencies
*_test.rs       # Optionally skip tests

Documentation Comments for Better Search

Semantic search works by understanding your documentation comments:

/// Parse configuration from a TOML file and validate required fields
/// This handles missing files gracefully and provides helpful error messages
fn load_config(path: &Path) -> Result<Config, Error> {
    // implementation...
}

With good comments, semantic search can find this function when prompted for:

  • "configuration validation"
  • "handle missing config files"
  • "TOML parsing with error handling"

This encourages better documentation → better AI understanding → more motivation to document.

CLI Commands

Core Commands

Command Description Example
codanna init Set up .codanna directory with default configuration codanna init --force
codanna index <PATH> Build searchable index from your codebase codanna index src --progress
codanna config Display active settings codanna config
codanna serve Start MCP server for AI assistants codanna serve --watch

Retrieval Commands

All retrieve commands support --json flag for structured output (exit code 3 when not found).

Command Description Example
retrieve symbol <NAME> Find a symbol by name codanna retrieve symbol main --json
retrieve calls <FUNCTION> Show what functions a given function calls codanna retrieve calls parse_file --json
retrieve callers <FUNCTION> Show what functions call a given function codanna retrieve callers main --json
retrieve implementations <TRAIT> Show what types implement a trait codanna retrieve implementations Parser --json
retrieve impact <SYMBOL> Show the impact radius of changing a symbol codanna retrieve impact main --depth 3 --json
retrieve search <QUERY> Search for symbols using full-text search codanna retrieve search "parse" --limit 5 --json
retrieve describe <SYMBOL> Show comprehensive information about a symbol codanna retrieve describe SimpleIndexer --json

Testing and Utilities

Command Description Example
codanna mcp-test Verify Claude can connect and list available tools codanna mcp-test
codanna mcp <TOOL> Execute MCP tools without spawning server codanna mcp find_symbol main --json
codanna benchmark Benchmark parser performance codanna benchmark rust --file my_code.rs

Common Flags

  • --config, -c: Path to custom settings.toml file
  • --force, -f: Force operation (overwrite, re-index, etc.)
  • --progress, -p: Show progress during operations
  • --threads, -t: Number of threads to use
  • --dry-run: Show what would happen without executing

MCP Tools

Available tools when using the MCP server. All tools support --json flag for structured output.

Simple Tools (Positional Arguments)

Tool Description Example
find_symbol Find a symbol by exact name codanna mcp find_symbol main --json
get_calls Show functions called by a given function codanna mcp get_calls process_file
find_callers Show functions that call a given function codanna mcp find_callers init
analyze_impact Analyze the impact radius of symbol changes codanna mcp analyze_impact Parser --json
get_index_info Get index statistics and metadata codanna mcp get_index_info --json

Complex Tools (Key:Value Arguments)

Tool Description Example
search_symbols Search symbols with full-text fuzzy matching codanna mcp search_symbols query:parse kind:function limit:10
semantic_search_docs Search using natural language queries codanna mcp semantic_search_docs query:"error handling" limit:5
semantic_search_with_context Search with enhanced context codanna mcp semantic_search_with_context query:"parse files" threshold:0.7

Parameters Reference

Tool Parameters
find_symbol name (required)
search_symbols query, limit, kind, module
semantic_search_docs query, limit, threshold
semantic_search_with_context query, limit, threshold
get_calls function_name
find_callers function_name
analyze_impact symbol_name, max_depth
get_index_info None

Performance

Parser benchmarks on a 750-symbol test file:

Language Parsing Speed vs. Target (10k/s) Status
Rust 91,318 symbols/sec 9.1x faster ✅ Production
Python 75,047 symbols/sec 7.5x faster ✅ Production
JavaScript - - Coming soon
TypeScript - - Coming soon

Key achievements:

  • Zero-cost abstractions: All parsers use borrowed string slices with no allocations in hot paths
  • Parallel processing: Multi-threaded indexing that scales with CPU cores
  • Memory efficiency: Approximately 100 bytes per symbol including all metadata
  • Real-time capability: Fast enough for incremental parsing during editing
  • Optimized CLI startup: ~300ms for all operations (53x improvement from v0.2)
  • JSON output: Zero overhead - structured output adds <1ms to response time

Run performance benchmarks:

codanna benchmark all          # Test all parsers
codanna benchmark python       # Test specific language

Architecture Highlights

Memory-mapped storage: Two caches for different access patterns:

  • symbol_cache.bin - FNV-1a hashed symbol lookups, <10ms response time
  • segment_0.vec - 384-dimensional vectors, <1μs access after OS page cache warm-up

Embedding lifecycle management: Old embeddings deleted when files are re-indexed to prevent accumulation.

Lock-free concurrency: DashMap for concurrent symbol reads, write coordination via single writer lock.

Single-pass indexing: Symbols, relationships, and embeddings extracted in one AST traversal.

Hot reload: File watcher with 500ms debounce triggers re-indexing of changed files only.

Requirements

  • Rust 1.75+ (for development)
  • ~150MB for model storage (downloaded on first use)
  • A few MB for index storage (varies by codebase size)

Current Limitations

  • Supports Rust and Python (JavaScript, TypeScript coming soon)
  • Semantic search requires English documentation/comments
  • Windows support is experimental

Roadmap

Versioning Strategy

  • 0.2.x - Patches and fixes only (bug fixes, dependency updates, performance improvements)
  • 0.3.x - Feature releases (JSON output, exit codes, new capabilities)
  • 0.4.x - Major features (JavaScript/TypeScript support, advanced analysis)

Status Overview

Priority Feature Status Target
1 JSON Output Support ✅ Completed v0.3.0
2 Exit Codes for Common Conditions ✅ Completed v0.3.0
3 Batch Symbol Operations Planning v0.3.1
4 Output Format Control Planning v0.3.1
5 Direct CLI Semantic Search Partial v0.3.1
6 Incremental Index Updates ✅ Completed v0.2.0
7 Query Language for Complex Searches Partial --
8 Configuration Profiles Pending --
9 Machine-Readable Progress Pending --

1. Direct CLI Semantic Search

Partially Implemented: Simplified syntax available through MCP interface.

# NEW: Simplified syntax (no JSON escaping needed!)
codanna mcp semantic_search_docs query:authentication limit:10 --json

# Still TODO: Direct retrieve command
codanna retrieve semantic "authentication" --limit 10

Delivered:

  • ✅ Simpler command syntax (key:value pairs)
  • ✅ Better Unix integration (positional args)
  • ✅ No JSON escaping needed

Remaining: Direct retrieve semantic command for consistency

2. JSON Output Support

Implemented in v0.3.0: All retrieve commands and MCP tools now support --json flag.

# All retrieve commands support --json
codanna retrieve symbol MyFunction --json
codanna retrieve calls process_file --json
codanna retrieve callers init --json

# All MCP tools support --json
codanna mcp find_symbol main --json
codanna mcp semantic_search_docs query:"error handling" --json

Delivered Benefits:

  • ✅ Stable API for scripts and tools
  • ✅ Zero performance overhead (<1ms)
  • ✅ Consistent JsonResponse format across all commands
  • ✅ Proper exit codes (3 for not found)

3. Batch Symbol Operations

Why: Reduce overhead when analyzing multiple symbols

# Current: Multiple invocations
for sym in func1 func2 func3; do
  codanna retrieve symbol "$sym"
done

# Wishlist: Single command
codanna retrieve symbols func1 func2 func3

Benefits:

  • One index load instead of N
  • Faster CI/CD pipelines
  • Better for parallel analysis

4. Output Format Control

Why: Different use cases need different detail levels

# Compact output for scripts
codanna retrieve callers MyFunc --format=compact
validate_input:src/validation.rs:45
process_request:src/handler.rs:120

# Full output for humans (current default)
codanna retrieve callers MyFunc --format=full

5. Exit Codes for Common Conditions

Implemented in v0.3.0: All commands now return appropriate exit codes.

# Exit codes implemented:
# 0 - Success
# 1 - General error
# 3 - Not found (symbol, function, etc.)

if codanna retrieve symbol MyFunc --json >/dev/null 2>&1; then
  echo "Symbol exists"
else
  if [ $? -eq 3 ]; then
    echo "Symbol not found"
  else
    echo "Error occurred"
  fi
fi

Actual JSON output:

$ codanna retrieve symbol NonExistent --json
{
  "status": "error",
  "code": "NOT_FOUND",
  "message": "Symbol 'NonExistent' not found",
  "error": {
    "suggestions": [
      "Check the spelling",
      "Ensure the index is up to date"
    ]
  },
  "exit_code": 3
}
# Exit code: 3

6. Query Language for Complex Searches

Partially Implemented: Key:value syntax available for MCP tools.

# NOW AVAILABLE: Key:value syntax for MCP tools
codanna mcp search_symbols query:Parser kind:method limit:20 --json
codanna mcp semantic_search_docs query:"error handling" limit:5 --json

# Still TODO: Advanced query combinations
codanna query "kind:method visibility:public calls:*Parser*"
codanna query "kind:function visibility:private callers:0"

Delivered: Basic key:value parameter parsing for MCP tools Remaining: Full query language with wildcards and combinations

7. Incremental Index Updates

Implemented: Watch mode with notification channels for coordinated updates.

# Watch mode auto-indexes changed files
codanna serve --watch --watch-interval 5

# Server output shows notification flow:
# Detected change in indexed file: src/main.rs
#   Re-indexing...
#   ✓ Re-indexed successfully (file updated)
# File watcher received IndexReloaded notification
#   Refreshing watched file list...
#   ✓ Now watching 60 files

Delivered:

  • ✅ Automatic file watching with --watch flag
  • ✅ Broadcast channels coordinate index and file watchers
  • ✅ File deletions trigger index and cache cleanup
  • ✅ Only changed files are re-indexed
  • ✅ Event-driven with debouncing for efficiency

8. Configuration Profiles

Why: Different settings for different use cases

# .codanna/profiles.toml
[profiles.ci]
semantic_search = false
max_file_size = "1MB"

[profiles.dev]
semantic_search = true
watch_mode = true

# Use profile
codanna --profile=ci index .

9. Machine-Readable Progress

Why: Better CI/CD integration

# Current: Human-readable progress
# Wishlist: Machine-readable option
codanna index . --progress=json
{"phase":"parsing","files_done":45,"files_total":200,"percent":22.5}
{"phase":"parsing","files_done":46,"files_total":200,"percent":23.0}

Implementation Priority

  1. JSON output - Enables everything else
  2. Exit codes - Minimal change, big impact
  3. Batch operations - Performance win
  4. Format control - Flexibility for users
  5. Rest - Nice to have

Contributing

This is an early release focused on core functionality. Contributions welcome! See CONTRIBUTING.md for guidelines.

License

Licensed under the Apache License, Version 2.0 - See LICENSE file for details.

Attribution required when using Codanna in your project. See NOTICE file.


Built with 🦀 by developers who wanted their AI assistants to actually understand their code.