cartog 0.5.1

Code graph indexer for LLM coding agents. Map your codebase, navigate by graph.
# cartog — Project Structure

## Directory Layout

```
cartog/
├── Cargo.toml
├── AGENTS.md                # Guidelines for AI coding agents
├── src/
│   ├── main.rs              # Entry point, CLI dispatch
│   ├── lib.rs               # Library root, re-exports public modules
│   ├── commands.rs          # Command handlers (outline, refs, impact, etc.)
│   ├── cli.rs               # Clap command definitions
│   ├── db.rs                # SQLite schema, CRUD, query methods
│   ├── indexer.rs           # Orchestrates: walk files → extract → store → resolve
│   ├── mcp.rs               # MCP server (tool handlers, path validation, ServerHandler)
│   ├── watch.rs             # File watcher: debounced re-index + deferred RAG embedding
│   ├── languages/
│   │   ├── mod.rs           # Language registry, Extractor trait, shared node_text helper
│   │   ├── python.rs        # Python tree-sitter extractor
│   │   ├── typescript.rs    # TypeScript/TSX extractors
│   │   ├── javascript.rs    # JavaScript extractor
│   │   ├── js_shared.rs     # Shared JS/TS extraction logic
│   │   ├── rust_lang.rs     # Rust extractor
│   │   ├── go.rs            # Go extractor
│   │   ├── ruby.rs          # Ruby extractor
│   │   └── java.rs          # Java extractor
│   ├── rag/
│   │   ├── mod.rs           # RAG module root, constants (EMBEDDING_DIM)
│   │   ├── setup.rs         # Model download (triggers fastembed auto-download)
│   │   ├── embeddings.rs    # ONNX embedding inference via fastembed (BGE-small-en-v1.5)
│   │   ├── indexer.rs       # Embed symbols, store vectors in sqlite-vec
│   │   ├── reranker.rs      # Cross-encoder re-ranking via fastembed (BGE-reranker-base)
│   │   └── search.rs        # FTS5 + vector KNN search, RRF merge, optional re-ranking
│   └── types.rs             # Symbol, Edge, FileInfo structs
├── skills/
│   └── cartog/              # Agent Skill (agentskills.io)
│       ├── SKILL.md         # Behavioral instructions for AI agents
│       ├── scripts/
│       │   ├── install.sh
│       │   ├── ensure_indexed.sh
│       │   └── query.sh
│       ├── tests/
│       │   ├── golden_examples.yaml  # Behavioral test scenarios (expected tool calls)
│       │   ├── test_ensure_indexed.sh # Bash unit tests for ensure_indexed.sh
│       │   └── eval.sh              # LLM-as-judge evaluation via claude CLI
│       └── references/
│           ├── query_cookbook.md
│           └── supported_languages.md
├── benchmarks/
│   ├── run.sh               # Benchmark runner (token efficiency, recall, command count)
│   ├── lib/                 # Shared measurement & comparison helpers
│   ├── fixtures/
│   │   ├── webapp_py/       # Python fixture (69 files)
│   │   ├── webapp_ts/       # TypeScript fixture (48 files)
│   │   ├── webapp_go/       # Go fixture (45 files)
│   │   ├── webapp_rs/       # Rust fixture (65 files)
│   │   ├── webapp_rb/       # Ruby fixture (51 files)
│   │   └── webapp_java/     # Java fixture (41 files)
│   ├── ground_truth/        # Expected relationships per fixture (JSON)
│   ├── scenarios/           # 13 scenario scripts (01-13)
│   └── results/             # Benchmark output (gitignored)
├── tests/
│   ├── rag_relevancy.rs     # RAG relevancy integration benchmark (P@k, R@k, NDCG)
│   └── fixtures/
│       └── auth/            # Python fixtures for indexer tests
│           ├── tokens.py
│           └── service.py
└── docs/
    ├── product.md           # Product overview
    ├── tech.md              # Technology decisions
    ├── structure.md         # This file
    ├── usage.md             # CLI commands + MCP server setup per client
    └── claude-code.md       # Claude Code integration details
```

## Module Responsibilities

- **cli.rs**: Defines all subcommands, including the `rag` subgroup and `watch`, via clap derive. No business logic.
- **db.rs**: Owns the SQLite connection and handles schema creation (core and RAG tables), inserts, and all query methods. Returns domain types. RAG tables: `symbol_content` (source text), `symbol_fts` (FTS5 index), `symbol_vec` (sqlite-vec vectors), and `symbol_embedding_map` (integer ID mapping).
- **indexer.rs**: Walks the file tree, delegates to language extractors, writes to db, runs edge resolution. Also stores symbol source content for RAG during indexing. Exports `is_ignored_dirname()` for reuse by the watcher.
- **commands.rs**: Command handlers for all CLI commands, including `rag setup/index/search` and `watch`. Formats output as human-readable text by default or as JSON with `--json`.
- **mcp.rs**: MCP server over stdio. The `CartogServer` struct has 11 `#[tool]` handlers (9 core + 2 RAG). Path validation restricts `index` to the current working directory's subtree. Uses `spawn_blocking` for synchronous DB and indexer calls. Optionally spawns a background file watcher (`--watch` flag).
- **watch.rs**: File watcher built on `notify-debouncer-mini`. Debounces filesystem events and triggers an incremental `index_directory()`. Optionally defers RAG embedding by a configurable delay. Used standalone (`cartog watch`) or embedded in the MCP server (`cartog serve --watch`).
- **languages/mod.rs**: Maps file extensions to extractors, defines the `Extractor` trait and shared `node_text` helper. Each extractor implements `fn extract(&self, source: &str, file_path: &str) -> Result<ExtractionResult>`.
- **rag/mod.rs**: RAG pipeline constants (`EMBEDDING_DIM = 384`) and the shared model cache directory (`model_cache_dir()`), which is XDG-compliant so models are not downloaded separately for each project.
- **rag/setup.rs**: Triggers model download by instantiating fastembed engines (models auto-downloaded from HuggingFace on first use).
- **rag/embeddings.rs**: ONNX Runtime inference via fastembed (`BAAI/bge-small-en-v1.5`). Serialization helpers for sqlite-vec byte format.
- **rag/indexer.rs**: Embeds every symbol that has stored source content and stores the vectors in sqlite-vec. Supports incremental (skip already-embedded symbols) and force modes.
- **rag/search.rs**: Hybrid search combining FTS5 keyword (BM25) + vector KNN (cosine), merged via Reciprocal Rank Fusion (RRF, k=60). Optional cross-encoder re-ranking when the model is available.
- **rag/reranker.rs**: Cross-encoder re-ranking via fastembed (`BAAI/bge-reranker-base`). Scores (query, document) pairs jointly. Auto-enabled when the model can be downloaded.
- **types.rs**: Shared data structures. No logic beyond Display/serialization.
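The `mcp.rs` rule that restricts `index` to the working directory's subtree can be sketched with the standard library alone. This is a minimal sketch, assuming validation means canonicalizing both paths (which resolves `..` and symlinks) and then requiring a component-wise prefix match; `is_within_root` is a hypothetical name, not cartog's actual function.

```rust
use std::io;
use std::path::Path;

/// Hypothetical helper: true if `candidate` resolves to a path inside `root`.
/// Both paths must exist, because `canonicalize` resolves symlinks and `..` on disk.
fn is_within_root(root: &Path, candidate: &Path) -> io::Result<bool> {
    let root = root.canonicalize()?;
    let candidate = candidate.canonicalize()?;
    // `Path::starts_with` compares whole components, so "/work/app2" is not inside "/work/app".
    Ok(candidate.starts_with(&root))
}

fn main() -> io::Result<()> {
    let cwd = std::env::current_dir()?;
    // The working directory itself is an allowed target.
    println!("cwd allowed: {}", is_within_root(&cwd, &cwd)?);
    Ok(())
}
```

Canonicalizing before comparing is what defeats `sub/../..`-style traversal; a plain string prefix check would not.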
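Debouncing in `watch.rs` collapses a burst of filesystem events (for example, an editor writing a file several times on save) into a single re-index. `notify-debouncer-mini` handles this internally; the sketch below is a generic, std-only illustration of the idea, not that crate's API. It assumes events arrive as time-sorted `(timestamp_ms, path)` pairs and starts a new batch whenever the gap since the previous event exceeds the window. `debounce` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Hypothetical gap-based debouncer: groups time-sorted `(timestamp_ms, path)` events.
/// A new batch starts when the gap since the previous event exceeds `window_ms`;
/// repeated paths inside one batch collapse into a single entry.
fn debounce(events: &[(u64, &str)], window_ms: u64) -> Vec<BTreeSet<String>> {
    let mut batches: Vec<BTreeSet<String>> = Vec::new();
    let mut last_ts: Option<u64> = None;
    for &(ts, path) in events {
        let starts_new = match last_ts {
            Some(prev) => ts.saturating_sub(prev) > window_ms,
            None => true,
        };
        if starts_new {
            batches.push(BTreeSet::new());
        }
        if let Some(batch) = batches.last_mut() {
            batch.insert(path.to_string());
        }
        last_ts = Some(ts);
    }
    batches
}

fn main() {
    // A save burst (three events within 50 ms), then a later, separate change.
    let events: [(u64, &str); 4] = [(0, "src/a.rs"), (20, "src/a.rs"), (50, "src/b.rs"), (900, "src/a.rs")];
    for batch in debounce(&events, 200) {
        // Each batch would trigger one incremental re-index.
        println!("re-index {:?}", batch);
    }
}
```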
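The extractor contract in `languages/mod.rs` can be pictured as a trait object looked up by file extension. This is a minimal, dependency-free sketch: `Symbol`, `ExtractionResult`, `ToyPythonExtractor`, and `extractor_for` are hypothetical stand-ins. The real extractors walk tree-sitter syntax trees, and the documented signature returns `anyhow::Result`; a boxed std error is used here to avoid dependencies.

```rust
use std::error::Error;
use std::path::Path;

// Stand-in for `anyhow::Result` so the sketch has no dependencies.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Hypothetical, simplified stand-ins for the types in `types.rs`.
#[derive(Debug)]
struct Symbol {
    name: String,
    line: usize,
}

#[derive(Debug, Default)]
struct ExtractionResult {
    symbols: Vec<Symbol>,
}

/// Same shape as the documented `Extractor::extract` signature.
trait Extractor {
    fn extract(&self, source: &str, file_path: &str) -> Result<ExtractionResult>;
}

/// Hypothetical toy extractor: matches top-level `def name(` lines by string prefix, not tree-sitter.
struct ToyPythonExtractor;

impl Extractor for ToyPythonExtractor {
    fn extract(&self, source: &str, _file_path: &str) -> Result<ExtractionResult> {
        let mut result = ExtractionResult::default();
        for (idx, text) in source.lines().enumerate() {
            if let Some(rest) = text.strip_prefix("def ") {
                if let Some(name) = rest.split('(').next() {
                    result.symbols.push(Symbol { name: name.trim().to_string(), line: idx + 1 });
                }
            }
        }
        Ok(result)
    }
}

/// Hypothetical registry: maps a file extension to an extractor (only `.py` here).
fn extractor_for(file_path: &str) -> Option<Box<dyn Extractor>> {
    match Path::new(file_path).extension()?.to_str()? {
        "py" => Some(Box::new(ToyPythonExtractor)),
        _ => None,
    }
}

fn main() -> Result<()> {
    let src = "def validate_token(tok):\n    pass\n\ndef refresh():\n    pass\n";
    if let Some(extractor) = extractor_for("auth/tokens.py") {
        for sym in extractor.extract(src, "auth/tokens.py")?.symbols {
            println!("{} (line {})", sym.name, sym.line);
        }
    }
    Ok(())
}
```

Keeping the registry as a single extension-to-extractor lookup means adding a language touches one match arm plus one new module.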
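`model_cache_dir()` is described as XDG-compliant. A minimal sketch of the XDG Base Directory rule for caches: use `$XDG_CACHE_HOME` when it is set to a non-empty absolute path, otherwise fall back to `$HOME/.cache`. The `cartog/models` subdirectory and the name `resolve_cache_dir` are assumptions for illustration, not cartog's actual layout. The environment values are passed in as arguments so the logic can be tested without mutating the process environment.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver following the XDG Base Directory rules for the cache dir.
/// `xdg_cache_home` and `home` are the values of `$XDG_CACHE_HOME` and `$HOME`.
fn resolve_cache_dir(xdg_cache_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_cache_home {
        // The spec says unset, empty, or relative values are ignored.
        Some(dir) if !dir.is_empty() && Path::new(dir).is_absolute() => PathBuf::from(dir),
        _ => PathBuf::from(home?).join(".cache"),
    };
    // Assumed subdirectory, shared by every project on the machine.
    Some(base.join("cartog").join("models"))
}

fn main() {
    let xdg = std::env::var("XDG_CACHE_HOME").ok();
    let home = std::env::var("HOME").ok();
    match resolve_cache_dir(xdg.as_deref(), home.as_deref()) {
        Some(dir) => println!("models cached in {}", dir.display()),
        None => println!("no cache directory could be determined"),
    }
}
```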
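sqlite-vec accepts a float vector either as JSON text or as a compact blob of 32-bit floats. The sketch below shows serialization helpers of the kind `rag/embeddings.rs` provides, assuming the blob is the vector's `f32` values laid out back to back in little-endian order (the byte order of common x86-64 and ARM64 targets). The function names are hypothetical. At `EMBEDDING_DIM = 384`, each embedding occupies 384 × 4 = 1536 bytes.

```rust
const EMBEDDING_DIM: usize = 384;

/// Hypothetical helper: packs an embedding into a blob of little-endian f32 bytes.
fn serialize_f32(vector: &[f32]) -> Vec<u8> {
    vector.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Hypothetical helper: unpacks a blob into f32 values; None if the length is not a multiple of 4.
fn deserialize_f32(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
            .collect(),
    )
}

fn main() {
    let embedding = vec![0.25_f32; EMBEDDING_DIM];
    let blob = serialize_f32(&embedding);
    println!("{} floats -> {} bytes", embedding.len(), blob.len()); // 384 floats -> 1536 bytes
    // Round-tripping must reproduce the original vector exactly.
    let restored = deserialize_f32(&blob).expect("length is a multiple of 4");
    assert_eq!(restored, embedding);
}
```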
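The incremental and force modes of `rag/indexer.rs` differ only in which symbols are sent to the embedding model. A minimal sketch of that selection step, assuming symbols are identified by integer IDs (as the `symbol_embedding_map` table suggests) and that the set of already-embedded IDs has been loaded from the vector table; `symbols_to_embed` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical selection step: which symbol IDs still need an embedding.
/// Incremental mode skips IDs already embedded; force mode re-embeds everything.
fn symbols_to_embed(with_content: &[i64], already_embedded: &HashSet<i64>, force: bool) -> Vec<i64> {
    with_content
        .iter()
        .copied()
        .filter(|id| force || !already_embedded.contains(id))
        .collect()
}

fn main() {
    let with_content: [i64; 4] = [101, 102, 103, 104];
    let already: HashSet<i64> = HashSet::from([102, 104]);
    println!("incremental: {:?}", symbols_to_embed(&with_content, &already, false)); // [101, 103]
    println!("force: {:?}", symbols_to_embed(&with_content, &already, true)); // [101, 102, 103, 104]
}
```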
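Reciprocal Rank Fusion scores each result as the sum, over all ranked lists, of 1 / (k + rank), with ranks starting at 1 and k = 60 as documented for `rag/search.rs`. A minimal sketch that fuses a BM25 ranking and a KNN ranking of symbol IDs; the integer IDs, the function name `rrf_merge`, and the tie-breaking rule are illustrative assumptions, not cartog's exact implementation.

```rust
use std::collections::HashMap;

/// Hypothetical RRF merge: fuses ranked lists of IDs (best first).
/// Each list contributes 1 / (k + rank) per ID, with rank starting at 1.
fn rrf_merge(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, &id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let fts_bm25: Vec<i64> = vec![10, 20, 30]; // keyword hits, best first
    let vector_knn: Vec<i64> = vec![20, 40, 10]; // nearest neighbours by cosine distance
    // Fused order: 20, 10, 40, 30 (20 ranks high in both lists).
    for (id, score) in rrf_merge(&[fts_bm25, vector_knn], 60.0) {
        println!("symbol {id}: {score:.5}");
    }
}
```

Because RRF uses only ranks, BM25 scores and cosine distances never have to be normalized onto a common scale; a large k such as 60 also keeps any single list's top hit from dominating the fused order.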

## Conventions

- No `unwrap()` in library code — use `anyhow::Result` for error propagation.
- All public functions documented with `///` doc comments.
- CLI output: human-readable by default, `--json` for structured output.
- Tests: unit tests in each module (`#[cfg(test)]`), fixture files in `tests/fixtures/` for indexer tests.
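The conventions above, applied to one small function: a `///` doc comment, errors propagated with `?` rather than `unwrap()`, and a `#[cfg(test)]` module next to the code. The sketch substitutes `Box<dyn std::error::Error>` for `anyhow::Result` so it compiles without dependencies (with anyhow, the return type would be `anyhow::Result<usize>`); `count_lines` is a hypothetical example function.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

/// Counts the lines in a source file.
///
/// Returns an error if the file cannot be read or is not valid UTF-8.
pub fn count_lines(path: &Path) -> Result<usize, Box<dyn Error>> {
    // `?` hands the I/O error to the caller instead of panicking.
    let source = fs::read_to_string(path)?;
    Ok(source.lines().count())
}

fn main() {
    match count_lines(Path::new("Cargo.toml")) {
        Ok(n) => println!("Cargo.toml has {n} lines"),
        Err(e) => eprintln!("could not read Cargo.toml: {e}"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn missing_file_is_an_error_not_a_panic() {
        assert!(count_lines(Path::new("does/not/exist.rs")).is_err());
    }
}
```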