cartog 0.4.4 - Docs.rs

# Agent Guidelines

## Project

cartog — code graph indexer for LLM coding agents. Single Rust binary, tree-sitter parsing, SQLite storage.

See [docs/product.md](docs/product.md) for product context, [docs/tech.md](docs/tech.md) for architecture decisions, [docs/structure.md](docs/structure.md) for module layout.

## Build & Test

```bash
cargo build              # debug build
cargo build --release    # release build
cargo test               # run all tests (271 tests)
cargo fmt --check        # check formatting
cargo clippy --all-targets -- -D warnings  # lint
```

Always run `cargo fmt --check` and `cargo clippy --all-targets -- -D warnings` before committing.

### Integrity checks

```bash
make check            # all checks (Rust project + fixture codebases)
make check-rust       # cargo fmt + clippy + test
make check-fixtures   # validate all 4 fixture codebases (py, go, rs, rb)
make check-ts         # TypeScript fixtures (requires npx/tsc)
make bench            # shell benchmark suite (13 scenarios x 5 languages)
make bench-criterion  # Rust criterion benchmarks (query latency)
make bench-rag        # RAG relevancy benchmarks (in-memory + shell scenario 13)
```

Run `make check` before committing benchmark fixture changes.

## Code Conventions

- **Error handling**: `anyhow::Result` everywhere, no `unwrap()` in library code.
- **Output**: human-readable by default, `--json` flag for structured output.
- **Visibility**: all public functions get `///` doc comments.
- **Tests**: unit tests co-located in each module (`#[cfg(test)] mod tests`), integration fixtures in `tests/fixtures/`.

## Architecture

```
main.rs → cli.rs (clap) → command handlers (sync)
                         ↓
              indexer.rs (walk + extract + store + symbol content)
              ├── languages/*.rs (tree-sitter extractors)
              ├── db.rs (SQLite: core schema + RAG schema)
              └── types.rs (shared structs)

         → Rag → rag/setup.rs (model download via fastembed auto-download)
              → rag/embeddings.rs (ONNX Runtime inference via fastembed)
              → rag/indexer.rs (embed symbols → sqlite-vec)
              → rag/search.rs (FTS5 + vector KNN → RRF merge)
              → rag/reranker.rs (cross-encoder re-ranking via fastembed)

         → Watch → watch.rs (file watcher, debounced re-index + deferred RAG)
              ├── notify-debouncer-mini (kqueue/inotify/ReadDirectoryChangesW)
              ├── WatchConfig (debounce, rag, rag_delay)
              └── run_watch() / spawn_watch() (foreground / background)

         → Serve → mcp.rs (MCP server over stdio, async via tokio)
              ├── CartogServer (11 tool handlers: 9 core + 2 RAG)
              ├── Path validation (CWD subtree restriction)
              ├── --watch flag → spawn_watch() background thread
              └── spawn_blocking → db.rs / indexer.rs / rag (sync)
```

Each language extractor implements the `Extractor` trait from `src/languages/mod.rs`:
```rust
fn extract(&self, source: &str, file_path: &str) -> Result<ExtractionResult>
```

Returns `Vec<Symbol>` + `Vec<Edge>`. After all files are extracted, `db.resolve_edges()` links edges by name (same file > same dir > unique project match).

## Adding a New Language

1. Add `tree-sitter-{lang}` to `Cargo.toml`
2. Create `src/languages/{lang}.rs` implementing `Extractor`
3. Register in `src/languages/mod.rs`: extension mapping + `get_extractor()` match arm
4. Add tests using the same pattern as `python.rs` tests

## CI/CD

- **CI** (`.github/workflows/ci.yml`): runs on push/PR to `main` — check, fmt, clippy, test, coverage (cargo-llvm-cov → Codecov)
- **Release** (`.github/workflows/release.yml`): runs on tag push (`v*`) — builds binaries for 5 targets, creates GitHub Release, publishes to crates.io
- **Targets**: `x86_64-unknown-linux-gnu`, `aarch64-unknown-linux-gnu`, `x86_64-apple-darwin`, `aarch64-apple-darwin`, `x86_64-pc-windows-msvc`
- **Secrets required**: `CARGO_REGISTRY_TOKEN` (crates.io), `CODECOV_TOKEN` (Codecov)

### Release Process

```bash
./scripts/release.sh patch   # 0.1.0 → 0.1.1
./scripts/release.sh minor   # 0.1.0 → 0.2.0
./scripts/release.sh major   # 0.1.0 → 1.0.0
./scripts/release.sh 2.3.4   # set exact version
```

The script bumps `Cargo.toml`, commits, tags `vX.Y.Z`, and pushes. The release workflow then builds binaries and publishes to crates.io.

## Current State

- **Working**: Python, TypeScript/JavaScript, Rust, Go, Ruby extractors, SQLite storage, all 9 CLI commands + MCP server (`cartog serve`, 11 tools: 9 core + 2 RAG), incremental indexing (git-based + SHA-256 fallback), `--force` re-index flag, CI/CD pipelines, `EdgeKind::References` extraction (type annotations, decorators, exception types, composite literals, `new` expressions, rescue clause types), symbol search (`cartog search`), RAG semantic search (`cartog rag` subcommand group: setup/index/search), hybrid FTS5+vector search with RRF merge, fastembed ONNX Runtime embeddings (`BAAI/bge-small-en-v1.5`), sqlite-vec vector storage, cross-encoder re-ranking (`BAAI/bge-reranker-base` via fastembed, batch scoring, auto-enabled when model downloaded via `cartog rag setup`), shared model cache (`~/.cache/cartog/models`, XDG-compliant), file watcher (`cartog watch` CLI + `cartog serve --watch` background mode, debounced re-index + deferred RAG embedding)
- **Pending**: Java extractor