
Crate cartog_db



SQLite persistence layer for the cartog code graph.

Stores symbols, edges, and file metadata in a single SQLite database. Provides graph traversal queries (callees, refs, impact, hierarchy), full-text search via FTS5, vector KNN search via sqlite-vec, and a 6-tier heuristic edge resolution algorithm.

§cartog-db


§Overview

Stores symbols, edges, files, and embeddings in a single SQLite database. Provides query methods for graph traversal (callees, refs, impact, hierarchy), full-text search via FTS5, and vector similarity search via sqlite-vec.

§How it works

§Schema

Four core tables plus four RAG-specific tables:

  • symbols — primary key on stable ID, indexed by name and file_path
  • edges — source_id → target_name, with target_id resolved later. A resolution_state column tracks the lifecycle: 0 = unresolved, 1 = resolved, 2 = unresolvable (LSP definitively gave up: typo, dyn dispatch, macro), 3 = external (LSP located the target outside the indexed root: stdlib, deps, node_modules). States 2 and 3 are sticky: future LSP passes skip them until Database::reset_unresolvable_for_names reopens them on a name match, or Database::reset_all_unresolvable resets all of them on --force.
  • files — tracks file hash, language, symbol count, last modified timestamp
  • metadata — key-value store (e.g., last_commit for git-based change detection)
  • symbol_content — raw source code per symbol (for FTS and embedding)
  • symbol_fts — FTS5 virtual table over symbol names and content (BM25 ranking)
  • symbol_embedding_map — maps integer rowids (for sqlite-vec) to symbol IDs
  • symbol_vec — sqlite-vec virtual table with float32 vectors for KNN search. The vector dimension is configurable via .cartog.toml (default: 384). When the configured dimension changes, the vector table is automatically recreated.
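A minimal sketch of the resolution_state lifecycle described for the edges table, written as a Rust enum. The ResolutionState type and its methods (from_code, is_sticky, needs_lsp_pass, reopen) are hypothetical and not part of the crate's API; only the numeric codes and the "sticky" rule come from the schema notes. reopen loosely models what a reset such as Database::reset_all_unresolvable does to a single row.

```rust
/// Hypothetical Rust mirror of the integer `edges.resolution_state` column.
/// The codes match the schema notes; the type itself is illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResolutionState {
    Unresolved = 0,
    Resolved = 1,
    Unresolvable = 2, // LSP gave up (typo, dyn dispatch, macro)
    External = 3,     // target lives outside the indexed root
}

impl ResolutionState {
    /// Decode a stored code; unknown values yield `None`.
    fn from_code(code: i64) -> Option<Self> {
        match code {
            0 => Some(Self::Unresolved),
            1 => Some(Self::Resolved),
            2 => Some(Self::Unresolvable),
            3 => Some(Self::External),
            _ => None,
        }
    }

    /// States 2 and 3 are sticky: LSP passes skip them.
    fn is_sticky(self) -> bool {
        matches!(self, Self::Unresolvable | Self::External)
    }

    /// Only unresolved edges are candidates for another LSP pass.
    fn needs_lsp_pass(self) -> bool {
        self == Self::Unresolved
    }

    /// Hypothetical model of a reset: sticky states reopen, others are unchanged.
    fn reopen(self) -> Self {
        if self.is_sticky() {
            Self::Unresolved
        } else {
            self
        }
    }
}

fn main() {
    // Codes as they might be read back from the edges table.
    let stored = [0_i64, 1, 2, 3];

    let pending: Vec<i64> = stored
        .iter()
        .copied()
        .filter(|&c| ResolutionState::from_code(c).map_or(false, ResolutionState::needs_lsp_pass))
        .collect();
    println!("{:?}", pending); // [0]

    // After a reset, the sticky edges become pending again.
    let after_reset: Vec<i64> = stored
        .iter()
        .filter_map(|&c| ResolutionState::from_code(c))
        .map(|s| s.reopen() as i64)
        .collect();
    println!("{:?}", after_reset); // [0, 1, 0, 0]
}
```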

§Edge resolution (6-tier heuristic)

When edges are first inserted, target_id is NULL. The resolution algorithm runs in 2 passes, attempting to match target_name to a known symbol:

  1. Same file — exact name match in the same source file
  2. Import path — follow already-resolved import edges
  3. Same directory — match symbols in sibling files
  4. Parent scope — first symbol sharing the source’s parent scope (sibling, LIMIT 1)
  5. Project-wide unique — exactly one match globally
  6. Kind disambiguation — when exactly 2 matches remain, pick the higher-priority kind (type-like class/interface/enum/type_alias/trait > function > method); equal priorities stay unresolved

Tiers 5 and 6 are evaluated together in a single project-wide query (capped at 3 candidates): a lone match resolves via tier 5, exactly two resolve via the tier-6 kind disambiguation, and 3+ stay unresolved.
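Tiers 5 and 6 can be sketched as a pure function over the rows returned by the capped project-wide query. This assumes the query has already produced (symbol_id, kind) pairs; resolve_project_wide and kind_priority are hypothetical names, and treating a kind outside the listed priorities (e.g. a variable) as blocking resolution is an assumption the text does not settle.

```rust
/// Kind priority for tier-6 disambiguation (lower number = preferred):
/// type-like kinds > function > method. Other kinds return `None`,
/// which this sketch treats as "cannot disambiguate" (an assumption).
fn kind_priority(kind: &str) -> Option<u8> {
    match kind {
        "class" | "interface" | "enum" | "type_alias" | "trait" => Some(0),
        "function" => Some(1),
        "method" => Some(2),
        _ => None,
    }
}

/// `candidates` stands in for the project-wide query result (capped at 3 rows),
/// each entry being (symbol_id, kind). Returns the resolved target_id, if any.
fn resolve_project_wide<'a>(candidates: &[(&'a str, &str)]) -> Option<&'a str> {
    match candidates {
        // Tier 5: exactly one match globally.
        [(id, _)] => Some(*id),
        // Tier 6: exactly two matches, pick the higher-priority kind.
        [(id_a, kind_a), (id_b, kind_b)] => match (kind_priority(kind_a), kind_priority(kind_b)) {
            (Some(a), Some(b)) if a < b => Some(*id_a),
            (Some(a), Some(b)) if b < a => Some(*id_b),
            // Equal priorities (or unranked kinds) stay unresolved.
            _ => None,
        },
        // Zero or 3+ candidates: stay unresolved.
        _ => None,
    }
}

fn main() {
    let one = [("a::Parser", "class")];
    let two = [("a::Handler", "function"), ("b::Handler", "class")];
    let tie = [("a::run", "method"), ("b::run", "method")];

    println!("{:?}", resolve_project_wide(&one)); // Some("a::Parser")
    println!("{:?}", resolve_project_wide(&two)); // Some("b::Handler")
    println!("{:?}", resolve_project_wide(&tie)); // None
}
```

Keeping the disambiguation as a pure function over at most three rows makes the "3+ stay unresolved" rule fall out of the slice pattern: anything other than one or two candidates never resolves.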

§Search ranking

Symbol search uses a composite score:

rank = match_tier + kind_penalty
  • match_tier: exact match (0), prefix (1), substring (2)
  • kind_penalty: definitions (function/method/class) score 0, variables and all other kinds score 3, imports score 6
  • tiebreaker: in_degree DESC (most-referenced symbols first)
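The composite score can be sketched in memory as follows, assuming symbol rows are already loaded as Hit values (a hypothetical type). The crate's own implementation may differ; this version only illustrates the arithmetic, the in_degree tiebreaker, and the cap at MAX_SEARCH_LIMIT (100, per the API table). Case-insensitive matching is an assumption of the sketch.

```rust
/// Cap quoted in the public API table.
const MAX_SEARCH_LIMIT: usize = 100;

/// Hypothetical in-memory stand-in for a symbol row.
struct Hit {
    name: &'static str,
    kind: &'static str,
    in_degree: u32, // how many edges point at this symbol
}

/// match_tier: exact (0), prefix (1), substring (2); `None` means no match.
/// Lowercasing both sides is an assumption of this sketch.
fn match_tier(query: &str, name: &str) -> Option<u32> {
    let (q, n) = (query.to_lowercase(), name.to_lowercase());
    if n == q {
        Some(0)
    } else if n.starts_with(&q) {
        Some(1)
    } else if n.contains(&q) {
        Some(2)
    } else {
        None
    }
}

/// kind_penalty: definitions 0, imports 6, everything else 3.
fn kind_penalty(kind: &str) -> u32 {
    match kind {
        "function" | "method" | "class" => 0,
        "import" => 6,
        _ => 3,
    }
}

/// Score, sort (rank ASC, in_degree DESC) and cap the result list.
fn search<'a>(query: &str, hits: &'a [Hit], limit: usize) -> Vec<&'a Hit> {
    let mut scored: Vec<(u32, &'a Hit)> = hits
        .iter()
        .filter_map(|h| match_tier(query, h.name).map(|tier| (tier + kind_penalty(h.kind), h)))
        .collect();
    scored.sort_by(|x, y| x.0.cmp(&y.0).then(y.1.in_degree.cmp(&x.1.in_degree)));
    scored
        .into_iter()
        .take(limit.min(MAX_SEARCH_LIMIT))
        .map(|(_, h)| h)
        .collect()
}

fn main() {
    let hits = [
        Hit { name: "parse_config", kind: "function", in_degree: 4 },
        Hit { name: "parse", kind: "import", in_degree: 10 },
        Hit { name: "parse", kind: "function", in_degree: 2 },
        Hit { name: "Parser", kind: "class", in_degree: 7 },
        Hit { name: "do_parse", kind: "variable", in_degree: 20 },
    ];
    let names: Vec<&str> = search("parse", &hits, 10).iter().map(|h| h.name).collect();
    println!("{:?}", names);
    // ["parse", "Parser", "parse_config", "do_parse", "parse"]
}
```

Note how the import named exactly "parse" lands last (0 + 6) even though it is an exact match, while the two prefix matches tie at rank 1 and are ordered by in_degree.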

§Public API (key exports)

| Export | Description |
| --- | --- |
| Database | Main handle: open, query, insert, resolve |
| DB_DIR / DB_FILENAME | Default DB location: .cartog/db.sqlite at the project root |
| LEGACY_DB_FILE | Legacy DB filename (.cartog.db), kept for backwards-compat lookups |
| MAX_SEARCH_LIMIT | Maximum results for search queries (100) |
| UnresolvedEdge | Edge pending LSP resolution |
| IndexStats | Aggregate statistics (files, symbols, edges, languages) |
| DbError / DbResult | Crate error type and result alias |
| EmbeddingFingerprint | Records the embedding strategy so a format change triggers re-embed |
| checkpoint_wal() | Force a WAL checkpoint (used before backups / handoff) |
| CURRENT_SCHEMA_VERSION / DEFAULT_EMBEDDING_DIM | Schema migration target and default vector dimension |

§Crate dependencies

cartog-core

Structs§

Database
EmbeddingFingerprint
Identity of the embedding stack that produced the vectors stored in symbol_vec. Persisted in the metadata table so we can detect when the user swaps provider or model, which silently invalidates the existing index even when the dimension happens to stay the same.
IndexStats
PathHop
One hop on a call path returned by Database::trace.
PinnedAttach
Snapshot of write-mode-relevant metadata captured by a read-only attach. Compared against the on-disk values when the reader decides whether it can still safely serve queries against the DB.
SavingsReport
Per-tool query counts + token-savings estimate for cartog stats --savings.
UnresolvedEdge
An unresolved edge from the database (used by LSP resolution).
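The check described for EmbeddingFingerprint can be sketched as a comparison between the stored identity and the current one. The Fingerprint and EmbeddingAction types, the field names (provider, model, dim), and the choice to re-embed when no fingerprint is stored are assumptions for illustration, not the crate's actual definitions. The split between re-embedding and recreating the vector table follows the schema note that a dimension change recreates symbol_vec.

```rust
/// Hypothetical stand-in for EmbeddingFingerprint; real field names may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    provider: String,
    model: String,
    dim: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum EmbeddingAction {
    Keep,          // identical stack: existing vectors stay valid
    Reembed,       // same dimension, different provider/model (or nothing stored)
    RecreateTable, // dimension changed: symbol_vec is rebuilt, then re-embedded
}

fn reconcile(stored: Option<&Fingerprint>, current: &Fingerprint) -> EmbeddingAction {
    match stored {
        None => EmbeddingAction::Reembed,
        Some(s) if s.dim != current.dim => EmbeddingAction::RecreateTable,
        // Same dimension but a different provider or model: vectors are
        // incompatible even though the table shape still fits.
        Some(s) if s != current => EmbeddingAction::Reembed,
        Some(_) => EmbeddingAction::Keep,
    }
}

fn main() {
    let bge = Fingerprint {
        provider: "local".into(),
        model: "bge-small-en-v1.5".into(),
        dim: 384,
    };
    let swapped = Fingerprint { model: "other-model".into(), ..bge.clone() };
    let resized = Fingerprint { dim: 768, ..swapped.clone() };

    println!("{:?}", reconcile(Some(&bge), &bge)); // Keep
    println!("{:?}", reconcile(Some(&bge), &swapped)); // Reembed
    println!("{:?}", reconcile(Some(&bge), &resized)); // RecreateTable
}
```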

Enums§

DbError
Typed errors for the database-open and schema-migration paths.
KindScope
Kind scope for Database::fts5_search_kinded, so retrieval can filter by kind in SQL. Mirrors the rag layer’s KindFilter.

Constants§

BUSY_TIMEOUT_MS
Milliseconds a connection waits on a locked database before giving up.
CURRENT_SCHEMA_VERSION
Public mirror of the private SCHEMA_VERSION for callers outside this crate (e.g. cartog pull needs it to compare against a pulled DB and refuse to load a future-versioned file). Kept in sync by construction.
DB_DIR
Default directory for cartog-generated artifacts, at the project root. Holds the SQLite database and its destructive-migration backups.
DB_FILENAME
Default SQLite database filename, stored inside DB_DIR.
DEFAULT_EMBEDDING_DIM
Default embedding dimension (BGE-small-en-v1.5).
LEGACY_DB_FILE
Legacy database filename at the project root, kept for backwards-compatibility lookups. Never written to for new projects: use DB_DIR/DB_FILENAME instead.
MAX_SEARCH_LIMIT
Maximum number of results returned by Database::search. Enforced here and referenced by CLI and MCP layers.
TOKENS_PER_QUERY_CARTOG
Per-query token cost for cartog. Measured: ~280 tokens for a typical navigation query ("where is X used?", "what does X call?"), including the structured response payload.
TOKENS_PER_QUERY_GREP
Per-query token cost for an equivalent grep + read flow. Measured: a grep sweep plus reading the surrounding ~50 lines of each hit averages ~1,700 tokens to answer the same navigation question.
TOKENS_SAVED_PER_QUERY
Per-query token delta (grep − cartog). Coarse on purpose; refining per-tool would require richer per-call accounting and isn’t worth it pre-v1. Sources: benchmarks/queries.rs (see crates/cartog/benches/).
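The savings estimate is plain arithmetic. Assuming the constants hold the measured figures quoted above (280 and 1,700 tokens), each query saves 1,700 − 280 = 1,420 tokens, and a report multiplies that by the per-tool query counts. estimated_savings is a hypothetical helper for illustration, not the crate's SavingsReport code.

```rust
// Assumed values: the "~280" and "~1,700" measurements quoted in the docs.
const TOKENS_PER_QUERY_CARTOG: u64 = 280;
const TOKENS_PER_QUERY_GREP: u64 = 1_700;
// Delta computed the same way the docs describe it (grep − cartog).
const TOKENS_SAVED_PER_QUERY: u64 = TOKENS_PER_QUERY_GREP - TOKENS_PER_QUERY_CARTOG;

/// Hypothetical helper: total estimated savings over (tool, query_count) pairs.
fn estimated_savings(per_tool_counts: &[(&str, u64)]) -> u64 {
    per_tool_counts
        .iter()
        .map(|(_, n)| n * TOKENS_SAVED_PER_QUERY)
        .sum()
}

fn main() {
    let counts = [("refs", 10), ("callees", 5)];
    println!("{}", TOKENS_SAVED_PER_QUERY); // 1420
    println!("{}", estimated_savings(&counts)); // 21300 (15 queries x 1,420)
}
```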

Functions§

checkpoint_wal
Run PRAGMA wal_checkpoint(TRUNCATE) on the SQLite file at path. No-op for missing files. Used before moving the DB to flush the WAL.
normalize_symbol_name
Split a symbol name into lowercase words for FTS5 indexing.
read_metadata_at
Read a single metadata value by key from a cartog SQLite file at path, without the full Database::open machinery. Mirrors read_schema_version_at; used by cartog push/pull to read the last_commit provenance row off a closed DB file.
read_schema_version_at
Read the schema_version recorded in a cartog SQLite file at path, without going through the full Database::open machinery (no migrations, no fingerprint reconciliation). Used by cartog pull to guard against pulling a future-versioned DB before clobbering the local one.
register_sqlite_vec
Register the sqlite-vec extension globally.
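normalize_symbol_name is only described as splitting a symbol name into lowercase words for FTS5 indexing. A plausible sketch, assuming it splits on non-alphanumeric separators (`_`, `::`, `.`) and on camelCase and acronym boundaries, and returns a space-joined string suitable for an FTS5 column. The exact rules and return type of the real function are not documented here.

```rust
/// Sketch of a name normalizer: "parseHttpRequest" -> "parse http request".
/// The splitting rules are assumptions, not the crate's documented behavior.
fn normalize_symbol_name(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut words: Vec<String> = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_alphanumeric() {
            // Separators such as `_`, `:` or `.` end the current word.
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            continue;
        }
        let prev = if i > 0 { Some(chars[i - 1]) } else { None };
        let next = chars.get(i + 1).copied();
        // Word boundary before an uppercase letter that follows a lowercase
        // letter or digit, or that ends an acronym ("HTTPServer" -> http|server).
        let boundary = c.is_uppercase()
            && match prev {
                Some(p) if p.is_lowercase() || p.is_ascii_digit() => true,
                Some(p) if p.is_uppercase() => next.map_or(false, |n| n.is_lowercase()),
                _ => false,
            };
        if boundary && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        current.extend(c.to_lowercase());
    }
    if !current.is_empty() {
        words.push(current);
    }
    words.join(" ")
}

fn main() {
    println!("{}", normalize_symbol_name("parseHttpRequest")); // parse http request
    println!("{}", normalize_symbol_name("HTTPServer")); // http server
    println!("{}", normalize_symbol_name("Database::reset_all")); // database reset all
}
```

Indexing the split words lets an FTS5 query for "request" match a symbol named parseHttpRequest, which a tokenizer treating the whole identifier as one token would miss.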

Type Aliases§

DbResult
Result alias for the typed-error helpers below.
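The cartog pull guard described under read_schema_version_at and CURRENT_SCHEMA_VERSION reduces to one comparison before the local DB is clobbered. In this sketch PullError, PullResult, and check_pulled_schema are hypothetical stand-ins (PullResult plays the role a DbResult-style alias would), the supported version is passed in rather than hard-coded, and accepting older versions so that Database::open can migrate them later is an assumption.

```rust
/// Hypothetical error for the pull guard; the crate's DbError variants may differ.
#[derive(Debug, PartialEq, Eq)]
enum PullError {
    FutureSchema { pulled: u32, supported: u32 },
}

/// Result alias in the style of DbResult.
type PullResult<T> = Result<T, PullError>;

/// Refuse a pulled DB whose schema is newer than this build understands.
/// Older versions are accepted on the assumption that migrations upgrade them.
fn check_pulled_schema(pulled: u32, supported: u32) -> PullResult<()> {
    if pulled > supported {
        Err(PullError::FutureSchema { pulled, supported })
    } else {
        Ok(())
    }
}

fn main() {
    // Stand-in for CURRENT_SCHEMA_VERSION; the real value is not documented here.
    let supported = 5;
    println!("{:?}", check_pulled_schema(4, supported)); // Ok(())
    println!("{:?}", check_pulled_schema(6, supported));
    // Err(FutureSchema { pulled: 6, supported: 5 })
}
```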