SQLite persistence layer for the cartog code graph.
Stores symbols, edges, and file metadata in a single SQLite database. Provides graph traversal queries (callees, refs, impact, hierarchy), full-text search via FTS5, vector KNN search via sqlite-vec, and a 6-tier heuristic edge resolution algorithm.
§cartog-db
§Overview
Stores symbols, edges, files, and embeddings in a single SQLite database. Provides query methods for graph traversal (callees, refs, impact, hierarchy), full-text search via FTS5, and vector similarity search via sqlite-vec.
§How it works
§Schema
Four core tables plus four RAG-specific tables:
- `symbols`: primary key on stable ID, indexed by `name` and `file_path`
- `edges`: `source_id` → `target_name`, with `target_id` resolved later. A `resolution_state` column tracks lifecycle: `0=unresolved`, `1=resolved`, `2=unresolvable` (LSP definitively gave up: typo, dyn dispatch, macro), `3=external` (LSP located the target outside the indexed root: stdlib, deps, node_modules). States 2 and 3 are sticky: they are skipped on future LSP passes until `Database::reset_unresolvable_for_names` reopens them on a name match, or `Database::reset_all_unresolvable` resets all of them on `--force`
- `files`: tracks file hash, language, symbol count, last modified timestamp
- `metadata`: key-value store (e.g., `last_commit` for git-based change detection)
- `symbol_content`: raw source code per symbol (for FTS and embedding)
- `symbol_fts`: FTS5 virtual table over symbol names and content (BM25 ranking)
- `symbol_embedding_map`: maps integer rowids (for sqlite-vec) to symbol IDs
- `symbol_vec`: sqlite-vec virtual table with float32 vectors for KNN search. The vector dimension is configurable via `.cartog.toml` (default: 384). When the configured dimension changes, the vector table is automatically recreated.
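The `resolution_state` lifecycle described for the edges table can be modelled as a small enum. This is a minimal sketch, not the crate's actual type: the names `ResolutionState`, `from_code`, `is_sticky`, and `needs_lsp_pass` are hypothetical and only mirror the integer codes listed above.

```rust
/// Hypothetical mirror of the integer codes stored in `resolution_state`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionState {
    Unresolved = 0,
    Resolved = 1,
    Unresolvable = 2,
    External = 3,
}

impl ResolutionState {
    /// Decode the integer stored in the column; unknown codes yield `None`.
    pub fn from_code(code: i64) -> Option<Self> {
        match code {
            0 => Some(Self::Unresolved),
            1 => Some(Self::Resolved),
            2 => Some(Self::Unresolvable),
            3 => Some(Self::External),
            _ => None,
        }
    }

    /// Sticky states (2 and 3) are skipped by later LSP passes until reset.
    pub fn is_sticky(self) -> bool {
        matches!(self, Self::Unresolvable | Self::External)
    }

    /// Only plain unresolved edges are still candidates for an LSP pass.
    pub fn needs_lsp_pass(self) -> bool {
        self == Self::Unresolved
    }
}
```

In this sketch, a reset such as the one performed by `Database::reset_all_unresolvable` would correspond to moving sticky rows back to `Unresolved`.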
§Edge resolution (6-tier heuristic)
When edges are first inserted, target_id is NULL. The resolution algorithm runs in 2 passes, attempting to match target_name to a known symbol:
1. Same file: exact name match in the same source file
2. Import path: follow already-resolved import edges
3. Same directory: match symbols in sibling files
4. Parent scope: first symbol sharing the source’s parent scope (sibling, `LIMIT 1`)
5. Project-wide unique: exactly one match globally
6. Kind disambiguation: when exactly 2 matches remain, pick the higher-priority kind (type-like `class`/`interface`/`enum`/`type_alias`/`trait` > `function` > `method`); equal priorities stay unresolved
Tiers 5 and 6 are evaluated together in a single project-wide query (capped at 3 candidates): a lone match resolves via tier 5, exactly two resolve via the tier-6 kind disambiguation, and 3+ stay unresolved.
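The combined tier 5/6 decision can be sketched in plain Rust, operating on the candidates a project-wide query might return. This is an illustrative sketch only: `Kind`, `Candidate`, `kind_priority`, and `resolve_project_wide` are hypothetical names, and the enum covers only the kinds named in the priority order above.

```rust
use std::cmp::Ordering;

/// Hypothetical subset of symbol kinds, limited to those in the priority list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Class,
    Interface,
    Enum,
    TypeAlias,
    Trait,
    Function,
    Method,
}

/// Hypothetical candidate row from the project-wide lookup.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub id: String,
    pub kind: Kind,
}

/// Lower number means higher priority: type-like > function > method.
fn kind_priority(kind: Kind) -> u8 {
    match kind {
        Kind::Class | Kind::Interface | Kind::Enum | Kind::TypeAlias | Kind::Trait => 0,
        Kind::Function => 1,
        Kind::Method => 2,
    }
}

/// Tier 5: a lone match resolves. Tier 6: exactly two matches resolve to the
/// higher-priority kind unless priorities tie. Zero or 3+ stay unresolved.
pub fn resolve_project_wide(candidates: &[Candidate]) -> Option<&Candidate> {
    match candidates {
        [only] => Some(only),
        [a, b] => match kind_priority(a.kind).cmp(&kind_priority(b.kind)) {
            Ordering::Less => Some(a),
            Ordering::Greater => Some(b),
            Ordering::Equal => None,
        },
        _ => None,
    }
}
```

Capping the query at 3 candidates is enough for this decision, since any third row already means the edge stays unresolved.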
§Search ranking
Symbol search uses a composite score:
`rank = match_tier + kind_penalty`
- match_tier: exact match (0), prefix (1), substring (2)
- kind_penalty: definitions `function`/`method`/`class` (0), `variable` and all other kinds (3), `import` (6)
- tiebreaker: `in_degree DESC` (most-referenced symbols first)
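As a rough illustration of the composite score, the sketch below ranks in-memory hits the same way. It is not the crate's implementation (which presumably runs in SQL): `Hit`, `match_tier`, `kind_penalty`, `rank`, and `ranked` are hypothetical names, and case-sensitive matching is an assumption.

```rust
/// Hypothetical search hit; `in_degree` is the number of incoming references.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Hit {
    pub name: String,
    pub kind: String,
    pub in_degree: u32,
}

/// exact match (0), prefix (1), substring (2); `None` when nothing matches.
fn match_tier(query: &str, name: &str) -> Option<u32> {
    if name == query {
        Some(0)
    } else if name.starts_with(query) {
        Some(1)
    } else if name.contains(query) {
        Some(2)
    } else {
        None
    }
}

/// Definitions (0), imports (6), every other kind (3).
fn kind_penalty(kind: &str) -> u32 {
    match kind {
        "function" | "method" | "class" => 0,
        "import" => 6,
        _ => 3,
    }
}

/// rank = match_tier + kind_penalty (lower is better).
pub fn rank(query: &str, hit: &Hit) -> Option<u32> {
    match_tier(query, &hit.name).map(|tier| tier + kind_penalty(&hit.kind))
}

/// Sort matching hits by rank ascending, then by in_degree descending.
pub fn ranked(query: &str, hits: Vec<Hit>) -> Vec<Hit> {
    let mut scored: Vec<(u32, Hit)> = hits
        .into_iter()
        .filter_map(|h| rank(query, &h).map(|r| (r, h)))
        .collect();
    scored.sort_by(|(ra, a), (rb, b)| ra.cmp(rb).then(b.in_degree.cmp(&a.in_degree)));
    scored.into_iter().map(|(_, h)| h).collect()
}
```

With these weights, an exact match on an `import` (rank 6) sorts below a substring match on a `function` (rank 2), so definitions surface ahead of the statements that merely import them.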
§Public API (key exports)
| Export | Description |
|---|---|
| `Database` | Main handle: open, query, insert, resolve |
| `DB_DIR` / `DB_FILENAME` | Default DB location: `.cartog/db.sqlite` at the project root |
| `LEGACY_DB_FILE` | Legacy DB filename (`.cartog.db`), kept for backwards-compat lookups |
| `MAX_SEARCH_LIMIT` | Maximum results for search queries (100) |
| `UnresolvedEdge` | Edge pending LSP resolution |
| `IndexStats` | Aggregate statistics (files, symbols, edges, languages) |
| `DbError` / `DbResult` | Crate error type and result alias |
| `EmbeddingFingerprint` | Records the embedding strategy so a format change triggers re-embed |
| `checkpoint_wal()` | Force a WAL checkpoint (used before backups / handoff) |
| `CURRENT_SCHEMA_VERSION` / `DEFAULT_EMBEDDING_DIM` | Schema migration target and default vector dimension |
§Crate dependencies
cartog-core
Structs§
- `Database`
- `EmbeddingFingerprint`: Identity of the embedding stack that produced the vectors stored in `symbol_vec`. Persisted in the `metadata` table so we can detect when the user swaps provider or model and silently invalidates the existing index even when the dimension happens to stay the same.
- `IndexStats`
- `PathHop`: One hop on a call path returned by `Database::trace`.
- `PinnedAttach`: Snapshot of write-mode-relevant metadata captured by a read-only attach. Compared against the on-disk values when the reader decides whether it can still safely serve queries against the DB.
- `SavingsReport`: Per-tool query counts + token-savings estimate for `cartog stats --savings`.
- `UnresolvedEdge`: An unresolved edge from the database (used by LSP resolution).
Enums§
- `DbError`: Typed errors for the database-open and schema-migration paths.
- `KindScope`: Kind scope for `Database::fts5_search_kinded`, so retrieval can filter by kind in SQL. Mirrors the rag layer’s `KindFilter`.
Constants§
- `BUSY_TIMEOUT_MS`: Milliseconds a connection waits on a locked database before giving up.
- `CURRENT_SCHEMA_VERSION`: Public mirror of the private `SCHEMA_VERSION` for callers outside this crate (e.g. `cartog pull` needs it to compare against a pulled DB and refuse to load a future-versioned file). Kept in sync by construction.
- `DB_DIR`: Default directory for cartog-generated artifacts, at the project root. Holds the SQLite database and its destructive-migration backups.
- `DB_FILENAME`: Default SQLite database filename, stored inside `DB_DIR`.
- `DEFAULT_EMBEDDING_DIM`: Default embedding dimension (BGE-small-en-v1.5).
- `LEGACY_DB_FILE`: Legacy database filename at the project root, kept for backwards-compatibility lookups. Never written to for new projects: use `DB_DIR`/`DB_FILENAME` instead.
- `MAX_SEARCH_LIMIT`: Maximum number of results returned by `Database::search`. Enforced here and referenced by CLI and MCP layers.
- `TOKENS_PER_QUERY_CARTOG`: Per-query token cost for cartog. Measured: ~280 tokens for a typical navigation query (`where is X used?`, `what does X call?`) including the structured response payload.
- `TOKENS_PER_QUERY_GREP`: Per-query token cost for an equivalent grep + read flow. Measured: a grep sweep plus reading the surrounding ~50 lines of each hit averages ~1,700 tokens to answer the same navigation question.
- `TOKENS_SAVED_PER_QUERY`: Per-query token delta (`grep − cartog`). Coarse on purpose; refining per-tool would require richer per-call accounting and isn’t worth it pre-v1. Sources: benchmarks/queries.rs (see `crates/cartog/benches/`).
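The token-savings arithmetic behind these constants can be worked through as follows. This is a sketch under stated assumptions: the constants below are local stand-ins that take the approximate measured figures quoted above (~280 and ~1,700) as exact values, and `estimated_savings` is a hypothetical helper, not a function documented on this page.

```rust
// Local stand-ins; the exact values used by the crate are an assumption here.
pub const TOKENS_PER_QUERY_CARTOG: u64 = 280;
pub const TOKENS_PER_QUERY_GREP: u64 = 1_700;

/// Per-query delta: grep cost minus cartog cost (1,700 - 280 = 1,420).
pub const TOKENS_SAVED_PER_QUERY: u64 = TOKENS_PER_QUERY_GREP - TOKENS_PER_QUERY_CARTOG;

/// Hypothetical estimate: total tokens saved for a given number of queries.
/// Saturates instead of overflowing on absurdly large counts.
pub fn estimated_savings(query_count: u64) -> u64 {
    query_count.saturating_mul(TOKENS_SAVED_PER_QUERY)
}
```

Under these figures, 50 navigation queries would come to an estimated 71,000 tokens saved.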
Functions§
- `checkpoint_wal`: Run `PRAGMA wal_checkpoint(TRUNCATE)` on the SQLite file at `path`. No-op for missing files. Used before moving the DB to flush the WAL.
- `normalize_symbol_name`: Split a symbol name into lowercase words for FTS5 indexing.
- `read_metadata_at`: Read a single `metadata` value by key from a cartog SQLite file at `path`, without the full `Database::open` machinery. Mirrors `read_schema_version_at`; used by `cartog push`/`pull` to read the `last_commit` provenance row off a closed DB file.
- `read_schema_version_at`: Read the `schema_version` recorded in a cartog SQLite file at `path`, without going through the full `Database::open` machinery (no migrations, no fingerprint reconciliation). Used by `cartog pull` to guard against pulling a future-versioned DB before clobbering the local one.
- `register_sqlite_vec`: Register the sqlite-vec extension globally.
Type Aliases§
- `DbResult`: Result alias for the typed-error helpers below.