pub struct Store { /* private fields */ }
Thread-safe SQLite store for chunks and embeddings
Uses sqlx connection pooling for concurrent reads and WAL mode for crash safety. All methods are synchronous but internally use an async runtime to execute sqlx operations.
§Memory-mapped I/O
open() sets PRAGMA mmap_size = 256MB per connection with a 4-connection pool,
reserving up to 1GB of virtual address space. open_readonly() uses 64MB × 1.
This is intentional and benign on 64-bit systems (128TB virtual address space).
Mmap pages are demand-paged from the database file and evicted under memory
pressure — actual RSS reflects only accessed pages, not the mmap reservation.
§Example
use cqs::Store;
use std::path::Path;
let store = Store::open(Path::new(".cqs/index.db"))?;
let stats = store.stats()?;
println!("Indexed {} chunks", stats.total_chunks);
Implementations§
impl Store
pub fn upsert_calls(
    &self,
    chunk_id: &str,
    calls: &[CallSite],
) -> Result<(), StoreError>
Insert or replace call sites for a chunk
pub fn upsert_calls_batch(
    &self,
    calls: &[(String, CallSite)],
) -> Result<(), StoreError>
Insert call sites for multiple chunks in a single transaction.
Takes (chunk_id, CallSite) pairs and batches them into one transaction.
pub fn get_callees(&self, chunk_id: &str) -> Result<Vec<String>, StoreError>
Get all function names called by a given chunk.
Takes a chunk ID (unique) rather than a name. Returns only callee names (not full chunks) because:
- Callees may not exist in the index (external functions)
- Callers typically chain get_callees → get_callers_full for graph traversal
For richer callee data, see [get_callers_with_context].
pub fn call_stats(&self) -> Result<CallStats, StoreError>
Retrieves aggregated statistics about function calls from the database.
Queries the calls table for the total number of calls and the count of distinct callees, returning them as a CallStats structure.
§Returns
- Ok(CallStats): a struct with total_calls (total number of recorded calls) and unique_callees (number of distinct functions called)
- Err(StoreError): if the database query fails
§Errors
Returns StoreError if the SQL query fails or if database connectivity issues occur.
pub fn upsert_function_calls(
    &self,
    file: &Path,
    function_calls: &[FunctionCalls],
) -> Result<(), StoreError>
Insert function calls for a file (full call graph, no size limits)
impl Store
pub fn find_dead_code(
    &self,
    include_pub: bool,
) -> Result<(Vec<DeadFunction>, Vec<DeadFunction>), StoreError>
Find functions/methods never called by indexed code (dead code detection).
Returns two lists:
- confident: functions with no callers that are likely dead (with confidence scores)
- possibly_dead_pub: public functions with no callers (may be used externally)
Uses two-phase query: lightweight metadata first, then content only for candidates that pass name/test/path filters (avoids loading large function bodies).
Exclusions applied:
- Entry point names (main, init, handler, etc.)
- Test functions (via find_test_chunks() heuristics)
- Functions in test files
- Trait implementations (dynamic dispatch invisible to call graph)
- #[no_mangle] functions (FFI)
Confidence scoring:
- High: Private function in a file where no other function has callers
- Medium: Private function in an active file (other functions are called)
- Low: Method, or function with constructor-like name patterns
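The confidence tiers above can be sketched as a small decision table. This is a hypothetical illustration of the documented rules, not the actual cqs implementation; the function name, parameters, and the constructor-name patterns (new, default, from_*, with_*) are assumptions.

```rust
// Hypothetical sketch of the confidence tiers documented above.
#[derive(Debug, PartialEq)]
enum Confidence {
    High,   // private fn in a file where no other function has callers
    Medium, // private fn in an otherwise active file
    Low,    // method, or constructor-like name
}

fn score(is_private: bool, is_method: bool, name: &str, file_has_live_fns: bool) -> Confidence {
    // Assumed constructor-like patterns; the real heuristic is not shown here.
    let constructor_like = name == "new"
        || name == "default"
        || name.starts_with("from_")
        || name.starts_with("with_");
    if is_method || constructor_like {
        Confidence::Low
    } else if is_private && !file_has_live_fns {
        Confidence::High
    } else if is_private {
        Confidence::Medium
    } else {
        Confidence::Low
    }
}

fn main() {
    assert_eq!(score(true, false, "helper", false), Confidence::High);
    assert_eq!(score(true, false, "helper", true), Confidence::Medium);
    assert_eq!(score(true, true, "helper", false), Confidence::Low);
    assert_eq!(score(true, false, "from_str", false), Confidence::Low);
}
```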
impl Store
pub fn get_callers_full(
    &self,
    callee_name: &str,
) -> Result<Vec<CallerInfo>, StoreError>
Find all callers of a function (from full call graph)
pub fn get_callees_full(
    &self,
    caller_name: &str,
    file: Option<&str>,
) -> Result<Vec<(String, u32)>, StoreError>
Get all callees of a function (from full call graph)
When file is provided, scopes to callees of that function in that specific file.
When None, returns callees across all files (backwards compatible, but ambiguous
for common names like new, parse, from_str).
pub fn get_call_graph(&self) -> Result<Arc<CallGraph>, StoreError>
Load the call graph as forward + reverse adjacency lists.
Single SQL scan of function_calls, capped at 500K edges to prevent OOM
on adversarial databases. Typical projects have ~2000 edges.
Used by trace (forward BFS), impact (reverse BFS), and test-map (reverse BFS).
Cached call graph — populated on first access, returns clone from OnceLock.
No invalidation by design. The cache lives for the Store lifetime and is
never cleared. Normal usage is one Store per CLI command, so the index cannot
change while the cache is live. In long-lived modes (batch, watch), callers must
re-open the Store to pick up index changes — do not add a clear() here.
~15 call sites benefit from this single-scan caching.
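The write-once cache contract described above maps directly onto std's OnceLock. A minimal sketch, with a stand-in CallGraph type rather than the real cqs one:

```rust
use std::sync::{Arc, OnceLock};

// Stand-in for the real call graph; only here to make the sketch compile.
struct CallGraph { edges: usize }

struct Store { call_graph: OnceLock<Arc<CallGraph>> }

impl Store {
    fn get_call_graph(&self) -> Arc<CallGraph> {
        // The first caller pays for the scan; later callers get an Arc clone.
        // There is deliberately no way to clear the cell afterward.
        Arc::clone(self.call_graph.get_or_init(|| {
            Arc::new(CallGraph { edges: 2000 }) // stands in for the SQL scan
        }))
    }
}

fn main() {
    let store = Store { call_graph: OnceLock::new() };
    let a = store.get_call_graph();
    let b = store.get_call_graph();
    // Same allocation both times: the cache was populated exactly once.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a.edges, 2000);
}
```

OnceLock has no reset method, which is exactly the "do not add a clear()" property the docs call for: picking up index changes requires a new Store.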
pub fn get_callers_with_context(
    &self,
    callee_name: &str,
) -> Result<Vec<CallerWithContext>, StoreError>
Find callers with call-site line numbers for impact analysis.
Returns the caller function name, file, start line, and the specific line
where the call to callee_name occurs.
pub fn get_callers_with_context_batch(
    &self,
    callee_names: &[&str],
) -> Result<HashMap<String, Vec<CallerWithContext>>, StoreError>
Batch-fetch callers with context for multiple callee names.
Returns callee_name -> Vec<CallerWithContext> using a single
WHERE callee_name IN (...) query per batch of 500 names.
Avoids N+1 get_callers_with_context calls in diff impact analysis.
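The "500 names per batch" pattern used by these batch methods can be sketched as pure chunking logic: split the input into groups of 500 and build one IN (...) clause per group, keeping each query under SQLite's default bind-parameter limit (~999). The SQL text here is illustrative, not the actual cqs query.

```rust
// Chunk names into groups of 500 and build one placeholder list per group.
const BATCH: usize = 500;

fn in_clauses(names: &[&str]) -> Vec<String> {
    names
        .chunks(BATCH)
        .map(|chunk| {
            // One "?" bind parameter per name in this group.
            let placeholders = vec!["?"; chunk.len()].join(", ");
            format!("SELECT * FROM calls WHERE callee_name IN ({placeholders})")
        })
        .collect()
}

fn main() {
    let names: Vec<&str> = (0..1200).map(|_| "f").collect();
    let queries = in_clauses(&names);
    assert_eq!(queries.len(), 3); // 500 + 500 + 200
    assert_eq!(queries[2].matches('?').count(), 200);
}
```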
pub fn get_callers_full_batch(
    &self,
    callee_names: &[&str],
) -> Result<HashMap<String, Vec<CallerInfo>>, StoreError>
Batch-fetch callers (full call graph) for multiple callee names.
Returns callee_name -> Vec<CallerInfo> using a single
WHERE callee_name IN (...) query per batch of 500 names.
Avoids N+1 get_callers_full calls in the context command.
pub fn get_callees_full_batch(
    &self,
    caller_names: &[&str],
) -> Result<HashMap<String, Vec<(String, u32)>>, StoreError>
Batch-fetch callees (full call graph) for multiple caller names.
Returns caller_name -> Vec<(callee_name, call_line)> using a single
WHERE caller_name IN (...) query per batch of 500 names.
Avoids N+1 get_callees_full calls in the context command.
Unlike [get_callees_full], does not support file scoping — returns
callees across all files. This is acceptable for the context command
which later filters by origin.
impl Store
pub fn get_caller_counts_batch(
    &self,
    names: &[&str],
) -> Result<HashMap<String, u64>, StoreError>
Caller counts for multiple functions in one query.
Returns how many callers each function has. Functions not in the call graph won’t appear in the result map (caller count is implicitly 0).
pub fn get_callee_counts_batch(
    &self,
    names: &[&str],
) -> Result<HashMap<String, u64>, StoreError>
Callee counts for multiple functions in one query.
Returns how many callees each function has. Functions not in the call graph won’t appear in the result map (callee count is implicitly 0).
Functions that share callers with target (called by the same functions).
For target X, finds functions Y where some function A calls both X and Y. Returns (function_name, overlap_count) sorted by overlap descending.
Functions that share callees with target (call the same functions).
For target X, finds functions Y where X and Y both call some function C. Returns (function_name, overlap_count) sorted by overlap descending.
pub fn function_call_stats(&self) -> Result<FunctionCallStats, StoreError>
Get full call graph statistics
pub fn callee_caller_counts(&self) -> Result<Vec<(String, usize)>, StoreError>
Count distinct callers for each callee name.
Returns (callee_name, distinct_caller_count) pairs. Used by the
enrichment pass for IDF-style filtering: callees called by many
distinct callers are likely utilities (log, unwrap, etc.).
impl Store
pub fn prune_stale_calls(&self) -> Result<u64, StoreError>
Delete function_calls for files no longer in the chunks table.
Used by GC to clean up orphaned call graph entries after pruning chunks.
pub fn find_test_chunks(&self) -> Result<Vec<ChunkSummary>, StoreError>
Find test chunks using language-specific heuristics.
Identifies test functions across all supported languages by:
- Name patterns: test_* (Rust/Python), Test* (Go)
- Content patterns: sourced from LanguageDef::test_markers per language
- Path patterns: sourced from LanguageDef::test_path_patterns per language
Uses a broad SQL filter then Rust post-filter for precision.
Cached test chunks — populated on first access, returns clone from OnceLock.
No invalidation by design. Same contract as get_call_graph: the cache is
intentionally write-once for the Store lifetime. Long-lived modes (batch, watch)
must re-open the Store to see updated test discovery — do not add a clear().
~14 call sites benefit from this single-scan caching.
impl Store
pub fn embedding_batches(
    &self,
    batch_size: usize,
) -> impl Iterator<Item = Result<Vec<(String, Embedding)>, StoreError>> + '_
Stream embeddings in batches for memory-efficient HNSW building.
Uses cursor-based pagination (WHERE rowid > last_seen) for stability under concurrent writes. LIMIT/OFFSET can skip or duplicate rows if the table is modified between batches.
§Arguments
batch_size - number of embeddings per batch (recommend 10_000)
§Returns
Iterator yielding Result<Vec<(String, Embedding)>, StoreError>
§Panics
Must be called from sync context only. This iterator internally uses
block_on() which will panic if called from within an async runtime.
This is used for HNSW building which runs in dedicated sync threads.
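The cursor-based pagination described above can be simulated over an in-memory table of rowids. A sketch (not the real sqlx query): each batch takes rows with rowid greater than the last-seen cursor, so a row inserted or deleted between batches can never shift the window the way LIMIT/OFFSET would.

```rust
// Simulate "WHERE rowid > :after ORDER BY rowid LIMIT :limit" over a
// sorted slice of (rowid, value) pairs.
fn next_batch(rows: &[(i64, &str)], after: i64, limit: usize) -> Vec<(i64, String)> {
    rows.iter()
        .filter(|(rowid, _)| *rowid > after)
        .take(limit)
        .map(|(rowid, v)| (*rowid, v.to_string()))
        .collect()
}

fn main() {
    // Rowids need not be contiguous (deletes leave gaps).
    let rows = vec![(1, "a"), (2, "b"), (5, "c"), (9, "d")];
    let mut cursor = 0;
    let mut seen = Vec::new();
    loop {
        let batch = next_batch(&rows, cursor, 2);
        if batch.is_empty() { break; } // empty batch ends iteration
        cursor = batch.last().unwrap().0; // advance past the last-seen rowid
        seen.extend(batch);
    }
    assert_eq!(seen.len(), 4);
    assert_eq!(seen.last().unwrap().0, 9);
}
```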
impl Store
pub fn get_metadata(&self, key: &str) -> Result<String, StoreError>
Retrieve a single metadata value by key.
Returns Ok(value) if the key exists, or Err if not found or on DB error.
Used for lightweight metadata checks (e.g., model compatibility between stores).
pub fn upsert_chunks_batch(
    &self,
    chunks: &[(Chunk, Embedding)],
    source_mtime: Option<i64>,
) -> Result<usize, StoreError>
Insert or update chunks in batch using multi-row INSERT.
Chunks are inserted in batches of 52 rows (52 * 19 params = 988 < SQLite’s 999 limit). FTS operations remain per-row because FTS5 doesn’t support INSERT OR REPLACE.
DS-19 warning: Uses INSERT OR REPLACE which triggers ON DELETE CASCADE on
calls and type_edges tables. Callers must re-populate call graph edges after
this function if the chunks had existing relationships.
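The 52-row batch size is just arithmetic against SQLite's default 999-parameter limit, and the same math explains the other batch sizes on this page:

```rust
// Rows per multi-row INSERT = floor(param_limit / params_per_row).
fn rows_per_batch(param_limit: usize, params_per_row: usize) -> usize {
    param_limit / params_per_row
}

fn main() {
    // Chunk rows bind 19 parameters each: 52 rows = 988 binds, under 999.
    let rows = rows_per_batch(999, 19);
    assert_eq!(rows, 52);
    assert_eq!(rows * 19, 988);
    // Type edges bind 4 parameters each: 249 rows = 996 binds, under 999.
    assert_eq!(rows_per_batch(999, 4), 249);
}
```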
pub fn upsert_chunk(
    &self,
    chunk: &Chunk,
    embedding: &Embedding,
    source_mtime: Option<i64>,
) -> Result<(), StoreError>
Insert or update a single chunk
pub fn update_embeddings_batch(
    &self,
    updates: &[(String, Embedding)],
) -> Result<usize, StoreError>
Update only the embedding for existing chunks by chunk ID.
updates is a slice of (chunk_id, embedding) pairs. Chunk IDs not
found in the store are logged and skipped (rows_affected == 0).
Returns the count of actually updated rows.
Update embeddings in batch (without changing enrichment hashes).
Convenience wrapper around update_embeddings_with_hashes_batch that
passes None for the enrichment hash, leaving it unchanged.
pub fn update_embeddings_with_hashes_batch(
    &self,
    updates: &[(String, Embedding, Option<String>)],
) -> Result<usize, StoreError>
Update embeddings and optionally enrichment hashes in batch.
When the hash is Some, stores the enrichment hash for idempotency
detection. When None, leaves the existing enrichment hash unchanged.
Used by the enrichment pass to record which call context was used,
so re-indexing can skip unchanged chunks.
pub fn get_enrichment_hashes_batch(
    &self,
    chunk_ids: &[&str],
) -> Result<HashMap<String, String>, StoreError>
Get enrichment hashes for a batch of chunk IDs.
Returns a map from chunk_id to enrichment_hash (only for chunks that have one).
pub fn get_all_enrichment_hashes(
    &self,
) -> Result<HashMap<String, String>, StoreError>
Fetch all enrichment hashes in a single query.
Returns a map from chunk_id to enrichment_hash for all chunks that have one. Used by the enrichment pass to avoid per-page hash fetches (PERF-29).
pub fn get_summaries_by_hashes(
    &self,
    content_hashes: &[&str],
    purpose: &str,
) -> Result<HashMap<String, String>, StoreError>
Get LLM summaries for a batch of content hashes.
Returns a map from content_hash to summary text. Only includes hashes that have summaries in the llm_summaries table matching the given purpose.
pub fn upsert_summaries_batch(
    &self,
    summaries: &[(String, String, String, String)],
) -> Result<usize, StoreError>
Insert or update LLM summaries in batch.
Each entry is (content_hash, summary, model, purpose).
pub fn get_all_summaries(
    &self,
    purpose: &str,
) -> Result<HashMap<String, String>, StoreError>
Fetch all LLM summaries as a map from content_hash to summary text.
Single query, no batching needed (reads entire table). Used by the enrichment pass to avoid per-page summary fetches.
pub fn get_all_content_hashes(&self) -> Result<Vec<String>, StoreError>
Get all distinct content hashes currently in the chunks table. Used to validate batch results against the current index (DS-20).
pub fn prune_orphan_summaries(&self) -> Result<usize, StoreError>
Delete orphan LLM summaries whose content_hash doesn’t exist in any chunk.
pub fn needs_reindex(&self, path: &Path) -> Result<Option<i64>, StoreError>
Check if a file needs reindexing based on mtime.
Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime),
or Ok(None) if no reindex needed. This avoids reading file metadata twice.
pub fn delete_by_origin(&self, origin: &Path) -> Result<u32, StoreError>
Delete all chunks for an origin (file path or source identifier)
pub fn upsert_chunks_and_calls(
    &self,
    chunks: &[(Chunk, Embedding)],
    source_mtime: Option<i64>,
    calls: &[(String, CallSite)],
) -> Result<usize, StoreError>
Atomically upsert chunks and their call graph in a single transaction.
Combines chunk upsert (with FTS) and call graph upsert into one transaction, preventing inconsistency from crashes between separate operations. Chunks are inserted in batches of 52 rows (52 * 19 = 988 < SQLite’s 999 limit).
pub fn delete_phantom_chunks(
    &self,
    file: &Path,
    live_ids: &[&str],
) -> Result<u32, StoreError>
Delete chunks for a file that are no longer in the current parse output (RT-DATA-10).
After re-parsing a file, some functions may have been removed. Their old
chunks would linger as phantoms. This deletes chunks whose origin matches
file but whose ID is not in live_ids.
impl Store
pub fn get_embeddings_by_hashes(
    &self,
    hashes: &[&str],
) -> Result<HashMap<String, Embedding>, StoreError>
Get embeddings for chunks with matching content hashes (batch lookup).
Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
pub fn get_chunk_ids_and_embeddings_by_hashes(
    &self,
    hashes: &[&str],
) -> Result<Vec<(String, Embedding)>, StoreError>
Get (chunk_id, embedding) pairs for chunks with matching content hashes.
Unlike get_embeddings_by_hashes (which keys by content_hash), this returns
the chunk ID alongside the embedding — exactly what HNSW insert_batch needs.
Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
impl Store
pub fn chunk_count(&self) -> Result<u64, StoreError>
Get the number of chunks in the index
pub fn stats(&self) -> Result<IndexStats, StoreError>
Get index statistics
Uses batched queries to minimize database round trips:
- Single query for counts with GROUP BY using CTEs
- Single query for all metadata keys
pub fn get_chunks_by_origin(
    &self,
    origin: &str,
) -> Result<Vec<ChunkSummary>, StoreError>
Get all chunks for a given file (origin).
Returns chunks sorted by line_start. Used by cqs context to list
all functions/types in a file.
pub fn get_chunks_by_origins_batch(
    &self,
    origins: &[&str],
) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch chunks by multiple origin paths.
Returns a map of origin -> Vec<ChunkSummary>, avoiding N+1 get_chunks_by_origin calls.
pub fn get_chunks_by_names_batch(
    &self,
    names: &[&str],
) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch chunks by multiple function names.
Returns a map of name -> Vec<ChunkSummary>, avoiding N+1 get_chunks_by_name calls.
pub fn get_chunk_with_embedding(
    &self,
    id: &str,
) -> Result<Option<(ChunkSummary, Embedding)>, StoreError>
Get a chunk with its embedding vector.
Returns Ok(None) if the chunk doesn’t exist or has a corrupt embedding.
Used by cqs similar and cqs explain to search by example.
pub fn get_chunks_by_ids(
    &self,
    ids: &[&str],
) -> Result<HashMap<String, ChunkSummary>, StoreError>
Batch-fetch chunks by IDs.
Returns a map of chunk ID → ChunkSummary for all found IDs.
Used by --expand to fetch parent chunks for small-to-big retrieval.
pub fn get_embeddings_by_ids(
    &self,
    ids: &[&str],
) -> Result<HashMap<String, Embedding>, StoreError>
Batch-fetch embeddings by chunk IDs.
Returns a map of chunk ID → Embedding for all found IDs. Skips chunks with corrupt embeddings. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
Used by semantic_diff to avoid N+1 queries when comparing matched pairs.
pub fn search_by_names_batch(
    &self,
    names: &[&str],
    limit_per_name: usize,
) -> Result<HashMap<String, Vec<SearchResult>>, StoreError>
Batch name search: look up multiple names in a single call.
For each name, returns up to limit_per_name matching chunks.
Batches names into groups of 20 and issues a combined FTS OR query
per batch, then post-filters results to assign to matching names.
Used by gather BFS expansion to avoid N+1 query patterns.
pub fn all_chunk_identities(&self) -> Result<Vec<ChunkIdentity>, StoreError>
Get identity metadata for all chunks (for diff comparison).
Returns minimal metadata needed to match chunks across stores. Loads all rows but only lightweight columns (no content or embeddings).
pub fn chunks_paged(
    &self,
    after_rowid: i64,
    limit: usize,
) -> Result<(Vec<ChunkSummary>, i64), StoreError>
Fetch a page of full chunks by rowid cursor.
Returns (chunks, next_cursor). When the returned vec is empty, iteration
is complete. Used by the enrichment pass to iterate all chunks without
loading everything into memory.
pub fn all_chunk_identities_filtered(
    &self,
    language: Option<&str>,
) -> Result<Vec<ChunkIdentity>, StoreError>
Like all_chunk_identities but with an optional language filter.
When language is Some, only chunks matching that language are returned,
avoiding loading all chunks into memory when only one language is needed.
impl Store
pub fn prune_missing(
    &self,
    existing_files: &HashSet<PathBuf>,
) -> Result<u32, StoreError>
Delete chunks for files that no longer exist
Batches deletes in groups of 100 to balance memory usage and query efficiency.
Uses Rust HashSet for existence check rather than SQL WHERE NOT IN because:
- Existing files often number 10k+, exceeding SQLite’s parameter limit (~999)
- Sending full file list to SQLite would require chunked queries anyway
- HashSet lookup is O(1), and we already have the set from enumerate_files()
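The filter-then-batch shape described above can be sketched with std types only. This is an illustration of the design choice, not the cqs code; the function name and the return shape are assumptions.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Filter indexed paths through the in-memory HashSet (O(1) per lookup),
// then group the survivors into delete batches of 100.
fn to_prune(indexed: &[PathBuf], existing: &HashSet<PathBuf>) -> Vec<Vec<PathBuf>> {
    let dead: Vec<PathBuf> = indexed
        .iter()
        .filter(|p| !existing.contains(*p))
        .cloned()
        .collect();
    dead.chunks(100).map(|c| c.to_vec()).collect()
}

fn main() {
    let indexed: Vec<PathBuf> = (0..250)
        .map(|i| PathBuf::from(format!("f{i}.rs")))
        .collect();
    // Pretend only the first 40 files still exist on disk.
    let existing: HashSet<PathBuf> = indexed[..40].iter().cloned().collect();
    let batches = to_prune(&indexed, &existing);
    assert_eq!(batches.iter().map(Vec::len).sum::<usize>(), 210);
    assert_eq!(batches.len(), 3); // 100 + 100 + 10
}
```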
pub fn prune_all(
    &self,
    existing_files: &HashSet<PathBuf>,
) -> Result<PruneAllResult, StoreError>
Run all prune operations in a single SQLite transaction.
Ensures concurrent readers never see an inconsistent state where chunks
are deleted but orphan call graph / type edge / summary entries remain.
Without this, the window between prune_missing and prune_stale_calls
exposes stale function_calls rows referencing deleted chunks.
pub fn count_stale_files(
    &self,
    existing_files: &HashSet<PathBuf>,
) -> Result<(u64, u64), StoreError>
Count files that are stale (mtime changed) or missing from disk.
Compares stored source_mtime against current filesystem state. Only checks files with source_type='file' (not notes or other sources).
Returns (stale_count, missing_count).
pub fn list_stale_files(
    &self,
    existing_files: &HashSet<PathBuf>,
) -> Result<StaleReport, StoreError>
List files that are stale (mtime changed) or missing from disk.
Like count_stale_files() but returns full details for display.
Requires existing_files from enumerate_files() (~100ms for 10k files).
pub fn check_origins_stale(
    &self,
    origins: &[&str],
    root: &Path,
) -> Result<HashSet<String>, StoreError>
Check if specific origins are stale (mtime changed on disk).
Lightweight per-query check: only examines the given origins, not the entire index. O(result_count), not O(index_size).
root is the project root — origins are relative paths joined against it.
Returns the set of stale origin paths.
impl Store
pub fn stored_model_name(&self) -> Option<String>
Read the stored model name from metadata, if set.
Returns None for fresh databases or pre-model indexes.
pub fn touch_updated_at(&self) -> Result<(), StoreError>
Update the updated_at metadata timestamp to now.
Call after indexing operations complete (pipeline, watch reindex, note sync) to track when the index was last modified.
pub fn set_hnsw_dirty(&self, dirty: bool) -> Result<(), StoreError>
Mark the HNSW index as dirty (out of sync with SQLite).
Call before writing chunks to SQLite. Clear after successful HNSW save. On load, a dirty flag means a crash occurred between SQLite commit and HNSW save — the HNSW index should not be trusted.
pub fn is_hnsw_dirty(&self) -> Result<bool, StoreError>
Check if the HNSW index is marked as dirty (potentially stale).
Returns false if the key doesn’t exist (pre-v13 indexes).
pub fn set_pending_batch_id(
    &self,
    batch_id: Option<&str>,
) -> Result<(), StoreError>
Store a pending LLM batch ID so interrupted processes can resume polling.
pub fn get_pending_batch_id(&self) -> Result<Option<String>, StoreError>
Get the pending LLM batch ID, if any.
pub fn set_pending_doc_batch_id(
    &self,
    batch_id: Option<&str>,
) -> Result<(), StoreError>
Store a pending doc-comment batch ID so interrupted processes can resume polling.
pub fn get_pending_doc_batch_id(&self) -> Result<Option<String>, StoreError>
Get the pending doc-comment batch ID, if any.
pub fn set_pending_hyde_batch_id(
    &self,
    batch_id: Option<&str>,
) -> Result<(), StoreError>
Store a pending HyDE batch ID so interrupted processes can resume polling.
pub fn get_pending_hyde_batch_id(&self) -> Result<Option<String>, StoreError>
Get the pending HyDE batch ID, if any.
pub fn cached_notes_summaries(
    &self,
) -> Result<Arc<Vec<NoteSummary>>, StoreError>
Get cached notes summaries (loaded on first call, invalidated on mutation).
Returns Arc<Vec<NoteSummary>> — the warm-cache path is an Arc::clone()
(pointer bump) instead of deep-cloning all note strings. Notes are read-only
during search, so shared ownership is safe and avoids O(notes * string_len)
cloning on every search call.
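The cost difference is easy to demonstrate with std alone: cloning an Arc bumps a reference count, while cloning the Vec would copy every string. A sketch with String standing in for NoteSummary:

```rust
use std::sync::Arc;

// Warm-cache path: hand out another Arc to the same allocation.
fn warm_clone(cache: &Arc<Vec<String>>) -> Arc<Vec<String>> {
    Arc::clone(cache) // pointer bump, no per-note string copies
}

fn main() {
    let notes: Arc<Vec<String>> = Arc::new(vec![String::from("warning: ..."); 10_000]);
    let warm = warm_clone(&notes);
    // Both handles point at the same Vec; the notes were not deep-copied.
    assert!(Arc::ptr_eq(&notes, &warm));
    assert_eq!(Arc::strong_count(&notes), 2);
}
```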
impl Store
pub fn upsert_notes_batch(
    &self,
    notes: &[Note],
    source_file: &Path,
    file_mtime: i64,
) -> Result<usize, StoreError>
Insert or update notes in batch
pub fn replace_notes_for_file(
    &self,
    notes: &[Note],
    source_file: &Path,
    file_mtime: i64,
) -> Result<usize, StoreError>
Replace all notes for a source file in a single transaction.
Atomically deletes existing notes and inserts new ones, preventing data loss if the process crashes mid-operation.
pub fn notes_need_reindex(
    &self,
    source_file: &Path,
) -> Result<Option<i64>, StoreError>
Check if notes file needs reindexing based on mtime.
Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime),
or Ok(None) if no reindex needed. This avoids reading file metadata twice.
pub fn note_count(&self) -> Result<u64, StoreError>
Retrieves the total count of notes stored in the database.
Executes a SQL COUNT query against the notes table and returns the total as a u64. If no notes exist, returns 0.
§Errors
Returns StoreError if the database query fails or the connection fails.
pub fn note_stats(&self) -> Result<NoteStats, StoreError>
Get note statistics (total, warnings, patterns).
Uses SENTIMENT_NEGATIVE_THRESHOLD (-0.3) and SENTIMENT_POSITIVE_THRESHOLD (0.3)
to classify notes. These thresholds work with discrete sentiment values
(-1, -0.5, 0, 0.5, 1) – negative values (-1, -0.5) count as warnings,
positive values (0.5, 1) count as patterns.
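The threshold logic above can be written out directly. A sketch of the documented classification (the enum name and function are illustrative, not the cqs types):

```rust
// Thresholds as documented for note_stats.
const SENTIMENT_NEGATIVE_THRESHOLD: f32 = -0.3;
const SENTIMENT_POSITIVE_THRESHOLD: f32 = 0.3;

#[derive(Debug, PartialEq)]
enum NoteKind { Warning, Neutral, Pattern }

// Discrete sentiment values {-1, -0.5, 0, 0.5, 1} classified against +/-0.3.
fn classify(sentiment: f32) -> NoteKind {
    if sentiment <= SENTIMENT_NEGATIVE_THRESHOLD {
        NoteKind::Warning
    } else if sentiment >= SENTIMENT_POSITIVE_THRESHOLD {
        NoteKind::Pattern
    } else {
        NoteKind::Neutral
    }
}

fn main() {
    assert_eq!(classify(-1.0), NoteKind::Warning);
    assert_eq!(classify(-0.5), NoteKind::Warning);
    assert_eq!(classify(0.0), NoteKind::Neutral);
    assert_eq!(classify(0.5), NoteKind::Pattern);
    assert_eq!(classify(1.0), NoteKind::Pattern);
}
```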
pub fn list_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>
List all notes with metadata (no embeddings).
Returns NoteSummary for each note, useful for mention-based filtering
without the cost of loading embeddings.
impl Store
pub fn search_fts(
&self,
query: &str,
limit: usize,
) -> Result<Vec<String>, StoreError>
Search FTS5 index for keyword matches.
§Search Method Overview
The Store provides several search methods with different characteristics:
- search_fts: Full-text keyword search using SQLite FTS5. Returns chunk IDs. Best for: exact keyword matches, symbol lookup by name fragment.
- search_by_name: Definition search by function/struct name. Uses FTS5 with heavy weighting on the name column. Returns full SearchResult with scores. Best for: “Where is X defined?” queries.
- search_filtered (in search.rs): Semantic search with optional language/path filters. Can use RRF hybrid search combining semantic + FTS scores. Best for: natural-language queries like “retry with exponential backoff”.
- search_filtered_with_index (in search.rs): Like search_filtered but uses an HNSW/CAGRA vector index for O(log n) candidate retrieval instead of brute force. Best for: large indexes (>5k chunks) where brute force is slow.
pub fn search_by_name(
&self,
name: &str,
limit: usize,
) -> Result<Vec<SearchResult>, StoreError>
Search for chunks by name (definition search).
Searches the FTS5 name column for exact or prefix matches. Use this for “where is X defined?” queries instead of semantic search.
impl Store
pub fn upsert_type_edges(
&self,
chunk_id: &str,
type_refs: &[TypeRef],
) -> Result<(), StoreError>
Upsert type edges for a single chunk.
Deletes existing type edges for the chunk, then batch-inserts new ones. 4 binds per row → 249 rows per batch (996 < 999 SQLite limit).
pub fn upsert_type_edges_for_file(
&self,
file: &Path,
chunk_type_refs: &[ChunkTypeRefs],
) -> Result<(), StoreError>
Upsert type edges for all chunks in a file.
Resolves chunk names to chunk IDs via the chunks table, then deletes old type edges and batch-inserts new ones. Chunks not found in the database are warned and skipped (not an error).
For windowed chunks, associates type edges with the first window (window_idx IS NULL or window_idx = 0).
pub fn upsert_type_edges_for_files(
&self,
file_edges: &[(PathBuf, Vec<ChunkTypeRefs>)],
) -> Result<(), StoreError>
Upsert type edges for multiple files in a single transaction.
Batches all per-file work (chunk ID resolution, delete, insert) into one transaction instead of one transaction per file. Falls back per-file on individual resolution failures (warns and skips unresolved chunks).
pub fn get_type_users(
&self,
type_name: &str,
) -> Result<Vec<ChunkSummary>, StoreError>
Get chunks that reference a given type name.
Forward query: “who uses Config?” Returns chunks that have type edges pointing to the given type name.
pub fn get_types_used_by(
&self,
chunk_name: &str,
) -> Result<Vec<TypeUsage>, StoreError>
Get types used by a given chunk (by function name).
Reverse query: “what types does parse_config use?” Returns TypeUsage structs
where edge_kind is “” for catch-all types.
pub fn get_type_users_batch(
&self,
type_names: &[&str],
) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch type users for multiple type names.
Returns a map of type_name -> Vec<ChunkSummary>.
pub fn get_types_used_by_batch(
&self,
chunk_names: &[&str],
) -> Result<HashMap<String, Vec<(String, String)>>, StoreError>
Batch-fetch types used by multiple chunk names.
Returns chunk_name -> Vec<(type_name, edge_kind)>. Uses WHERE IN with 200 names per batch.
pub fn type_edge_stats(&self) -> Result<TypeEdgeStats, StoreError>
Retrieves statistics about type edges in the store.
Queries the database to obtain the total count of type edges and the number of distinct target type names, then returns these metrics as a TypeEdgeStats struct.
§Returns
A Result containing TypeEdgeStats with the total number of edges and count of unique types, or a StoreError if the database query fails.
§Errors
Returns StoreError if the database query cannot be executed or the connection fails.
pub fn get_type_graph(&self) -> Result<TypeGraph, StoreError>
Load the type graph as forward + reverse adjacency lists.
Single SQL scan of type_edges joined with chunks, capped at 500K edges.
Forward: chunk_name -> Vec<type_name>, Reverse: type_name -> Vec<chunk_name>.
Find types that share users with target (co-occurrence).
“Types commonly used alongside Config” → Vec<(type_name, overlap_count)>. Uses self-join: find other types referenced by the same chunks that reference target.
pub fn prune_stale_type_edges(&self) -> Result<u64, StoreError>
Delete type_edges for chunks no longer in the chunks table (GC).
Returns the number of pruned rows.
impl Store
pub fn set_dim(&mut self, dim: usize)
Update the embedding dimension after init (fresh DB only).
Store::open defaults to EMBEDDING_DIM when the metadata table doesn’t
exist yet. After init() writes the correct dim, call this to sync.
pub fn open(path: &Path) -> Result<Self, StoreError>
Open an existing index with connection pooling
pub fn open_light(path: &Path) -> Result<Self, StoreError>
Open an existing index with single-threaded runtime but full memory.
Uses current_thread tokio runtime (1 OS thread instead of 4) while
keeping the full 256MB mmap and 16MB cache of open(). Ideal for
read-only CLI commands on the primary project index where we need
full search performance but don’t need multi-threaded async.
pub fn open_readonly(path: &Path) -> Result<Self, StoreError>
Open an existing index in read-only mode with reduced resources.
Uses minimal connection pool, smaller cache, and single-threaded runtime. Suitable for reference stores and background builds that only read data.
pub fn init(&self, model_info: &ModelInfo) -> Result<(), StoreError>
Create a new index
Wraps all DDL and metadata inserts in a single transaction so a crash mid-init cannot leave a partial schema.
pub fn close(self) -> Result<(), StoreError>
Gracefully close the store, performing WAL checkpoint.
This ensures all WAL changes are written to the main database file, reducing startup time for subsequent opens and freeing disk space used by WAL files.
Safe to skip (pool will close connections on drop), but recommended for clean shutdown in long-running processes.
impl Store
pub fn search_embedding_only(
&self,
query: &Embedding,
limit: usize,
threshold: f32,
) -> Result<Vec<SearchResult>, StoreError>
Raw embedding-only cosine similarity search (no RRF, no keyword matching).
You almost certainly want search_filtered() instead. This method skips
hybrid RRF ranking, name boosting, and all filters. It exists for tests and
internal building blocks only. Two production bugs came from calling this
directly (PR #305).
pub fn search_filtered(
&self,
query: &Embedding,
filter: &SearchFilter,
limit: usize,
threshold: f32,
) -> Result<Vec<SearchResult>, StoreError>
Searches for embeddings matching a query with optional filtering and ranking.
§Arguments
- query - The embedding vector to search for
- filter - Search filter configuration including path patterns, RRF settings, and demotion rules
- limit - Maximum number of results to return
- threshold - Minimum similarity score threshold for results
§Returns
A vector of search results ranked by relevance, containing up to limit entries that exceed the similarity threshold.
§Errors
Returns StoreError if loading cached note summaries fails or if the underlying search operation encounters a storage error.
pub fn search_filtered_with_index(
&self,
query: &Embedding,
filter: &SearchFilter,
limit: usize,
threshold: f32,
index: Option<&dyn VectorIndex>,
) -> Result<Vec<SearchResult>, StoreError>
Search with optional vector index for O(log n) candidate retrieval
pub fn search_by_candidate_ids(
&self,
candidate_ids: &[&str],
query: &Embedding,
filter: &SearchFilter,
limit: usize,
threshold: f32,
) -> Result<Vec<SearchResult>, StoreError>
Search within a set of candidate IDs (for HNSW-guided filtered search)
pub fn search_unified_with_index(
&self,
query: &Embedding,
filter: &SearchFilter,
limit: usize,
threshold: f32,
index: Option<&dyn VectorIndex>,
) -> Result<Vec<UnifiedResult>, StoreError>
Unified search with optional vector index.
Returns code-only results (SQ-9: notes removed from search pipeline). When an HNSW index is provided, uses O(log n) candidate retrieval.
Trait Implementations§
impl Drop for Store
fn drop(&mut self)
Performs a best-effort WAL (Write-Ahead Logging) checkpoint when the Store is dropped to prevent accumulation of large WAL files.
§Arguments
- &mut self - A mutable reference to the Store instance being dropped
§Returns
Nothing. Errors during checkpoint are logged as warnings but not propagated, as Drop implementations cannot fail.
§Panics
Does not panic. Uses catch_unwind to safely handle potential panics from block_on when called from within an async context (e.g., dropping Store inside a tokio runtime).
Auto Trait Implementations§
impl !Freeze for Store
impl !RefUnwindSafe for Store
impl Send for Store
impl Sync for Store
impl Unpin for Store
impl UnsafeUnpin for Store
impl !UnwindSafe for Store
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise. Read more
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise. Read more