
Struct Store 

Source
pub struct Store { /* private fields */ }

Thread-safe SQLite store for chunks and embeddings

Uses sqlx connection pooling for concurrent reads and WAL mode for crash safety. All methods are synchronous but internally use an async runtime to execute sqlx operations.

§Memory-mapped I/O

open() sets PRAGMA mmap_size = 256MB per connection with a 4-connection pool, reserving up to 1GB of virtual address space. open_readonly() uses 64MB × 1. This is intentional and benign on 64-bit systems (128TB virtual address space). Mmap pages are demand-paged from the database file and evicted under memory pressure — actual RSS reflects only accessed pages, not the mmap reservation.

§Example

use cqs::Store;
use std::path::Path;

let store = Store::open(Path::new(".cqs/index.db"))?;
let stats = store.stats()?;
println!("Indexed {} chunks", stats.total_chunks);

Implementations§

Source§

impl Store

Source

pub fn upsert_calls( &self, chunk_id: &str, calls: &[CallSite], ) -> Result<(), StoreError>

Insert or replace call sites for a chunk

Source

pub fn upsert_calls_batch( &self, calls: &[(String, CallSite)], ) -> Result<(), StoreError>

Insert call sites for multiple chunks in a single transaction.

Takes (chunk_id, CallSite) pairs and batches them into one transaction.

Source

pub fn get_callees(&self, chunk_id: &str) -> Result<Vec<String>, StoreError>

Get all function names called by a given chunk.

Takes a chunk ID (unique) rather than a name. Returns only callee names (not full chunks) because:

  • Callees may not exist in the index (external functions)
  • Callers typically chain: get_callees → get_callers_full for graph traversal

For richer callee data, see get_callers_with_context.

Source

pub fn call_stats(&self) -> Result<CallStats, StoreError>

Retrieves aggregated statistics about function calls from the database.

Queries the calls table to obtain the total number of calls and the count of distinct callees, returning this information as a CallStats structure.

§Returns

Returns a Result containing:

  • Ok(CallStats) - A struct with total_calls (total number of recorded calls) and unique_callees (number of distinct functions called).
  • Err(StoreError) - If the database query fails.
§Errors

Returns StoreError if the SQL query execution fails or if database connectivity issues occur.

Source

pub fn upsert_function_calls( &self, file: &Path, function_calls: &[FunctionCalls], ) -> Result<(), StoreError>

Insert function calls for a file (full call graph, no size limits)

Source§

impl Store

Source

pub fn find_dead_code( &self, include_pub: bool, ) -> Result<(Vec<DeadFunction>, Vec<DeadFunction>), StoreError>

Find functions/methods never called by indexed code (dead code detection).

Returns two lists:

  • confident: Functions with no callers that are likely dead (with confidence scores)
  • possibly_dead_pub: Public functions with no callers (may be used externally)

Uses two-phase query: lightweight metadata first, then content only for candidates that pass name/test/path filters (avoids loading large function bodies).

Exclusions applied:

  • Entry point names (main, init, handler, etc.)
  • Test functions (via find_test_chunks() heuristics)
  • Functions in test files
  • Trait implementations (dynamic dispatch invisible to call graph)
  • #[no_mangle] functions (FFI)

Confidence scoring:

  • High: Private function in a file where no other function has callers
  • Medium: Private function in an active file (other functions are called)
  • Low: Method, or function with constructor-like name patterns
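The confidence tiers above can be sketched as a small classifier. This is an illustrative model only — the `Confidence` enum and the input flags are hypothetical names, not the crate's actual types:

```rust
// Illustrative sketch of the confidence tiers described above.
// `Confidence` and the input flags are hypothetical, not cqs types.
#[derive(Debug, PartialEq)]
enum Confidence {
    High,
    Medium,
    Low,
}

fn dead_code_confidence(
    is_method: bool,
    constructor_like: bool,  // e.g. names like `new`, `from_*`, `build`
    file_has_live_fns: bool, // some other function in the file has callers
) -> Confidence {
    if is_method || constructor_like {
        Confidence::Low
    } else if file_has_live_fns {
        Confidence::Medium // private fn in an otherwise active file
    } else {
        Confidence::High // private fn in a file where nothing is called
    }
}

fn main() {
    assert_eq!(dead_code_confidence(false, false, false), Confidence::High);
    assert_eq!(dead_code_confidence(false, false, true), Confidence::Medium);
    assert_eq!(dead_code_confidence(true, false, true), Confidence::Low);
    println!("ok");
}
```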
Source§

impl Store

Source

pub fn get_callers_full( &self, callee_name: &str, ) -> Result<Vec<CallerInfo>, StoreError>

Find all callers of a function (from full call graph)

Source

pub fn get_callees_full( &self, caller_name: &str, file: Option<&str>, ) -> Result<Vec<(String, u32)>, StoreError>

Get all callees of a function (from full call graph)

When file is provided, scopes to callees of that function in that specific file. When None, returns callees across all files (backwards compatible, but ambiguous for common names like new, parse, from_str).

Source

pub fn get_call_graph(&self) -> Result<Arc<CallGraph>, StoreError>

Load the call graph as forward + reverse adjacency lists.

Single SQL scan of function_calls, capped at 500K edges to prevent OOM on adversarial databases. Typical projects have ~2000 edges. Used by trace (forward BFS), impact (reverse BFS), and test-map (reverse BFS).

Cached call graph — populated on first access, returns clone from OnceLock.

No invalidation by design. The cache lives for the Store lifetime and is never cleared. Normal usage is one Store per CLI command, so the index cannot change while the cache is live. In long-lived modes (batch, watch), callers must re-open the Store to pick up index changes — do not add a clear() here. ~15 call sites benefit from this single-scan caching.
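The trace/impact traversals over these adjacency lists are plain BFS. A minimal sketch of the traversal shape, using a bare `HashMap` in place of the real `CallGraph` type:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Minimal sketch of the BFS used by trace/impact over adjacency lists.
// The real CallGraph type differs; this only shows the traversal shape.
fn bfs(adj: &HashMap<&str, Vec<&str>>, start: &str) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut order = Vec::new();
    while let Some(node) = queue.pop_front() {
        order.push(node.to_string());
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    order
}

fn main() {
    // Forward edges (caller -> callees) drive trace; the reverse
    // adjacency map drives impact/test-map with the same BFS.
    let forward = HashMap::from([
        ("main", vec!["parse", "run"]),
        ("run", vec!["parse"]),
    ]);
    assert_eq!(bfs(&forward, "main"), ["main", "parse", "run"]);
    println!("ok");
}
```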

Source

pub fn get_callers_with_context( &self, callee_name: &str, ) -> Result<Vec<CallerWithContext>, StoreError>

Find callers with call-site line numbers for impact analysis.

Returns the caller function name, file, start line, and the specific line where the call to callee_name occurs.

Source

pub fn get_callers_with_context_batch( &self, callee_names: &[&str], ) -> Result<HashMap<String, Vec<CallerWithContext>>, StoreError>

Batch-fetch callers with context for multiple callee names.

Returns callee_name -> Vec<CallerWithContext> using a single WHERE callee_name IN (...) query per batch of 500 names. Avoids N+1 get_callers_with_context calls in diff impact analysis.
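The 500-name batching pattern shared by these `*_batch` methods can be sketched as follows; the SQL text and function name here are illustrative, and the sqlx binding details are omitted:

```rust
// Sketch of the 500-name batching pattern used by the *_batch methods:
// SQLite caps bound parameters (~999), so name lists are split into
// chunks of 500 and one `WHERE ... IN (?, ?, ...)` query is issued per
// chunk instead of one query per name (the N+1 pattern).
fn batched_in_clauses(names: &[&str], batch: usize) -> Vec<String> {
    names
        .chunks(batch)
        .map(|group| {
            let placeholders = vec!["?"; group.len()].join(", ");
            format!("SELECT * FROM calls WHERE callee_name IN ({placeholders})")
        })
        .collect()
}

fn main() {
    let names: Vec<String> = (0..1200).map(|i| format!("fn_{i}")).collect();
    let refs: Vec<&str> = names.iter().map(String::as_str).collect();
    let queries = batched_in_clauses(&refs, 500);
    assert_eq!(queries.len(), 3); // 500 + 500 + 200
    assert_eq!(queries[2].matches('?').count(), 200);
    println!("ok");
}
```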

Source

pub fn get_callers_full_batch( &self, callee_names: &[&str], ) -> Result<HashMap<String, Vec<CallerInfo>>, StoreError>

Batch-fetch callers (full call graph) for multiple callee names.

Returns callee_name -> Vec<CallerInfo> using a single WHERE callee_name IN (...) query per batch of 500 names. Avoids N+1 get_callers_full calls in the context command.

Source

pub fn get_callees_full_batch( &self, caller_names: &[&str], ) -> Result<HashMap<String, Vec<(String, u32)>>, StoreError>

Batch-fetch callees (full call graph) for multiple caller names.

Returns caller_name -> Vec<(callee_name, call_line)> using a single WHERE caller_name IN (...) query per batch of 500 names. Avoids N+1 get_callees_full calls in the context command.

Unlike [get_callees_full], does not support file scoping — returns callees across all files. This is acceptable for the context command which later filters by origin.

Source§

impl Store

Source

pub fn get_caller_counts_batch( &self, names: &[&str], ) -> Result<HashMap<String, u64>, StoreError>

Caller counts for multiple functions in one query.

Returns how many callers each function has. Functions not in the call graph won’t appear in the result map (caller count is implicitly 0).

Source

pub fn get_callee_counts_batch( &self, names: &[&str], ) -> Result<HashMap<String, u64>, StoreError>

Callee counts for multiple functions in one query.

Returns how many callees each function has. Functions not in the call graph won’t appear in the result map (callee count is implicitly 0).

Source

pub fn find_shared_callers( &self, target: &str, limit: usize, ) -> Result<Vec<(String, u32)>, StoreError>

Functions that share callers with target (called by the same functions).

For target X, finds functions Y where some function A calls both X and Y. Returns (function_name, overlap_count) sorted by overlap descending.

Source

pub fn find_shared_callees( &self, target: &str, limit: usize, ) -> Result<Vec<(String, u32)>, StoreError>

Functions that share callees with target (call the same functions).

For target X, finds functions Y where X and Y both call some function C. Returns (function_name, overlap_count) sorted by overlap descending.

Source

pub fn function_call_stats(&self) -> Result<FunctionCallStats, StoreError>

Get full call graph statistics

Source

pub fn callee_caller_counts(&self) -> Result<Vec<(String, usize)>, StoreError>

Count distinct callers for each callee name.

Returns (callee_name, distinct_caller_count) pairs. Used by the enrichment pass for IDF-style filtering: callees called by many distinct callers are likely utilities (log, unwrap, etc.).

Source§

impl Store

Source

pub fn prune_stale_calls(&self) -> Result<u64, StoreError>

Delete function_calls for files no longer in the chunks table.

Used by GC to clean up orphaned call graph entries after pruning chunks.

Source

pub fn find_test_chunks(&self) -> Result<Vec<ChunkSummary>, StoreError>

Find test chunks using language-specific heuristics.

Identifies test functions across all supported languages by:

  • Name patterns: test_* (Rust/Python), Test* (Go)
  • Content patterns: sourced from LanguageDef::test_markers per language
  • Path patterns: sourced from LanguageDef::test_path_patterns per language

Uses a broad SQL filter then Rust post-filter for precision.

Cached test chunks — populated on first access, returns clone from OnceLock.

No invalidation by design. Same contract as get_call_graph: the cache is intentionally write-once for the Store lifetime. Long-lived modes (batch, watch) must re-open the Store to see updated test discovery — do not add a clear(). ~14 call sites benefit from this single-scan caching.

Source§

impl Store

Source

pub fn embedding_batches( &self, batch_size: usize, ) -> impl Iterator<Item = Result<Vec<(String, Embedding)>, StoreError>> + '_

Stream embeddings in batches for memory-efficient HNSW building.

Uses cursor-based pagination (WHERE rowid > last_seen) for stability under concurrent writes. LIMIT/OFFSET can skip or duplicate rows if the table is modified between batches.

§Arguments
  • batch_size - Number of embeddings per batch (recommend 10_000)
§Returns

Iterator yielding Result<Vec<(String, Embedding)>, StoreError>

§Panics

Must be called from sync context only. This iterator internally uses block_on() which will panic if called from within an async runtime. This is used for HNSW building which runs in dedicated sync threads.
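The cursor-based pagination contract can be simulated over an in-memory table. This sketch shows why `WHERE rowid > cursor` is stable where LIMIT/OFFSET is not: the cursor pins the position, so inserts or deletes between batches cannot shift the window:

```rust
// Sketch of cursor pagination (`WHERE rowid > cursor LIMIT n`) simulated
// over an in-memory table sorted by rowid. Rowids may have gaps.
fn next_batch(rows: &[(i64, &str)], cursor: i64, limit: usize) -> Vec<(i64, String)> {
    rows.iter()
        .filter(|(rowid, _)| *rowid > cursor)
        .take(limit)
        .map(|(rowid, id)| (*rowid, id.to_string()))
        .collect()
}

fn main() {
    let rows = [(1, "a"), (2, "b"), (5, "c"), (9, "d")];
    let mut cursor = 0;
    let mut seen = Vec::new();
    loop {
        let batch = next_batch(&rows, cursor, 2);
        let Some(&(last, _)) = batch.last() else { break };
        cursor = last; // advance the cursor past the last row seen
        seen.extend(batch.into_iter().map(|(_, id)| id));
    }
    assert_eq!(seen, ["a", "b", "c", "d"]);
    println!("ok");
}
```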

Source§

impl Store

Source

pub fn get_metadata(&self, key: &str) -> Result<String, StoreError>

Retrieve a single metadata value by key.

Returns Ok(value) if the key exists, or Err if not found or on DB error. Used for lightweight metadata checks (e.g., model compatibility between stores).

Source

pub fn upsert_chunks_batch( &self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>, ) -> Result<usize, StoreError>

Insert or update chunks in batch using multi-row INSERT.

Chunks are inserted in batches of 52 rows (52 * 19 params = 988 < SQLite’s 999 limit). FTS operations remain per-row because FTS5 doesn’t support INSERT OR REPLACE.

DS-19 warning: Uses INSERT OR REPLACE which triggers ON DELETE CASCADE on calls and type_edges tables. Callers must re-populate call graph edges after this function if the chunks had existing relationships.
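The 52-row batch size falls directly out of SQLite's default 999-parameter cap, as the arithmetic below confirms (the constant name is illustrative):

```rust
// With 19 bound parameters per chunk row, floor(999 / 19) = 52 rows,
// using 52 * 19 = 988 parameters per multi-row INSERT.
const SQLITE_MAX_PARAMS: usize = 999;

fn max_rows_per_insert(params_per_row: usize) -> usize {
    SQLITE_MAX_PARAMS / params_per_row
}

fn main() {
    assert_eq!(max_rows_per_insert(19), 52); // chunk rows (this method)
    assert_eq!(52 * 19, 988);                // stays under the 999 cap
    assert_eq!(max_rows_per_insert(4), 249); // type_edges rows (4 binds, 996 params)
    println!("ok");
}
```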

Source

pub fn upsert_chunk( &self, chunk: &Chunk, embedding: &Embedding, source_mtime: Option<i64>, ) -> Result<(), StoreError>

Insert or update a single chunk

Source

pub fn update_embeddings_batch( &self, updates: &[(String, Embedding)], ) -> Result<usize, StoreError>

Update only the embedding for existing chunks by chunk ID, without changing enrichment hashes.

updates is a slice of (chunk_id, embedding) pairs. Chunk IDs not found in the store are logged and skipped (rows_affected == 0). Returns the count of actually updated rows.

Convenience wrapper around update_embeddings_with_hashes_batch that passes None for the enrichment hash, leaving it unchanged.

Source

pub fn update_embeddings_with_hashes_batch( &self, updates: &[(String, Embedding, Option<String>)], ) -> Result<usize, StoreError>

Update embeddings and optionally enrichment hashes in batch.

When the hash is Some, stores the enrichment hash for idempotency detection. When None, leaves the existing enrichment hash unchanged. Used by the enrichment pass to record which call context was used, so re-indexing can skip unchanged chunks.

Source

pub fn get_enrichment_hashes_batch( &self, chunk_ids: &[&str], ) -> Result<HashMap<String, String>, StoreError>

Get enrichment hashes for a batch of chunk IDs.

Returns a map from chunk_id to enrichment_hash (only for chunks that have one).

Source

pub fn get_all_enrichment_hashes( &self, ) -> Result<HashMap<String, String>, StoreError>

Fetch all enrichment hashes in a single query.

Returns a map from chunk_id to enrichment_hash for all chunks that have one. Used by the enrichment pass to avoid per-page hash fetches (PERF-29).

Source

pub fn get_summaries_by_hashes( &self, content_hashes: &[&str], purpose: &str, ) -> Result<HashMap<String, String>, StoreError>

Get LLM summaries for a batch of content hashes.

Returns a map from content_hash to summary text. Only includes hashes that have summaries in the llm_summaries table matching the given purpose.

Source

pub fn upsert_summaries_batch( &self, summaries: &[(String, String, String, String)], ) -> Result<usize, StoreError>

Insert or update LLM summaries in batch.

Each entry is (content_hash, summary, model, purpose).

Source

pub fn get_all_summaries( &self, purpose: &str, ) -> Result<HashMap<String, String>, StoreError>

Fetch all LLM summaries as a map from content_hash to summary text.

Single query, no batching needed (reads entire table). Used by the enrichment pass to avoid per-page summary fetches.

Source

pub fn get_all_content_hashes(&self) -> Result<Vec<String>, StoreError>

Get all distinct content hashes currently in the chunks table. Used to validate batch results against the current index (DS-20).

Source

pub fn prune_orphan_summaries(&self) -> Result<usize, StoreError>

Delete orphan LLM summaries whose content_hash doesn’t exist in any chunk.

Source

pub fn needs_reindex(&self, path: &Path) -> Result<Option<i64>, StoreError>

Check if a file needs reindexing based on mtime.

Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime), or Ok(None) if no reindex needed. This avoids reading file metadata twice.

Source

pub fn delete_by_origin(&self, origin: &Path) -> Result<u32, StoreError>

Delete all chunks for an origin (file path or source identifier)

Source

pub fn upsert_chunks_and_calls( &self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>, calls: &[(String, CallSite)], ) -> Result<usize, StoreError>

Atomically upsert chunks and their call graph in a single transaction.

Combines chunk upsert (with FTS) and call graph upsert into one transaction, preventing inconsistency from crashes between separate operations. Chunks are inserted in batches of 52 rows (52 * 19 = 988 < SQLite’s 999 limit).

Source

pub fn delete_phantom_chunks( &self, file: &Path, live_ids: &[&str], ) -> Result<u32, StoreError>

Delete chunks for a file that are no longer in the current parse output (RT-DATA-10).

After re-parsing a file, some functions may have been removed. Their old chunks would linger as phantoms. This deletes chunks whose origin matches file but whose ID is not in live_ids.

Source§

impl Store

Source

pub fn get_embeddings_by_hashes( &self, hashes: &[&str], ) -> Result<HashMap<String, Embedding>, StoreError>

Get embeddings for chunks with matching content hashes (batch lookup).

Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

Source

pub fn get_chunk_ids_and_embeddings_by_hashes( &self, hashes: &[&str], ) -> Result<Vec<(String, Embedding)>, StoreError>

Get (chunk_id, embedding) pairs for chunks with matching content hashes.

Unlike get_embeddings_by_hashes (which keys by content_hash), this returns the chunk ID alongside the embedding — exactly what HNSW insert_batch needs.

Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

Source§

impl Store

Source

pub fn chunk_count(&self) -> Result<u64, StoreError>

Get the number of chunks in the index

Source

pub fn stats(&self) -> Result<IndexStats, StoreError>

Get index statistics

Uses batched queries to minimize database round trips:

  1. Single query for counts with GROUP BY using CTEs
  2. Single query for all metadata keys
Source

pub fn get_chunks_by_origin( &self, origin: &str, ) -> Result<Vec<ChunkSummary>, StoreError>

Get all chunks for a given file (origin).

Returns chunks sorted by line_start. Used by cqs context to list all functions/types in a file.

Source

pub fn get_chunks_by_origins_batch( &self, origins: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch chunks by multiple origin paths.

Returns a map of origin -> Vec<ChunkSummary> for all found origins. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999). Used by cqs where to avoid N+1 get_chunks_by_origin calls.

Source

pub fn get_chunks_by_names_batch( &self, names: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch chunks by multiple function names.

Returns a map of name -> Vec<ChunkSummary> for all found names. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999). Used by cqs related to avoid N+1 get_chunks_by_name calls.

Source

pub fn get_chunk_with_embedding( &self, id: &str, ) -> Result<Option<(ChunkSummary, Embedding)>, StoreError>

Get a chunk with its embedding vector.

Returns Ok(None) if the chunk doesn’t exist or has a corrupt embedding. Used by cqs similar and cqs explain to search by example.

Source

pub fn get_chunks_by_ids( &self, ids: &[&str], ) -> Result<HashMap<String, ChunkSummary>, StoreError>

Batch-fetch chunks by IDs.

Returns a map of chunk ID → ChunkSummary for all found IDs. Used by --expand to fetch parent chunks for small-to-big retrieval.

Source

pub fn get_embeddings_by_ids( &self, ids: &[&str], ) -> Result<HashMap<String, Embedding>, StoreError>

Batch-fetch embeddings by chunk IDs.

Returns a map of chunk ID → Embedding for all found IDs. Skips chunks with corrupt embeddings. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

Used by semantic_diff to avoid N+1 queries when comparing matched pairs.

Source

pub fn search_by_names_batch( &self, names: &[&str], limit_per_name: usize, ) -> Result<HashMap<String, Vec<SearchResult>>, StoreError>

Batch name search: look up multiple names in a single call.

For each name, returns up to limit_per_name matching chunks. Batches names into groups of 20 and issues a combined FTS OR query per batch, then post-filters results to assign to matching names.

Used by gather BFS expansion to avoid N+1 query patterns.

Source

pub fn all_chunk_identities(&self) -> Result<Vec<ChunkIdentity>, StoreError>

Get identity metadata for all chunks (for diff comparison).

Returns minimal metadata needed to match chunks across stores. Loads all rows but only lightweight columns (no content or embeddings).

Source

pub fn chunks_paged( &self, after_rowid: i64, limit: usize, ) -> Result<(Vec<ChunkSummary>, i64), StoreError>

Fetch a page of full chunks by rowid cursor.

Returns (chunks, next_cursor). When the returned vec is empty, iteration is complete. Used by the enrichment pass to iterate all chunks without loading everything into memory.

Source

pub fn all_chunk_identities_filtered( &self, language: Option<&str>, ) -> Result<Vec<ChunkIdentity>, StoreError>

Like all_chunk_identities but with an optional language filter.

When language is Some, only chunks matching that language are returned, avoiding loading all chunks into memory when only one language is needed.

Source§

impl Store

Source

pub fn prune_missing( &self, existing_files: &HashSet<PathBuf>, ) -> Result<u32, StoreError>

Delete chunks for files that no longer exist

Batches deletes in groups of 100 to balance memory usage and query efficiency.

Uses Rust HashSet for existence check rather than SQL WHERE NOT IN because:

  • Existing files often number 10k+, exceeding SQLite’s parameter limit (~999)
  • Sending full file list to SQLite would require chunked queries anyway
  • HashSet lookup is O(1), and we already have the set from enumerate_files()
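The existence check described above can be sketched as a pure function; `paths_to_prune` is a hypothetical name for illustration, not a crate API:

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Sketch of the existence check: indexed paths are tested against an
// in-memory HashSet (from enumerate_files()) rather than a SQL
// `WHERE NOT IN`, which would blow past SQLite's ~999-parameter limit.
fn paths_to_prune(indexed: &[PathBuf], existing: &HashSet<PathBuf>) -> Vec<PathBuf> {
    indexed
        .iter()
        .filter(|p| !existing.contains(*p)) // O(1) per lookup
        .cloned()
        .collect()
}

fn main() {
    let existing: HashSet<PathBuf> =
        [PathBuf::from("src/lib.rs")].into_iter().collect();
    let indexed = vec![PathBuf::from("src/lib.rs"), PathBuf::from("src/old.rs")];
    assert_eq!(paths_to_prune(&indexed, &existing), [PathBuf::from("src/old.rs")]);
    println!("ok");
}
```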
Source

pub fn prune_all( &self, existing_files: &HashSet<PathBuf>, ) -> Result<PruneAllResult, StoreError>

Run all prune operations in a single SQLite transaction.

Ensures concurrent readers never see an inconsistent state where chunks are deleted but orphan call graph / type edge / summary entries remain. Without this, the window between prune_missing and prune_stale_calls exposes stale function_calls rows referencing deleted chunks.

Source

pub fn count_stale_files( &self, existing_files: &HashSet<PathBuf>, ) -> Result<(u64, u64), StoreError>

Count files that are stale (mtime changed) or missing from disk.

Compares stored source_mtime against current filesystem state. Only checks files with source_type=‘file’ (not notes or other sources).

Returns (stale_count, missing_count).

Source

pub fn list_stale_files( &self, existing_files: &HashSet<PathBuf>, ) -> Result<StaleReport, StoreError>

List files that are stale (mtime changed) or missing from disk.

Like count_stale_files() but returns full details for display. Requires existing_files from enumerate_files() (~100ms for 10k files).

Source

pub fn check_origins_stale( &self, origins: &[&str], root: &Path, ) -> Result<HashSet<String>, StoreError>

Check if specific origins are stale (mtime changed on disk).

Lightweight per-query check: only examines the given origins, not the entire index. O(result_count), not O(index_size).

root is the project root — origins are relative paths joined against it.

Returns the set of stale origin paths.

Source§

impl Store

Source

pub fn stored_model_name(&self) -> Option<String>

Read the stored model name from metadata, if set.

Returns None for fresh databases or pre-model indexes.

Source

pub fn touch_updated_at(&self) -> Result<(), StoreError>

Update the updated_at metadata timestamp to now.

Call after indexing operations complete (pipeline, watch reindex, note sync) to track when the index was last modified.

Source

pub fn set_hnsw_dirty(&self, dirty: bool) -> Result<(), StoreError>

Mark the HNSW index as dirty (out of sync with SQLite).

Call before writing chunks to SQLite. Clear after successful HNSW save. On load, a dirty flag means a crash occurred between SQLite commit and HNSW save — the HNSW index should not be trusted.
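The flag ordering can be modeled as a tiny state machine. The struct and function names below are illustrative; the point is only the crash-window semantics:

```rust
// Sketch of the write-ahead dirty-flag protocol. A crash between the
// SQLite commit and the HNSW save leaves the flag set, so the next open
// knows the vector index may be stale. Names are hypothetical.
struct IndexState {
    hnsw_dirty: bool,
    sqlite_committed: bool,
    hnsw_saved: bool,
}

fn write_chunks(state: &mut IndexState, crash_before_hnsw_save: bool) {
    state.hnsw_dirty = true;       // 1. mark dirty BEFORE writing
    state.sqlite_committed = true; // 2. commit chunks to SQLite
    if crash_before_hnsw_save {
        return;                    // simulated crash: flag stays set
    }
    state.hnsw_saved = true;       // 3. persist HNSW
    state.hnsw_dirty = false;      // 4. clear only after a successful save
}

fn main() {
    let mut ok = IndexState { hnsw_dirty: false, sqlite_committed: false, hnsw_saved: false };
    write_chunks(&mut ok, false);
    assert!(!ok.hnsw_dirty && ok.hnsw_saved);

    let mut crashed = IndexState { hnsw_dirty: false, sqlite_committed: false, hnsw_saved: false };
    write_chunks(&mut crashed, true);
    // SQLite has the chunks, HNSW does not -- the dirty flag records this.
    assert!(crashed.hnsw_dirty && crashed.sqlite_committed && !crashed.hnsw_saved);
    println!("ok");
}
```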

Source

pub fn is_hnsw_dirty(&self) -> Result<bool, StoreError>

Check if the HNSW index is marked as dirty (potentially stale).

Returns false if the key doesn’t exist (pre-v13 indexes).

Source

pub fn set_pending_batch_id( &self, batch_id: Option<&str>, ) -> Result<(), StoreError>

Store a pending LLM batch ID so interrupted processes can resume polling.

Source

pub fn get_pending_batch_id(&self) -> Result<Option<String>, StoreError>

Get the pending LLM batch ID, if any.

Source

pub fn set_pending_doc_batch_id( &self, batch_id: Option<&str>, ) -> Result<(), StoreError>

Store a pending doc-comment batch ID so interrupted processes can resume polling.

Source

pub fn get_pending_doc_batch_id(&self) -> Result<Option<String>, StoreError>

Get the pending doc-comment batch ID, if any.

Source

pub fn set_pending_hyde_batch_id( &self, batch_id: Option<&str>, ) -> Result<(), StoreError>

Store a pending HyDE batch ID so interrupted processes can resume polling.

Source

pub fn get_pending_hyde_batch_id(&self) -> Result<Option<String>, StoreError>

Get the pending HyDE batch ID, if any.

Source

pub fn cached_notes_summaries( &self, ) -> Result<Arc<Vec<NoteSummary>>, StoreError>

Get cached notes summaries (loaded on first call, invalidated on mutation).

Returns Arc<Vec<NoteSummary>> — the warm-cache path is an Arc::clone() (pointer bump) instead of deep-cloning all note strings. Notes are read-only during search, so shared ownership is safe and avoids O(notes * string_len) cloning on every search call.
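The warm-cache path can be sketched with `OnceLock` plus `Arc::clone`; the `NotesCache` type and its loader are illustrative stand-ins for the Store internals:

```rust
use std::sync::{Arc, OnceLock};

// Sketch of the warm-cache path: the cached Vec lives behind a OnceLock,
// and each access clones only the Arc (a pointer bump), not the strings.
struct NotesCache {
    summaries: OnceLock<Arc<Vec<String>>>,
}

impl NotesCache {
    fn get(&self) -> Arc<Vec<String>> {
        Arc::clone(self.summaries.get_or_init(|| {
            // Cold path: stand-in for loading note summaries from SQLite.
            Arc::new(vec!["note one".to_string(), "note two".to_string()])
        }))
    }
}

fn main() {
    let cache = NotesCache { summaries: OnceLock::new() };
    let a = cache.get(); // cold: loads and caches
    let b = cache.get(); // warm: Arc::clone only
    assert!(Arc::ptr_eq(&a, &b)); // same allocation, no deep clone
    assert_eq!(a.len(), 2);
    println!("ok");
}
```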

Source§

impl Store

Source

pub fn upsert_notes_batch( &self, notes: &[Note], source_file: &Path, file_mtime: i64, ) -> Result<usize, StoreError>

Insert or update notes in batch

Source

pub fn replace_notes_for_file( &self, notes: &[Note], source_file: &Path, file_mtime: i64, ) -> Result<usize, StoreError>

Replace all notes for a source file in a single transaction.

Atomically deletes existing notes and inserts new ones, preventing data loss if the process crashes mid-operation.

Source

pub fn notes_need_reindex( &self, source_file: &Path, ) -> Result<Option<i64>, StoreError>

Check if notes file needs reindexing based on mtime.

Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime), or Ok(None) if no reindex needed. This avoids reading file metadata twice.

Source

pub fn note_count(&self) -> Result<u64, StoreError>

Get the total number of notes in the store.

Executes a SQL COUNT query against the notes table. Returns 0 if no notes exist.

§Returns

Returns a Result containing the count of notes as a u64, or a StoreError if the database query fails.

§Errors

Returns StoreError if the database query encounters an error or the connection fails.

Source

pub fn note_stats(&self) -> Result<NoteStats, StoreError>

Get note statistics (total, warnings, patterns).

Uses SENTIMENT_NEGATIVE_THRESHOLD (-0.3) and SENTIMENT_POSITIVE_THRESHOLD (0.3) to classify notes. These thresholds work with discrete sentiment values (-1, -0.5, 0, 0.5, 1): negative values (-1, -0.5) count as warnings, positive values (0.5, 1) count as patterns.
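A sketch of that threshold classification, using the stated constants. The `NoteKind` enum is illustrative; whether the comparisons are strict is an assumption, though with the discrete values either choice gives the same result:

```rust
// Threshold classification as described above. Constants match the doc;
// the enum and function are hypothetical illustrations.
const SENTIMENT_NEGATIVE_THRESHOLD: f32 = -0.3;
const SENTIMENT_POSITIVE_THRESHOLD: f32 = 0.3;

#[derive(Debug, PartialEq)]
enum NoteKind {
    Warning,
    Neutral,
    Pattern,
}

fn classify(sentiment: f32) -> NoteKind {
    if sentiment <= SENTIMENT_NEGATIVE_THRESHOLD {
        NoteKind::Warning
    } else if sentiment >= SENTIMENT_POSITIVE_THRESHOLD {
        NoteKind::Pattern
    } else {
        NoteKind::Neutral
    }
}

fn main() {
    assert_eq!(classify(-1.0), NoteKind::Warning);
    assert_eq!(classify(-0.5), NoteKind::Warning);
    assert_eq!(classify(0.0), NoteKind::Neutral);
    assert_eq!(classify(0.5), NoteKind::Pattern);
    assert_eq!(classify(1.0), NoteKind::Pattern);
    println!("ok");
}
```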

Source

pub fn list_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>

List all notes with metadata (no embeddings).

Returns NoteSummary for each note, useful for mention-based filtering without the cost of loading embeddings.

Source§

impl Store

Source

pub fn search_fts( &self, query: &str, limit: usize, ) -> Result<Vec<String>, StoreError>

Search FTS5 index for keyword matches.

§Search Method Overview

The Store provides several search methods with different characteristics:

  • search_fts: Full-text keyword search using SQLite FTS5. Returns chunk IDs. Best for: Exact keyword matches, symbol lookup by name fragment.

  • search_by_name: Definition search by function/struct name. Uses FTS5 with heavy weighting on the name column. Returns full SearchResult with scores. Best for: “Where is X defined?” queries.

  • search_filtered (in search.rs): Semantic search with optional language/path filters. Can use RRF hybrid search combining semantic + FTS scores. Best for: Natural language queries like “retry with exponential backoff”.

  • search_filtered_with_index (in search.rs): Like search_filtered but uses HNSW/CAGRA vector index for O(log n) candidate retrieval instead of brute force. Best for: Large indexes (>5k chunks) where brute force is slow.
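The RRF hybrid ranking mentioned for search_filtered can be sketched generically. This is the standard reciprocal rank fusion formula, score = Σ 1/(k + rank); k = 60 is the conventional default from the RRF literature, not a value read from cqs:

```rust
use std::collections::HashMap;

// Generic reciprocal rank fusion: each ranking contributes 1/(k + rank)
// per result, and results are re-sorted by the fused score.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_default() += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut out: Vec<(String, f64)> =
        scores.into_iter().map(|(id, s)| (id.to_string(), s)).collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let semantic = vec!["chunk_a", "chunk_b", "chunk_c"];
    let fts = vec!["chunk_b", "chunk_a"];
    let fused = rrf(&[semantic, fts], 60.0);
    // chunk_c appears in only one ranking, so it fuses lowest.
    assert_eq!(fused.last().unwrap().0, "chunk_c");
    println!("ok");
}
```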

Source

pub fn search_by_name( &self, name: &str, limit: usize, ) -> Result<Vec<SearchResult>, StoreError>

Search for chunks by name (definition search).

Searches the FTS5 name column for exact or prefix matches. Use this for “where is X defined?” queries instead of semantic search.

Source§

impl Store

Source

pub fn upsert_type_edges( &self, chunk_id: &str, type_refs: &[TypeRef], ) -> Result<(), StoreError>

Upsert type edges for a single chunk.

Deletes existing type edges for the chunk, then batch-inserts new ones. 4 binds per row → 249 rows per batch (996 < 999 SQLite limit).

Source

pub fn upsert_type_edges_for_file( &self, file: &Path, chunk_type_refs: &[ChunkTypeRefs], ) -> Result<(), StoreError>

Upsert type edges for all chunks in a file.

Resolves chunk names to chunk IDs via the chunks table, then deletes old type edges and batch-inserts new ones. Chunks not found in the database are warned and skipped (not an error).

For windowed chunks, associates type edges with the first window (window_idx IS NULL or window_idx = 0).

Source

pub fn upsert_type_edges_for_files( &self, file_edges: &[(PathBuf, Vec<ChunkTypeRefs>)], ) -> Result<(), StoreError>

Upsert type edges for multiple files in a single transaction.

Batches all per-file work (chunk ID resolution, delete, insert) into one transaction instead of one transaction per file. Falls back per-file on individual resolution failures (warns and skips unresolved chunks).

Source

pub fn get_type_users( &self, type_name: &str, ) -> Result<Vec<ChunkSummary>, StoreError>

Get chunks that reference a given type name.

Forward query: “who uses Config?” Returns chunks that have type edges pointing to the given type name.

Source

pub fn get_types_used_by( &self, chunk_name: &str, ) -> Result<Vec<TypeUsage>, StoreError>

Get types used by a given chunk (by function name).

Reverse query: “what types does parse_config use?” Returns TypeUsage structs where edge_kind is the empty string ("") for catch-all types.

Source

pub fn get_type_users_batch( &self, type_names: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch type users for multiple type names.

Returns type_name -> Vec<ChunkSummary>. Uses WHERE IN with 200 names per batch.

Source

pub fn get_types_used_by_batch( &self, chunk_names: &[&str], ) -> Result<HashMap<String, Vec<(String, String)>>, StoreError>

Batch-fetch types used by multiple chunk names.

Returns chunk_name -> Vec<(type_name, edge_kind)>. Uses WHERE IN with 200 names per batch.

Source

pub fn type_edge_stats(&self) -> Result<TypeEdgeStats, StoreError>

Retrieves statistics about type edges in the store.

Queries the database to obtain the total count of type edges and the number of distinct target type names, then returns these metrics as a TypeEdgeStats struct.

§Returns

A Result containing TypeEdgeStats with the total number of edges and count of unique types, or a StoreError if the database query fails.

§Errors

Returns StoreError if the database query cannot be executed or the connection fails.

Source

pub fn get_type_graph(&self) -> Result<TypeGraph, StoreError>

Load the type graph as forward + reverse adjacency lists.

Single SQL scan of type_edges joined with chunks, capped at 500K edges. Forward: chunk_name -> Vec<type_name>, Reverse: type_name -> Vec<chunk_name>.

Source

pub fn find_shared_type_users( &self, target_type: &str, limit: usize, ) -> Result<Vec<(String, u32)>, StoreError>

Find types that share users with target (co-occurrence).

“Types commonly used alongside Config” → Vec<(type_name, overlap_count)>. Uses self-join: find other types referenced by the same chunks that reference target.

Source

pub fn prune_stale_type_edges(&self) -> Result<u64, StoreError>

Delete type_edges for chunks no longer in the chunks table (GC).

Returns the number of pruned rows.

Source§

impl Store

Source

pub fn dim(&self) -> usize

Embedding dimension for vectors in this store.

Source

pub fn set_dim(&mut self, dim: usize)

Update the embedding dimension after init (fresh DB only).

Store::open defaults to EMBEDDING_DIM when the metadata table doesn’t exist yet. After init() writes the correct dim, call this to sync.

Source

pub fn open(path: &Path) -> Result<Self, StoreError>

Open an existing index with connection pooling

Source

pub fn open_light(path: &Path) -> Result<Self, StoreError>

Open an existing index with single-threaded runtime but full memory.

Uses current_thread tokio runtime (1 OS thread instead of 4) while keeping the full 256MB mmap and 16MB cache of open(). Ideal for read-only CLI commands on the primary project index where we need full search performance but don’t need multi-threaded async.

Source

pub fn open_readonly(path: &Path) -> Result<Self, StoreError>

Open an existing index in read-only mode with reduced resources.

Uses minimal connection pool, smaller cache, and single-threaded runtime. Suitable for reference stores and background builds that only read data.

Source

pub fn init(&self, model_info: &ModelInfo) -> Result<(), StoreError>

Create a new index.

Wraps all DDL and metadata inserts in a single transaction so a crash mid-init cannot leave a partial schema.

Source

pub fn close(self) -> Result<(), StoreError>

Gracefully close the store, performing WAL checkpoint.

This ensures all WAL changes are written to the main database file, reducing startup time for subsequent opens and freeing disk space used by WAL files.

Safe to skip (pool will close connections on drop), but recommended for clean shutdown in long-running processes.
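A sketch of a clean shutdown:

```rust
use cqs::Store;
use std::path::Path;

let store = Store::open(Path::new(".cqs/index.db"))?;
// ... index and search ...

// Checkpoints the WAL into the main database file. Skipping this is
// safe — Drop performs a best-effort checkpoint — but an explicit
// close surfaces any checkpoint error.
store.close()?;
```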

Source§

impl Store

Source

pub fn search_embedding_only( &self, query: &Embedding, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>

Raw embedding-only cosine similarity search (no RRF, no keyword matching).

You almost certainly want search_filtered() instead. This method skips hybrid RRF ranking, name boosting, and all filters. It exists for tests and internal building blocks only. Two production bugs came from calling this directly (PR #305).

Source

pub fn search_filtered( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>

Searches for embeddings matching a query with optional filtering and ranking.

§Arguments
  • query - The embedding vector to search for
  • filter - Search filter configuration including path patterns, RRF settings, and demotion rules
  • limit - Maximum number of results to return
  • threshold - Minimum similarity score threshold for results
§Returns

A vector of search results ranked by relevance, containing up to limit entries that exceed the similarity threshold.

§Errors

Returns StoreError if loading cached note summaries fails or if the underlying search operation encounters a storage error.
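A usage sketch; it assumes `SearchFilter` implements `Default` and that `SearchResult` exposes `name` and `score` fields — neither is confirmed by this page. `query` would come from the embedding model:

```rust
use cqs::{SearchFilter, Store};
use std::path::Path;

let store = Store::open(Path::new(".cqs/index.db"))?;

let filter = SearchFilter::default();
let results = store.search_filtered(&query, &filter, 10, 0.25)?;
for r in &results {
    println!("{} ({:.3})", r.name, r.score);
}
```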

Source

pub fn search_filtered_with_index( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>, ) -> Result<Vec<SearchResult>, StoreError>

Search with an optional vector index for O(log n) candidate retrieval.

Source

pub fn search_by_candidate_ids( &self, candidate_ids: &[&str], query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>

Search within a set of candidate IDs (for HNSW-guided filtered search)

Source

pub fn search_unified_with_index( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>, ) -> Result<Vec<UnifiedResult>, StoreError>

Unified search with optional vector index.

Returns code-only results (SQ-9: notes removed from search pipeline). When an HNSW index is provided, uses O(log n) candidate retrieval.
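A sketch of passing an index, where `hnsw` stands in for any `VectorIndex` implementation and `query`/`filter` are prepared as for search_filtered:

```rust
// With an index, candidates come from O(log n) HNSW retrieval;
// passing None falls back to a full scan.
let results =
    store.search_unified_with_index(&query, &filter, 10, 0.25, Some(&hnsw))?;

// The same Option<&dyn VectorIndex> parameter appears on
// search_filtered_with_index for filtered code search.
```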

Trait Implementations§

Source§

impl Drop for Store

Source§

fn drop(&mut self)

Performs a best-effort WAL (Write-Ahead Logging) checkpoint when the Store is dropped, preventing accumulation of large WAL files.

Errors during checkpoint are logged as warnings but not propagated, as Drop implementations cannot fail.

§Panics

Does not panic. Uses catch_unwind to safely handle potential panics from block_on when called from within an async context (e.g., dropping Store inside a tokio runtime).

Auto Trait Implementations§

§

impl !Freeze for Store

§

impl !RefUnwindSafe for Store

§

impl Send for Store

§

impl Sync for Store

§

impl Unpin for Store

§

impl UnsafeUnpin for Store

§

impl !UnwindSafe for Store

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a pointer with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more