pub struct Store { /* private fields */ }
Thread-safe SQLite store for chunks and embeddings
Uses sqlx connection pooling for concurrent reads and WAL mode for crash safety. All methods are synchronous but internally use an async runtime to execute sqlx operations.
§Memory-mapped I/O
open() sets PRAGMA mmap_size = 256MB per connection with a 4-connection pool,
reserving up to 1GB of virtual address space. open_readonly() uses 64MB × 1.
This is intentional and benign on 64-bit systems (128TB virtual address space).
Mmap pages are demand-paged from the database file and evicted under memory
pressure — actual RSS reflects only accessed pages, not the mmap reservation.
§Example
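```rust
use cqs::Store;
use std::path::Path;

// Assumes StoreError implements std::error::Error (needed for `?` into Box<dyn Error>).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = Store::open(Path::new(".cqs/index.db"))?;
    let stats = store.stats()?;
    println!("Indexed {} chunks", stats.total_chunks);
    Ok(())
}
```
Implementations§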
impl Store
pub fn upsert_calls( &self, chunk_id: &str, calls: &[CallSite], ) -> Result<(), StoreError>
Insert or replace call sites for a chunk
pub fn upsert_calls_batch( &self, calls: &[(String, CallSite)], ) -> Result<(), StoreError>
Insert call sites for multiple chunks in a single transaction.
Takes (chunk_id, CallSite) pairs and batches them into one transaction.
pub fn get_callees(&self, chunk_id: &str) -> Result<Vec<String>, StoreError>
Get all function names called by a given chunk.
Takes a chunk ID (unique) rather than a name. Returns only callee names (not full chunks) because:
- Callees may not exist in the index (external functions)
- Callers typically chain get_callees → get_callers_full for graph traversal
For richer call-site data, see get_callers_with_context.
pub fn call_stats(&self) -> Result<CallStats, StoreError>
Get call graph statistics
pub fn upsert_function_calls( &self, file: &Path, function_calls: &[FunctionCalls], ) -> Result<(), StoreError>
Insert function calls for a file (full call graph, no size limits)
pub fn get_callers_full( &self, callee_name: &str, ) -> Result<Vec<CallerInfo>, StoreError>
Find all callers of a function (from full call graph)
pub fn get_callees_full( &self, caller_name: &str, file: Option<&str>, ) -> Result<Vec<(String, u32)>, StoreError>
Get all callees of a function (from full call graph)
When file is provided, scopes to callees of that function in that specific file.
When None, returns callees across all files (backwards compatible, but ambiguous
for common names like new, parse, from_str).
pub fn get_call_graph(&self) -> Result<CallGraph, StoreError>
Load the call graph as forward + reverse adjacency lists.
Single SQL scan of function_calls, capped at 500K edges to prevent OOM
on adversarial databases. Typical projects have ~2000 edges.
Used by trace (forward BFS), impact (reverse BFS), and test-map (reverse BFS).
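Here is a minimal sketch of the forward and reverse adjacency lists, plus the reverse BFS that impact analysis runs over them. CallGraph, MAX_EDGES, build_call_graph and transitive_callers are hypothetical names. The real method loads edges from SQLite rather than from an iterator; only the 500K cap comes from the description above.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for the crate's CallGraph.
#[derive(Debug, Default, Clone)]
pub struct CallGraph {
    /// caller -> callees
    pub forward: HashMap<String, Vec<String>>,
    /// callee -> callers
    pub reverse: HashMap<String, Vec<String>>,
}

/// Mirrors the 500K-edge cap described above.
pub const MAX_EDGES: usize = 500_000;

/// Build both adjacency lists in one pass over (caller, callee) edges,
/// ignoring any edges beyond the cap.
pub fn build_call_graph<'a, I>(edges: I) -> CallGraph
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    let mut graph = CallGraph::default();
    for (caller, callee) in edges.into_iter().take(MAX_EDGES) {
        graph.forward.entry(caller.to_string()).or_default().push(callee.to_string());
        graph.reverse.entry(callee.to_string()).or_default().push(caller.to_string());
    }
    graph
}

/// Reverse BFS: every function that transitively calls `target`.
/// If `target` sits on a call cycle, it can appear in its own result.
pub fn transitive_callers(graph: &CallGraph, target: &str) -> HashSet<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([target.to_string()]);
    while let Some(name) = queue.pop_front() {
        if let Some(callers) = graph.reverse.get(&name) {
            for caller in callers {
                // Only enqueue callers we have not visited yet.
                if seen.insert(caller.clone()) {
                    queue.push_back(caller.clone());
                }
            }
        }
    }
    seen
}
```

Each edge goes into both maps during the single pass. As a result, forward BFS (trace) and reverse BFS (impact, test-map) can share one load.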
Cached call graph: populated on first access; later calls return a clone of the value held in a OnceLock.
No invalidation by design. The cache lives for the Store lifetime and is
never cleared. Normal usage is one Store per CLI command, so the index cannot
change while the cache is live. In long-lived modes (batch, watch), callers must
re-open the Store to pick up index changes. Do not add a clear() here.
~15 call sites benefit from this single-scan caching.
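The write-once caching contract can be illustrated with std::sync::OnceLock. CachedStore, load_edges and the loads counter are hypothetical. The real method returns a Result and caches a CallGraph. This sketch caches a plain Vec<String> and ignores errors.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical store holding a write-once cache.
pub struct CachedStore {
    cache: OnceLock<Vec<String>>,
    /// Counts how many times the expensive scan actually ran.
    pub loads: AtomicUsize,
}

impl CachedStore {
    pub fn new() -> Self {
        Self { cache: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    /// Stand-in for the single SQL scan; a real store would query SQLite here.
    fn load_edges(&self) -> Vec<String> {
        self.loads.fetch_add(1, Ordering::SeqCst);
        vec!["main->run".to_string(), "run->parse".to_string()]
    }

    /// Runs the scan at most once, then hands out clones of the cached value.
    /// There is deliberately no way to clear the cache: re-open the store instead.
    pub fn call_graph(&self) -> Vec<String> {
        self.cache.get_or_init(|| self.load_edges()).clone()
    }
}
```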
pub fn get_callers_with_context( &self, callee_name: &str, ) -> Result<Vec<CallerWithContext>, StoreError>
Find callers with call-site line numbers for impact analysis.
Returns the caller function name, file, start line, and the specific line
where the call to callee_name occurs.
pub fn get_callers_with_context_batch( &self, callee_names: &[&str], ) -> Result<HashMap<String, Vec<CallerWithContext>>, StoreError>
Batch-fetch callers with context for multiple callee names.
Returns callee_name -> Vec<CallerWithContext> using a single
WHERE callee_name IN (...) query per batch of 500 names.
Avoids N+1 get_callers_with_context calls in diff impact analysis.
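Below is a sketch of how the per-batch IN (...) queries could be built, assuming 500 names per batch as stated above. The function name and the selected columns are hypothetical. Binding and executing each statement (for example through sqlx) is left out.

```rust
/// Names per `IN (...)` list, matching the batch size described above.
pub const BATCH_SIZE: usize = 500;

/// Split `names` into batches and build one parameterized query per batch.
/// Returns (sql, bind_values) pairs; the caller binds and executes them.
pub fn callers_in_batches<'a>(names: &[&'a str]) -> Vec<(String, Vec<&'a str>)> {
    names
        .chunks(BATCH_SIZE)
        .map(|batch| {
            // One "?" placeholder per name in this batch.
            let placeholders = vec!["?"; batch.len()].join(", ");
            let sql = format!(
                "SELECT callee_name, caller_name FROM function_calls WHERE callee_name IN ({placeholders})"
            );
            (sql, batch.to_vec())
        })
        .collect()
}
```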
pub fn get_callers_full_batch( &self, callee_names: &[&str], ) -> Result<HashMap<String, Vec<CallerInfo>>, StoreError>
Batch-fetch callers (full call graph) for multiple callee names.
Returns callee_name -> Vec<CallerInfo> using a single
WHERE callee_name IN (...) query per batch of 500 names.
Avoids N+1 get_callers_full calls in the context command.
pub fn get_callees_full_batch( &self, caller_names: &[&str], ) -> Result<HashMap<String, Vec<(String, u32)>>, StoreError>
Batch-fetch callees (full call graph) for multiple caller names.
Returns caller_name -> Vec<(callee_name, call_line)> using a single
WHERE caller_name IN (...) query per batch of 500 names.
Avoids N+1 get_callees_full calls in the context command.
Unlike get_callees_full, this does not support file scoping; it returns
callees across all files. This is acceptable for the context command,
which later filters by origin.
pub fn find_dead_code( &self, include_pub: bool, ) -> Result<(Vec<DeadFunction>, Vec<DeadFunction>), StoreError>
Find functions/methods never called by indexed code (dead code detection).
Returns two lists:
- confident: Functions with no callers that are likely dead (with confidence scores)
- possibly_dead_pub: Public functions with no callers (may be used externally)
Uses two-phase query: lightweight metadata first, then content only for candidates that pass name/test/path filters (avoids loading large function bodies).
Exclusions applied:
- Entry point names (main, init, handler, etc.)
- Test functions (via find_test_chunks() heuristics)
- Functions in test files
- Trait implementations (dynamic dispatch invisible to call graph)
- #[no_mangle] functions (FFI)
Confidence scoring:
- High: Private function in a file where no other function has callers
- Medium: Private function in an active file (other functions are called)
- Low: Method, or function with constructor-like name patterns
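The three confidence tiers above can be written as a small classifier. Candidate, Confidence, looks_like_constructor and its prefix list are hypothetical, and the real heuristics likely check more patterns. Scoring is shown only for private candidates, because public ones are reported separately as possibly_dead_pub.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Confidence {
    High,
    Medium,
    Low,
}

/// Hypothetical summary of one private function with no callers.
pub struct Candidate<'a> {
    pub name: &'a str,
    pub is_method: bool,
    /// True if some other function in the same file has at least one caller.
    pub file_is_active: bool,
}

/// Assumed constructor-like name patterns; the real list may differ.
fn looks_like_constructor(name: &str) -> bool {
    name == "new"
        || name == "default"
        || name.starts_with("new_")
        || name.starts_with("from_")
        || name.starts_with("with_")
}

pub fn score(c: &Candidate) -> Confidence {
    if c.is_method || looks_like_constructor(c.name) {
        Confidence::Low // methods and constructors are often reached indirectly
    } else if c.file_is_active {
        Confidence::Medium // file is in use, but this function is not
    } else {
        Confidence::High // nothing in the file is called at all
    }
}
```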
pub fn prune_stale_calls(&self) -> Result<u64, StoreError>
Delete function_calls for files no longer in the chunks table.
Used by GC to clean up orphaned call graph entries after pruning chunks.
pub fn find_test_chunks(&self) -> Result<Vec<ChunkSummary>, StoreError>
Find test chunks using language-specific heuristics.
Identifies test functions across all supported languages by:
- Name patterns: test_* (Rust/Python), Test* (Go)
- Content patterns: sourced from LanguageDef::test_markers per language
- Path patterns: sourced from LanguageDef::test_path_patterns per language
Uses a broad SQL filter then Rust post-filter for precision.
Cached test chunks: populated on first access; later calls return a clone of the value held in a OnceLock.
No invalidation by design. Same contract as get_call_graph: the cache is
intentionally write-once for the Store lifetime. Long-lived modes (batch, watch)
must re-open the Store to see updated test discovery. Do not add a clear().
~14 call sites benefit from this single-scan caching.
pub fn get_caller_counts_batch( &self, names: &[&str], ) -> Result<HashMap<String, u64>, StoreError>
Caller counts for multiple functions in one query.
Returns how many callers each function has. Functions not in the call graph won’t appear in the result map (caller count is implicitly 0).
pub fn get_callee_counts_batch( &self, names: &[&str], ) -> Result<HashMap<String, u64>, StoreError>
Callee counts for multiple functions in one query.
Returns how many callees each function has. Functions not in the call graph won’t appear in the result map (callee count is implicitly 0).
Functions that share callers with target (called by the same functions).
For target X, finds functions Y where some function A calls both X and Y. Returns (function_name, overlap_count) sorted by overlap descending.
Functions that share callees with target (call the same functions).
For target X, finds functions Y where X and Y both call some function C. Returns (function_name, overlap_count) sorted by overlap descending.
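The shared-callers computation can be sketched in memory over (caller, callee) edges. shared_callers is a hypothetical helper; the store presumably does this in SQL. The shared-callees variant is the same computation with the roles of caller and callee swapped.

```rust
use std::collections::{HashMap, HashSet};

/// For `target`, count functions Y that share at least one caller with it.
/// Sorted by overlap descending, then by name for a deterministic order.
pub fn shared_callers(edges: &[(&str, &str)], target: &str) -> Vec<(String, usize)> {
    // Every function that calls the target.
    let callers: HashSet<&str> = edges
        .iter()
        .filter(|(_, callee)| *callee == target)
        .map(|(caller, _)| *caller)
        .collect();
    // For each other callee, the set of target-callers that also call it.
    let mut overlap: HashMap<&str, HashSet<&str>> = HashMap::new();
    for &(caller, callee) in edges {
        if callee != target && callers.contains(caller) {
            overlap.entry(callee).or_default().insert(caller);
        }
    }
    let mut result: Vec<(String, usize)> = overlap
        .into_iter()
        .map(|(name, set)| (name.to_string(), set.len()))
        .collect();
    result.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    result
}
```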
pub fn function_call_stats(&self) -> Result<FunctionCallStats, StoreError>
Get full call graph statistics
pub fn callee_caller_counts(&self) -> Result<Vec<(String, usize)>, StoreError>
Count distinct callers for each callee name.
Returns (callee_name, distinct_caller_count) pairs. Used by the
enrichment pass for IDF-style filtering: callees called by many
distinct callers are likely utilities (log, unwrap, etc.).
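Below is a sketch of the distinct-caller count and one way an IDF-style filter might use it. count_distinct_callers and non_utility_callees are hypothetical helpers. The cut-off passed as max_callers is an illustrative assumption, not a documented default.

```rust
use std::collections::{HashMap, HashSet};

/// Distinct callers per callee, from (caller, callee) edges, sorted by name.
pub fn count_distinct_callers(edges: &[(&str, &str)]) -> Vec<(String, usize)> {
    let mut callers: HashMap<&str, HashSet<&str>> = HashMap::new();
    for &(caller, callee) in edges {
        // Repeated calls from the same caller count once.
        callers.entry(callee).or_default().insert(caller);
    }
    let mut counts: Vec<(String, usize)> = callers
        .into_iter()
        .map(|(callee, set)| (callee.to_string(), set.len()))
        .collect();
    counts.sort();
    counts
}

/// Keep callees at or below the cut-off, dropping likely utilities.
pub fn non_utility_callees(counts: &[(String, usize)], max_callers: usize) -> Vec<&str> {
    counts
        .iter()
        .filter(|(_, n)| *n <= max_callers)
        .map(|(name, _)| name.as_str())
        .collect()
}
```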
impl Store
pub fn get_metadata(&self, key: &str) -> Result<String, StoreError>
Retrieve a single metadata value by key.
Returns Ok(value) if the key exists, or Err if not found or on DB error.
Used for lightweight metadata checks (e.g., model compatibility between stores).
pub fn upsert_chunks_batch( &self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>, ) -> Result<usize, StoreError>
Insert or update chunks in batch using multi-row INSERT.
Chunks are inserted in batches of 52 rows (52 * 19 params = 988 < SQLite’s 999 limit). FTS operations remain per-row because FTS5 doesn’t support INSERT OR REPLACE.
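The batch sizes follow from dividing the parameter limit by the number of bind parameters per row. 999 is the historical default of SQLite's SQLITE_MAX_VARIABLE_NUMBER; builds from 3.32.0 onward default to 32766, so staying under 999 is the conservative choice. The helpers below are hypothetical.

```rust
/// Historical default of SQLITE_MAX_VARIABLE_NUMBER.
pub const SQLITE_MAX_PARAMS: usize = 999;

/// Largest number of rows whose bind parameters fit in one statement.
pub fn max_rows_per_batch(params_per_row: usize) -> usize {
    SQLITE_MAX_PARAMS / params_per_row // integer division rounds down
}

/// VALUES clause for a multi-row INSERT: `rows` groups of `params_per_row` placeholders.
pub fn values_clause(rows: usize, params_per_row: usize) -> String {
    let row = format!("({})", vec!["?"; params_per_row].join(", "));
    vec![row; rows].join(", ")
}
```

With 19 parameters per row this gives 52 rows (988 parameters). With 4 parameters per row it gives 249 rows (996 parameters).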
pub fn upsert_chunk( &self, chunk: &Chunk, embedding: &Embedding, source_mtime: Option<i64>, ) -> Result<(), StoreError>
Insert or update a single chunk
pub fn update_embeddings_batch( &self, updates: &[(String, Embedding)], ) -> Result<usize, StoreError>
Update only the embedding for existing chunks by chunk ID.
updates is a slice of (chunk_id, embedding) pairs. Chunk IDs not
found in the store are logged and skipped (rows_affected == 0).
Returns the count of actually updated rows.
Used by the call-graph enrichment pass: chunk content hasn’t changed, only the NL description (and therefore embedding) is different. Skips FTS rebuild since content is unchanged.
pub fn needs_reindex(&self, path: &Path) -> Result<Option<i64>, StoreError>
Check if a file needs reindexing based on mtime.
Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime),
or Ok(None) if no reindex needed. This avoids reading file metadata twice.
pub fn delete_by_origin(&self, origin: &Path) -> Result<u32, StoreError>
Delete all chunks for an origin (file path or source identifier)
pub fn upsert_chunks_and_calls( &self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>, calls: &[(String, CallSite)], ) -> Result<usize, StoreError>
Atomically upsert chunks and their call graph in a single transaction.
Combines chunk upsert (with FTS) and call graph upsert into one transaction, preventing inconsistency from crashes between separate operations. Chunks are inserted in batches of 52 rows (52 * 19 = 988 < SQLite’s 999 limit).
pub fn prune_missing( &self, existing_files: &HashSet<PathBuf>, ) -> Result<u32, StoreError>
Delete chunks for files that no longer exist
Batches deletes in groups of 100 to balance memory usage and query efficiency.
Uses Rust HashSet for existence check rather than SQL WHERE NOT IN because:
- Existing files often number 10k+, exceeding SQLite’s parameter limit (~999)
- Sending full file list to SQLite would require chunked queries anyway
- HashSet lookup is O(1), and we already have the set from enumerate_files()
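This sketches the HashSet-based existence check, with deletes grouped in batches of 100 as described above. missing_origin_batches and DELETE_BATCH are hypothetical. Each returned group would become one batched DELETE statement.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Deletes are issued in groups of this size.
pub const DELETE_BATCH: usize = 100;

/// Stored origins absent from `existing_files`, grouped into delete batches.
/// Each membership test is an O(1) HashSet probe.
pub fn missing_origin_batches(
    stored: &[PathBuf],
    existing_files: &HashSet<PathBuf>,
) -> Vec<Vec<PathBuf>> {
    let missing: Vec<PathBuf> = stored
        .iter()
        .filter(|p| !existing_files.contains(*p))
        .cloned()
        .collect();
    missing.chunks(DELETE_BATCH).map(|c| c.to_vec()).collect()
}
```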
pub fn count_stale_files( &self, existing_files: &HashSet<PathBuf>, ) -> Result<(u64, u64), StoreError>
Count files that are stale (mtime changed) or missing from disk.
Compares stored source_mtime against current filesystem state. Only checks files with source_type=‘file’ (not notes or other sources).
Returns (stale_count, missing_count).
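Below is a sketch of the (stale, missing) tally, assuming "stale" means the recorded mtime differs from the current one. count_stale and the current_mtime callback are hypothetical. The callback stands in for reading filesystem metadata, so the logic runs without touching disk.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// `stored` maps each indexed file to its recorded source_mtime.
/// Returns (stale_count, missing_count).
pub fn count_stale(
    stored: &HashMap<PathBuf, i64>,
    existing_files: &HashSet<PathBuf>,
    current_mtime: impl Fn(&PathBuf) -> Option<i64>,
) -> (u64, u64) {
    let (mut stale, mut missing) = (0u64, 0u64);
    for (path, &recorded) in stored {
        if !existing_files.contains(path) {
            missing += 1;
        } else if current_mtime(path) != Some(recorded) {
            // Changed mtime (or unreadable metadata) counts as stale.
            stale += 1;
        }
    }
    (stale, missing)
}
```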
pub fn list_stale_files( &self, existing_files: &HashSet<PathBuf>, ) -> Result<StaleReport, StoreError>
List files that are stale (mtime changed) or missing from disk.
Like count_stale_files() but returns full details for display.
Requires existing_files from enumerate_files() (~100ms for 10k files).
pub fn check_origins_stale( &self, origins: &[&str], root: &Path, ) -> Result<HashSet<String>, StoreError>
Check if specific origins are stale (mtime changed on disk).
Lightweight per-query check: only examines the given origins, not the entire index. O(result_count), not O(index_size).
root is the project root — origins are relative paths joined against it.
Returns the set of stale origin paths.
pub fn get_by_content_hash(&self, hash: &str) -> Option<Embedding>
Get embedding by content hash (for reuse when content unchanged)
Note: Prefer get_embeddings_by_hashes for batch lookups in production.
pub fn get_embeddings_by_hashes( &self, hashes: &[&str], ) -> Result<HashMap<String, Embedding>, StoreError>
Get embeddings for chunks with matching content hashes (batch lookup).
Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
pub fn get_chunk_ids_and_embeddings_by_hashes( &self, hashes: &[&str], ) -> Result<Vec<(String, Embedding)>, StoreError>
Get (chunk_id, embedding) pairs for chunks with matching content hashes.
Unlike get_embeddings_by_hashes (which keys by content_hash), this returns
the chunk ID alongside the embedding — exactly what HNSW insert_batch needs.
Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
pub fn chunk_count(&self) -> Result<u64, StoreError>
Get the number of chunks in the index
pub fn stats(&self) -> Result<IndexStats, StoreError>
Get index statistics
Uses batched queries to minimize database round trips:
- Single query for counts with GROUP BY using CTEs
- Single query for all metadata keys
pub fn get_chunks_by_origin( &self, origin: &str, ) -> Result<Vec<ChunkSummary>, StoreError>
Get all chunks for a given file (origin).
Returns chunks sorted by line_start. Used by cqs context to list
all functions/types in a file.
pub fn get_chunks_by_origins_batch( &self, origins: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch chunks by multiple origin paths.
Returns a map of origin -> Vec<ChunkSummary>. Used by cqs where to avoid N+1 get_chunks_by_origin calls.
pub fn get_chunks_by_names_batch( &self, names: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch chunks by multiple function names.
Returns a map of name -> Vec<ChunkSummary>. Used by cqs related to avoid N+1 get_chunks_by_name calls.
pub fn get_chunk_with_embedding( &self, id: &str, ) -> Result<Option<(ChunkSummary, Embedding)>, StoreError>
Batch signature search: find function/method chunks matching any of the given type names.
Get a chunk with its embedding vector.
Returns Ok(None) if the chunk doesn’t exist or has a corrupt embedding.
Used by cqs similar and cqs explain to search by example.
pub fn get_chunks_by_ids( &self, ids: &[&str], ) -> Result<HashMap<String, ChunkSummary>, StoreError>
Batch-fetch chunks by IDs.
Returns a map of chunk ID → ChunkSummary for all found IDs.
Used by --expand to fetch parent chunks for small-to-big retrieval.
pub fn get_embeddings_by_ids( &self, ids: &[&str], ) -> Result<HashMap<String, Embedding>, StoreError>
Batch-fetch embeddings by chunk IDs.
Returns a map of chunk ID → Embedding for all found IDs. Skips chunks with corrupt embeddings. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).
Used by semantic_diff to avoid N+1 queries when comparing matched pairs.
pub fn search_by_names_batch( &self, names: &[&str], limit_per_name: usize, ) -> Result<HashMap<String, Vec<SearchResult>>, StoreError>
Batch name search: look up multiple names in a single call.
For each name, returns up to limit_per_name matching chunks.
Batches names into groups of 20 and issues a combined FTS OR query
per batch, then post-filters results to assign to matching names.
Used by gather BFS expansion to avoid N+1 query patterns.
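This sketches both halves: grouping names 20 at a time into one FTS5 OR query, then assigning hits back to the names that produced them. fts_or_queries and assign_hits are hypothetical. Exact-name matching in the post-filter, and returning bare names instead of SearchResult values, are simplifying assumptions. Each name is quoted using FTS5 string syntax, where an embedded double quote is escaped by doubling it.

```rust
use std::collections::HashMap;

/// Names per combined FTS query.
pub const NAMES_PER_QUERY: usize = 20;

/// One FTS5 query per group of names, each name quoted as a string so it
/// is not parsed as query syntax.
pub fn fts_or_queries(names: &[&str]) -> Vec<String> {
    names
        .chunks(NAMES_PER_QUERY)
        .map(|group| {
            group
                .iter()
                .map(|n| format!("\"{}\"", n.replace('"', "\"\"")))
                .collect::<Vec<_>>()
                .join(" OR ")
        })
        .collect()
}

/// Post-filter: give each hit to the requested name it equals,
/// keeping at most `limit_per_name` hits per name.
pub fn assign_hits<'a>(
    names: &[&'a str],
    hits: &[&str],
    limit_per_name: usize,
) -> HashMap<&'a str, Vec<String>> {
    let mut out: HashMap<&'a str, Vec<String>> = HashMap::new();
    for &hit in hits {
        if let Some(&name) = names.iter().find(|&&n| n == hit) {
            let bucket = out.entry(name).or_default();
            if bucket.len() < limit_per_name {
                bucket.push(hit.to_string());
            }
        }
    }
    out
}
```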
pub fn all_chunk_identities(&self) -> Result<Vec<ChunkIdentity>, StoreError>
Get identity metadata for all chunks (for diff comparison).
Returns minimal metadata needed to match chunks across stores. Loads all rows but only lightweight columns (no content or embeddings).
pub fn chunks_paged( &self, after_rowid: i64, limit: usize, ) -> Result<(Vec<ChunkSummary>, i64), StoreError>
Fetch a page of full chunks by rowid cursor.
Returns (chunks, next_cursor). When the returned vec is empty, iteration
is complete. Used by the enrichment pass to iterate all chunks without
loading everything into memory.
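Cursor pagination can be modelled on an in-memory BTreeMap keyed by rowid. page is a hypothetical helper mirroring WHERE rowid > ? ORDER BY rowid LIMIT ?. Suppose rows 1, 2, 5 and 9 exist, and row 1 is deleted after a first page of two. OFFSET 2 would then jump to row 9 and skip row 5, while a cursor at rowid 2 still resumes at row 5.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Up to `limit` rows with rowid > `after_rowid`, in rowid order.
/// Returns (rows, next_cursor); an empty page means iteration is complete.
pub fn page(table: &BTreeMap<i64, String>, after_rowid: i64, limit: usize) -> (Vec<String>, i64) {
    let rows: Vec<(i64, String)> = table
        .range((Bound::Excluded(after_rowid), Bound::Unbounded))
        .take(limit)
        .map(|(id, v)| (*id, v.clone()))
        .collect();
    // The next cursor is the last rowid seen, or unchanged if the page is empty.
    let next = rows.last().map_or(after_rowid, |(id, _)| *id);
    (rows.into_iter().map(|(_, v)| v).collect(), next)
}
```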
pub fn all_chunk_identities_filtered( &self, language: Option<&str>, ) -> Result<Vec<ChunkIdentity>, StoreError>
Like all_chunk_identities but with an optional language filter.
When language is Some, only chunks matching that language are returned,
avoiding loading all chunks into memory when only one language is needed.
pub fn embedding_batches( &self, batch_size: usize, ) -> impl Iterator<Item = Result<Vec<(String, Embedding)>, StoreError>> + '_
Stream embeddings in batches for memory-efficient HNSW building.
Uses cursor-based pagination (WHERE rowid > last_seen) for stability under concurrent writes. LIMIT/OFFSET can skip or duplicate rows if the table is modified between batches.
§Arguments
batch_size - Number of embeddings per batch (recommended: 10_000)
§Returns
Iterator yielding Result<Vec<(String, Embedding)>, StoreError>
§Panics
Must be called from sync context only. This iterator internally uses
block_on() which will panic if called from within an async runtime.
This is used for HNSW building which runs in dedicated sync threads.
impl Store
pub fn upsert_notes_batch( &self, notes: &[(Note, Embedding)], source_file: &Path, file_mtime: i64, ) -> Result<usize, StoreError>
Insert or update notes in batch
pub fn search_notes( &self, query: &Embedding, limit: usize, threshold: f32, ) -> Result<Vec<NoteSearchResult>, StoreError>
Search notes by embedding similarity
Note: This performs brute-force O(n) similarity search over all notes.
For large note collections, prefer using the unified HNSW index which
includes notes with note: prefix for efficient ANN search.
The query is limited to MAX_NOTES_SCAN (1000) to prevent OOM on very large collections. If you have more notes, use the unified search.
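Below is a sketch of the brute-force path. It computes cosine similarity against every note up to the scan cap, applies the threshold filter, and sorts to keep the top `limit`. brute_force_search and cosine are hypothetical helpers that work on plain f32 slices rather than the crate's Embedding type.

```rust
/// Cosine similarity; 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Mirrors MAX_NOTES_SCAN.
pub const MAX_NOTES_SCAN: usize = 1000;

/// O(n) scan: score each note, drop scores below `threshold`,
/// and return the top `limit` (note_id, score) pairs, best first.
pub fn brute_force_search<'a>(
    query: &[f32],
    notes: &'a [(String, Vec<f32>)],
    limit: usize,
    threshold: f32,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = notes
        .iter()
        .take(MAX_NOTES_SCAN)
        .map(|(id, emb)| (id.as_str(), cosine(query, emb)))
        .filter(|&(_, s)| s >= threshold)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```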
pub fn replace_notes_for_file( &self, notes: &[(Note, Embedding)], source_file: &Path, file_mtime: i64, ) -> Result<usize, StoreError>
Replace all notes for a source file in a single transaction.
Atomically deletes existing notes and inserts new ones, preventing data loss if the process crashes mid-operation.
pub fn notes_need_reindex( &self, source_file: &Path, ) -> Result<Option<i64>, StoreError>
Check if notes file needs reindexing based on mtime.
Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime),
or Ok(None) if no reindex needed. This avoids reading file metadata twice.
pub fn note_count(&self) -> Result<u64, StoreError>
Get note count
pub fn note_stats(&self) -> Result<NoteStats, StoreError>
Get note statistics (total, warnings, patterns).
Uses SENTIMENT_NEGATIVE_THRESHOLD (-0.3) and SENTIMENT_POSITIVE_THRESHOLD (0.3)
to classify notes. These thresholds work with discrete sentiment values
(-1, -0.5, 0, 0.5, 1) – negative values (-1, -0.5) count as warnings,
positive values (0.5, 1) count as patterns.
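With the thresholds above, the classification reduces to two comparisons. NoteKind and classify are hypothetical, and the f32 type and strict comparisons are assumptions. For the discrete values listed, strict and inclusive comparisons give the same result.

```rust
pub const SENTIMENT_NEGATIVE_THRESHOLD: f32 = -0.3;
pub const SENTIMENT_POSITIVE_THRESHOLD: f32 = 0.3;

#[derive(Debug, PartialEq, Eq)]
pub enum NoteKind {
    Warning,
    Neutral,
    Pattern,
}

pub fn classify(sentiment: f32) -> NoteKind {
    if sentiment < SENTIMENT_NEGATIVE_THRESHOLD {
        NoteKind::Warning // -1 and -0.5
    } else if sentiment > SENTIMENT_POSITIVE_THRESHOLD {
        NoteKind::Pattern // 0.5 and 1
    } else {
        NoteKind::Neutral // 0
    }
}
```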
pub fn list_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>
List all notes with metadata (no embeddings).
Returns NoteSummary for each note, useful for mention-based filtering
without the cost of loading embeddings.
pub fn note_embeddings(&self) -> Result<Vec<(String, Embedding)>, StoreError>
Get all note embeddings for HNSW index building.
Returns (id, embedding) pairs with note: prefix on IDs to distinguish from chunks.
impl Store
pub fn upsert_type_edges( &self, chunk_id: &str, type_refs: &[TypeRef], ) -> Result<(), StoreError>
Upsert type edges for a single chunk.
Deletes existing type edges for the chunk, then batch-inserts new ones. 4 binds per row → 249 rows per batch (996 < 999 SQLite limit).
pub fn upsert_type_edges_for_file( &self, file: &Path, chunk_type_refs: &[ChunkTypeRefs], ) -> Result<(), StoreError>
Upsert type edges for all chunks in a file.
Resolves chunk names to chunk IDs via the chunks table, then deletes old type edges and batch-inserts new ones. Chunks not found in the database are warned and skipped (not an error).
For windowed chunks, associates type edges with the first window (window_idx IS NULL or window_idx = 0).
pub fn get_type_users( &self, type_name: &str, ) -> Result<Vec<ChunkSummary>, StoreError>
Get chunks that reference a given type name.
Forward query: “who uses Config?” Returns chunks that have type edges pointing to the given type name.
pub fn get_types_used_by( &self, chunk_name: &str, ) -> Result<Vec<TypeUsage>, StoreError>
Get types used by a given chunk (by function name).
Reverse query: “what types does parse_config use?” Returns TypeUsage structs
where edge_kind is “” for catch-all types.
pub fn get_type_users_batch( &self, type_names: &[&str], ) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>
Batch-fetch type users for multiple type names.
Returns type_name -> Vec<ChunkSummary>.
pub fn get_types_used_by_batch( &self, chunk_names: &[&str], ) -> Result<HashMap<String, Vec<(String, String)>>, StoreError>
Batch-fetch types used by multiple chunk names.
Returns chunk_name -> Vec<(type_name, edge_kind)>. Uses WHERE IN with 200 names per batch.
pub fn type_edge_stats(&self) -> Result<TypeEdgeStats, StoreError>
Get type edge statistics.
pub fn get_type_graph(&self) -> Result<TypeGraph, StoreError>
Load the type graph as forward + reverse adjacency lists.
Single SQL scan of type_edges joined with chunks, capped at 500K edges.
Forward: chunk_name -> Vec<type_name>, Reverse: type_name -> Vec<chunk_name>.
Find types that share users with target (co-occurrence).
“Types commonly used alongside Config” → Vec<(type_name, overlap_count)>. Uses self-join: find other types referenced by the same chunks that reference target.
pub fn prune_stale_type_edges(&self) -> Result<u64, StoreError>
Delete type_edges for chunks no longer in the chunks table (GC).
Returns the number of pruned rows.
impl Store
pub fn open(path: &Path) -> Result<Self, StoreError>
Open an existing index with connection pooling
pub fn open_readonly(path: &Path) -> Result<Self, StoreError>
Open an existing index in read-only mode with reduced resources.
Uses minimal connection pool, smaller cache, and single-threaded runtime. Suitable for reference stores and background builds that only read data.
pub fn init(&self, model_info: &ModelInfo) -> Result<(), StoreError>
Create a new index
Wraps all DDL and metadata inserts in a single transaction so a crash mid-init cannot leave a partial schema.
pub fn search_fts( &self, query: &str, limit: usize, ) -> Result<Vec<String>, StoreError>
Search FTS5 index for keyword matches.
§Search Method Overview
The Store provides several search methods with different characteristics:
- search_fts: Full-text keyword search using SQLite FTS5. Returns chunk IDs. Best for: exact keyword matches, symbol lookup by name fragment.
- search_by_name: Definition search by function/struct name. Uses FTS5 with heavy weighting on the name column. Returns full SearchResult with scores. Best for: “Where is X defined?” queries.
- search_filtered (in search.rs): Semantic search with optional language/path filters. Can use RRF hybrid search combining semantic + FTS scores. Best for: natural language queries like “retry with exponential backoff”.
- search_filtered_with_index (in search.rs): Like search_filtered, but uses an HNSW/CAGRA vector index for O(log n) candidate retrieval instead of brute force. Best for: large indexes (>5k chunks) where brute force is slow.
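search_filtered can use Reciprocal Rank Fusion (RRF) to combine semantic and FTS rankings. Below is a sketch in which each ranked list adds 1 / (k + rank) for every id it contains. The rrf function is hypothetical. k = 60, the value commonly used in the RRF literature, is an assumption and not a documented cqs constant.

```rust
use std::collections::HashMap;

/// Fuse several ranked id lists; rank is 1-based within each list.
/// Returns (id, fused_score) sorted by score descending, then by id.
pub fn rrf(lists: &[Vec<&str>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<&str, f32> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f32);
        }
    }
    let mut fused: Vec<(String, f32)> = scores
        .into_iter()
        .map(|(id, s)| (id.to_string(), s))
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

An id ranked moderately well in both lists can outrank one that tops a single list. That is the main reason to fuse ranks instead of raw scores, which are not comparable across semantic and keyword search.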
pub fn search_by_name( &self, name: &str, limit: usize, ) -> Result<Vec<SearchResult>, StoreError>
Search for chunks by name (definition search).
Searches the FTS5 name column for exact or prefix matches. Use this for “where is X defined?” queries instead of semantic search.
pub fn touch_updated_at(&self) -> Result<(), StoreError>
Update the updated_at metadata timestamp to now.
Call after indexing operations complete (pipeline, watch reindex, note sync) to track when the index was last modified.
pub fn cached_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>
Get cached notes summaries (loaded on first call, invalidated on mutation).
Returns a cloned Vec rather than a slice reference to avoid holding the RwLock read guard across caller code. The clone cost is negligible — notes are typically <100 entries with small strings.
pub fn close(self) -> Result<(), StoreError>
Gracefully close the store, performing WAL checkpoint.
This ensures all WAL changes are written to the main database file, reducing startup time for subsequent opens and freeing disk space used by WAL files.
Safe to skip (pool will close connections on drop), but recommended for clean shutdown in long-running processes.
impl Store
pub fn search_embedding_only( &self, query: &Embedding, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>
Raw embedding-only cosine similarity search (no RRF, no keyword matching).
You almost certainly want search_filtered() instead. This method skips
hybrid RRF ranking, name boosting, and all filters. It exists for tests and
internal building blocks only. Two production bugs came from calling this
directly (PR #305).
pub fn search_filtered( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>
Search with filters
pub fn search_filtered_with_index( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>, ) -> Result<Vec<SearchResult>, StoreError>
Search with optional vector index for O(log n) candidate retrieval
pub fn search_by_candidate_ids( &self, candidate_ids: &[&str], query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, ) -> Result<Vec<SearchResult>, StoreError>
Search within a set of candidate IDs (for HNSW-guided filtered search)
pub fn search_unified_with_index( &self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>, ) -> Result<Vec<UnifiedResult>, StoreError>
Unified search with optional vector index
When an HNSW index is provided, uses O(log n) search for both chunks and notes.
Note IDs in HNSW are prefixed with note: to distinguish from chunk IDs.
Trait Implementations§
Auto Trait Implementations§
impl !Freeze for Store
impl !RefUnwindSafe for Store
impl Send for Store
impl Sync for Store
impl Unpin for Store
impl UnsafeUnpin for Store
impl !UnwindSafe for Store
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.