Struct Store

pub struct Store { /* private fields */ }

Thread-safe SQLite store for chunks and embeddings

Uses sqlx connection pooling for concurrent reads and WAL mode for crash safety. All methods are synchronous but internally use an async runtime to execute sqlx operations.

Memory-mapped I/O

open() sets PRAGMA mmap_size = 256MB per connection with a 4-connection pool, reserving up to 1GB of virtual address space. open_readonly() uses 64MB × 1. This is intentional and benign on 64-bit systems (128TB virtual address space). Mmap pages are demand-paged from the database file and evicted under memory pressure — actual RSS reflects only accessed pages, not the mmap reservation.
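The reservation figures above are plain multiplication. A minimal sketch of the arithmetic follows; the helper name and the `MIB` constant are hypothetical and not part of the crate.

```rust
/// One mebibyte in bytes (illustrative constant, not from the crate).
const MIB: u64 = 1024 * 1024;

/// Upper bound on the virtual address space reserved by mmap:
/// the per-connection mmap_size times the number of pooled connections.
fn mmap_reservation_bytes(mmap_size_bytes: u64, pool_connections: u64) -> u64 {
    mmap_size_bytes * pool_connections
}
```

With the values quoted above, `mmap_reservation_bytes(256 * MIB, 4)` gives 1024 MiB for `open()` and `mmap_reservation_bytes(64 * MIB, 1)` gives 64 MiB for `open_readonly()`.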

Example

use cqs::Store;
use std::path::Path;

let store = Store::open(Path::new(".cqs/index.db"))?;
let stats = store.stats()?;
println!("Indexed {} chunks", stats.total_chunks);

Implementations

impl Store

pub fn upsert_calls(&self, chunk_id: &str, calls: &[CallSite]) -> Result<(), StoreError>

Insert or replace call sites for a chunk

pub fn upsert_calls_batch(&self, calls: &[(String, CallSite)]) -> Result<(), StoreError>

Insert call sites for multiple chunks in a single transaction.

Takes (chunk_id, CallSite) pairs and batches them into one transaction.

pub fn get_callees(&self, chunk_id: &str) -> Result<Vec<String>, StoreError>

Get all function names called by a given chunk.

Takes a chunk ID (unique) rather than a name. Returns only callee names (not full chunks) because:

- Callees may not exist in the index (external functions)
- Callers typically chain: get_callees → get_callers_full for graph traversal

For richer callee data, see get_callers_with_context.

pub fn call_stats(&self) -> Result<CallStats, StoreError>

Get call graph statistics

pub fn upsert_function_calls(&self, file: &Path, function_calls: &[FunctionCalls]) -> Result<(), StoreError>

Insert function calls for a file (full call graph, no size limits)

pub fn get_callers_full(&self, callee_name: &str) -> Result<Vec<CallerInfo>, StoreError>

Find all callers of a function (from full call graph)

pub fn get_callees_full(&self, caller_name: &str, file: Option<&str>) -> Result<Vec<(String, u32)>, StoreError>

Get all callees of a function (from full call graph)

When file is provided, scopes to callees of that function in that specific file. When None, returns callees across all files (backwards compatible, but ambiguous for common names like new, parse, from_str).

pub fn get_call_graph(&self) -> Result<CallGraph, StoreError>

Load the call graph as forward + reverse adjacency lists.

Single SQL scan of function_calls, capped at 500K edges to prevent OOM on adversarial databases. Typical projects have ~2000 edges. Used by trace (forward BFS), impact (reverse BFS), and test-map (reverse BFS).

Cached call graph — populated on first access, returns clone from OnceLock.

No invalidation by design. The cache lives for the Store lifetime and is never cleared. Normal usage is one Store per CLI command, so the index cannot change while the cache is live. In long-lived modes (batch, watch), callers must re-open the Store to pick up index changes — do not add a clear() here. ~15 call sites benefit from this single-scan caching.
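The write-once behaviour described above can be sketched with `std::sync::OnceLock`. The `Graph` and `CachedGraph` types are hypothetical stand-ins, and the sketch ignores the `Result` that the real method returns.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical stand-in for the real call graph type.
#[derive(Clone, Debug, Default, PartialEq)]
struct Graph {
    forward: HashMap<String, Vec<String>>,
}

/// Hypothetical holder for the cache cell; the real Store has more fields.
struct CachedGraph {
    cell: OnceLock<Graph>,
}

impl CachedGraph {
    fn new() -> Self {
        Self { cell: OnceLock::new() }
    }

    /// Runs `load` on the first access only; every call returns a clone
    /// of the stored value. There is deliberately no way to clear it.
    fn get(&self, load: impl FnOnce() -> Graph) -> Graph {
        self.cell.get_or_init(load).clone()
    }
}
```

Because the cell can never be reset, the only way to observe a changed index in this model is to build a new holder, which mirrors the "re-open the Store" rule above.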

pub fn get_callers_with_context(&self, callee_name: &str) -> Result<Vec<CallerWithContext>, StoreError>

Find callers with call-site line numbers for impact analysis.

Returns the caller function name, file, start line, and the specific line where the call to callee_name occurs.

pub fn get_callers_with_context_batch(&self, callee_names: &[&str]) -> Result<HashMap<String, Vec<CallerWithContext>>, StoreError>

Batch-fetch callers with context for multiple callee names.

Returns callee_name -> Vec<CallerWithContext> using a single WHERE callee_name IN (...) query per batch of 500 names. Avoids N+1 get_callers_with_context calls in diff impact analysis.
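One way to picture the batching is a helper that emits one `IN (...)` query per 500 names. This is a sketch only: the helper name and the SELECT column list are assumptions, and the real query is not shown in this documentation.

```rust
/// Batch size quoted in the text above.
const BATCH: usize = 500;

/// Builds one SQL string per batch, with a `?` placeholder per name,
/// paired with the values to bind. The selected columns are assumed.
fn in_queries(names: &[&str]) -> Vec<(String, Vec<String>)> {
    names
        .chunks(BATCH)
        .map(|batch| {
            let placeholders = vec!["?"; batch.len()].join(", ");
            let sql = format!(
                "SELECT callee_name, caller_name FROM function_calls WHERE callee_name IN ({placeholders})"
            );
            (sql, batch.iter().map(|s| s.to_string()).collect())
        })
        .collect()
}
```

For 1001 names this yields three queries (500, 500, and 1 bound values) instead of 1001 single-name lookups.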

pub fn get_callers_full_batch(&self, callee_names: &[&str]) -> Result<HashMap<String, Vec<CallerInfo>>, StoreError>

Batch-fetch callers (full call graph) for multiple callee names.

Returns callee_name -> Vec<CallerInfo> using a single WHERE callee_name IN (...) query per batch of 500 names. Avoids N+1 get_callers_full calls in the context command.

pub fn get_callees_full_batch(&self, caller_names: &[&str]) -> Result<HashMap<String, Vec<(String, u32)>>, StoreError>

Batch-fetch callees (full call graph) for multiple caller names.

Returns caller_name -> Vec<(callee_name, call_line)> using a single WHERE caller_name IN (...) query per batch of 500 names. Avoids N+1 get_callees_full calls in the context command.

Unlike get_callees_full, this method does not support file scoping and returns callees across all files. This is acceptable for the context command, which later filters by origin.

pub fn find_dead_code(&self, include_pub: bool) -> Result<(Vec<DeadFunction>, Vec<DeadFunction>), StoreError>

Find functions/methods never called by indexed code (dead code detection).

Returns two lists:

- confident: Functions with no callers that are likely dead (with confidence scores)
- possibly_dead_pub: Public functions with no callers (may be used externally)

Uses two-phase query: lightweight metadata first, then content only for candidates that pass name/test/path filters (avoids loading large function bodies).

Exclusions applied:

- Entry point names (main, init, handler, etc.)
- Test functions (via find_test_chunks() heuristics)
- Functions in test files
- Trait implementations (dynamic dispatch invisible to call graph)
- #[no_mangle] functions (FFI)

Confidence scoring:

- High: Private function in a file where no other function has callers
- Medium: Private function in an active file (other functions are called)
- Low: Method, or function with constructor-like name patterns
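The three confidence levels can be read as a small decision function. The sketch below is illustrative: the `Candidate` fields are hypothetical, and checking the Low rule before the other two is an assumption about precedence that the text does not state.

```rust
#[derive(Debug, PartialEq)]
enum Confidence {
    High,
    Medium,
    Low,
}

/// Hypothetical inputs; the real implementation derives these from the index.
struct Candidate {
    is_method: bool,
    constructor_like_name: bool,
    file_has_other_called_functions: bool,
}

fn score(c: &Candidate) -> Confidence {
    if c.is_method || c.constructor_like_name {
        // Methods and constructor-like names are often reached indirectly.
        Confidence::Low
    } else if c.file_has_other_called_functions {
        // The file is active, so an uncalled private function is less certain.
        Confidence::Medium
    } else {
        // Nothing in the file is called at all.
        Confidence::High
    }
}
```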

pub fn prune_stale_calls(&self) -> Result<u64, StoreError>

Delete function_calls for files no longer in the chunks table.

Used by GC to clean up orphaned call graph entries after pruning chunks.

pub fn find_test_chunks(&self) -> Result<Vec<ChunkSummary>, StoreError>

Find test chunks using language-specific heuristics.

Identifies test functions across all supported languages by:

- Name patterns: test_* (Rust/Python), Test* (Go)
- Content patterns: sourced from LanguageDef::test_markers per language
- Path patterns: sourced from LanguageDef::test_path_patterns per language

Uses a broad SQL filter then Rust post-filter for precision.

Cached test chunks — populated on first access, returns clone from OnceLock.

No invalidation by design. Same contract as get_call_graph: the cache is intentionally write-once for the Store lifetime. Long-lived modes (batch, watch) must re-open the Store to see updated test discovery — do not add a clear(). ~14 call sites benefit from this single-scan caching.

pub fn get_caller_counts_batch(&self, names: &[&str]) -> Result<HashMap<String, u64>, StoreError>

Caller counts for multiple functions in one query.

Returns how many callers each function has. Functions not in the call graph won’t appear in the result map (caller count is implicitly 0).
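Since absent keys mean zero, callers of this method would typically read the map with a default. A minimal sketch (the helper name is hypothetical):

```rust
use std::collections::HashMap;

/// Treats a missing key as zero callers, per the convention described above.
fn caller_count(counts: &HashMap<String, u64>, name: &str) -> u64 {
    counts.get(name).copied().unwrap_or(0)
}
```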

pub fn get_callee_counts_batch(&self, names: &[&str]) -> Result<HashMap<String, u64>, StoreError>

Callee counts for multiple functions in one query.

Returns how many callees each function has. Functions not in the call graph won’t appear in the result map (callee count is implicitly 0).

pub fn find_shared_callers(&self, target: &str, limit: usize) -> Result<Vec<(String, u32)>, StoreError>

Functions that share callers with target (called by the same functions).

For target X, finds functions Y where some function A calls both X and Y. Returns (function_name, overlap_count) sorted by overlap descending.
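The overlap count can be illustrated in memory over a list of (caller, callee) edges. This is a sketch of the idea, not the SQL the store runs; counting each caller at most once per callee and breaking ties by name are assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// `edges` are (caller, callee) pairs. Returns (function_name, overlap_count)
/// sorted by overlap descending, truncated to `limit`.
fn shared_callers(edges: &[(&str, &str)], target: &str, limit: usize) -> Vec<(String, u32)> {
    // Every function A that calls the target X.
    let callers: HashSet<&str> = edges
        .iter()
        .filter(|(_, callee)| *callee == target)
        .map(|(caller, _)| *caller)
        .collect();

    // For each other callee Y, count the distinct callers of X that also call Y.
    let mut seen: HashSet<(&str, &str)> = HashSet::new();
    let mut overlap: HashMap<&str, u32> = HashMap::new();
    for &(caller, callee) in edges {
        if callee != target && callers.contains(caller) && seen.insert((caller, callee)) {
            *overlap.entry(callee).or_insert(0) += 1;
        }
    }

    let mut out: Vec<(String, u32)> =
        overlap.into_iter().map(|(name, n)| (name.to_string(), n)).collect();
    // Overlap descending; ties broken by name so the output is deterministic.
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out.truncate(limit);
    out
}
```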

pub fn find_shared_callees(&self, target: &str, limit: usize) -> Result<Vec<(String, u32)>, StoreError>

Functions that share callees with target (call the same functions).

For target X, finds functions Y where X and Y both call some function C. Returns (function_name, overlap_count) sorted by overlap descending.

pub fn function_call_stats(&self) -> Result<FunctionCallStats, StoreError>

Get full call graph statistics

pub fn callee_caller_counts(&self) -> Result<Vec<(String, usize)>, StoreError>

Count distinct callers for each callee name.

Returns (callee_name, distinct_caller_count) pairs. Used by the enrichment pass for IDF-style filtering: callees called by many distinct callers are likely utilities (log, unwrap, etc.).
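A rough illustration of IDF-style filtering over these pairs follows. The cutoff rule, the `max_share` parameter, and the `total_callers` denominator are all assumptions; the enrichment pass may use a different threshold.

```rust
/// Keeps callees whose distinct-caller count is at most `max_share` of all
/// callers. Names called from almost everywhere carry little signal.
fn drop_utilities(counts: &[(String, usize)], total_callers: usize, max_share: f64) -> Vec<String> {
    counts
        .iter()
        .filter(|(_, n)| (*n as f64) <= max_share * total_callers as f64)
        .map(|(name, _)| name.clone())
        .collect()
}
```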

impl Store

pub fn get_metadata(&self, key: &str) -> Result<String, StoreError>

Retrieve a single metadata value by key.

Returns Ok(value) if the key exists, or Err if not found or on DB error. Used for lightweight metadata checks (e.g., model compatibility between stores).

pub fn upsert_chunks_batch(&self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>) -> Result<usize, StoreError>

Insert or update chunks in batch using multi-row INSERT.

Chunks are inserted in batches of 52 rows (52 * 19 params = 988 < SQLite’s 999 limit). FTS operations remain per-row because FTS5 doesn’t support INSERT OR REPLACE.
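The batch size follows from integer division of the parameter limit by the number of bound parameters per row. A small sketch (the helper name is hypothetical):

```rust
/// Largest number of rows whose bound parameters fit under `param_limit`.
fn rows_per_batch(param_limit: usize, params_per_row: usize) -> usize {
    param_limit / params_per_row
}
```

With the numbers above, `rows_per_batch(999, 19)` is 52, and slicing 130 chunks with `chunks(52)` produces batches of 52, 52, and 26 rows.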

pub fn upsert_chunk(&self, chunk: &Chunk, embedding: &Embedding, source_mtime: Option<i64>) -> Result<(), StoreError>

Insert or update a single chunk

pub fn update_embeddings_batch(&self, updates: &[(String, Embedding)]) -> Result<usize, StoreError>

Update only the embedding for existing chunks by chunk ID.

updates is a slice of (chunk_id, embedding) pairs. Chunk IDs not found in the store are logged and skipped (rows_affected == 0). Returns the count of actually updated rows.

Used by the call-graph enrichment pass: chunk content hasn’t changed, only the NL description (and therefore embedding) is different. Skips FTS rebuild since content is unchanged.

pub fn needs_reindex(&self, path: &Path) -> Result<Option<i64>, StoreError>

Check if a file needs reindexing based on mtime.

Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime), or Ok(None) if no reindex needed. This avoids reading file metadata twice.
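The return shape lets the caller pass the mtime straight on to the upsert without a second metadata read. The decision itself might look like the sketch below; treating "never indexed" and "any mtime difference" as stale is an assumption, since the comparison rule is not stated here.

```rust
/// Returns Some(current_mtime) when the file should be reindexed, else None.
/// `stored_mtime` is None when the file has never been indexed.
fn reindex_decision(stored_mtime: Option<i64>, current_mtime: i64) -> Option<i64> {
    match stored_mtime {
        Some(stored) if stored == current_mtime => None,
        _ => Some(current_mtime),
    }
}
```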

pub fn delete_by_origin(&self, origin: &Path) -> Result<u32, StoreError>

Delete all chunks for an origin (file path or source identifier)

pub fn upsert_chunks_and_calls(&self, chunks: &[(Chunk, Embedding)], source_mtime: Option<i64>, calls: &[(String, CallSite)]) -> Result<usize, StoreError>

Atomically upsert chunks and their call graph in a single transaction.

Combines chunk upsert (with FTS) and call graph upsert into one transaction, preventing inconsistency from crashes between separate operations. Chunks are inserted in batches of 52 rows (52 * 19 = 988 < SQLite’s 999 limit).

pub fn prune_missing(&self, existing_files: &HashSet<PathBuf>) -> Result<u32, StoreError>

Delete chunks for files that no longer exist

Batches deletes in groups of 100 to balance memory usage and query efficiency.

Uses Rust HashSet for existence check rather than SQL WHERE NOT IN because:

- Existing files often number 10k+, exceeding SQLite’s parameter limit (~999)
- Sending full file list to SQLite would require chunked queries anyway
- HashSet lookup is O(1), and we already have the set from enumerate_files()
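The approach described above can be sketched as a Rust-side filter followed by grouping into delete batches of 100. The function name and the `indexed` input (the origins currently stored) are hypothetical.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Splits the indexed paths that are absent from `existing` into groups of
/// at most 100, each of which would become one DELETE statement.
fn delete_batches(indexed: &[PathBuf], existing: &HashSet<PathBuf>) -> Vec<Vec<PathBuf>> {
    let missing: Vec<PathBuf> = indexed
        .iter()
        .filter(|p| !existing.contains(*p)) // O(1) lookup per indexed path
        .cloned()
        .collect();
    missing.chunks(100).map(|c| c.to_vec()).collect()
}
```

Only the missing paths are ever bound as SQL parameters, so the size of `existing` never touches the parameter limit.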

pub fn count_stale_files(&self, existing_files: &HashSet<PathBuf>) -> Result<(u64, u64), StoreError>

Count files that are stale (mtime changed) or missing from disk.

Compares stored source_mtime against current filesystem state. Only checks files with source_type='file' (not notes or other sources).

Returns (stale_count, missing_count).

pub fn list_stale_files(&self, existing_files: &HashSet<PathBuf>) -> Result<StaleReport, StoreError>

List files that are stale (mtime changed) or missing from disk.

Like count_stale_files() but returns full details for display. Requires existing_files from enumerate_files() (~100ms for 10k files).

pub fn check_origins_stale(&self, origins: &[&str], root: &Path) -> Result<HashSet<String>, StoreError>

Check if specific origins are stale (mtime changed on disk).

Lightweight per-query check: only examines the given origins, not the entire index. O(result_count), not O(index_size).

root is the project root — origins are relative paths joined against it.

Returns the set of stale origin paths.

pub fn get_by_content_hash(&self, hash: &str) -> Option<Embedding>

Get embedding by content hash (for reuse when content unchanged)

Note: Prefer get_embeddings_by_hashes for batch lookups in production.

pub fn get_embeddings_by_hashes(&self, hashes: &[&str]) -> Result<HashMap<String, Embedding>, StoreError>

Get embeddings for chunks with matching content hashes (batch lookup).

Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

pub fn get_chunk_ids_and_embeddings_by_hashes(&self, hashes: &[&str]) -> Result<Vec<(String, Embedding)>, StoreError>

Get (chunk_id, embedding) pairs for chunks with matching content hashes.

Unlike get_embeddings_by_hashes (which keys by content_hash), this returns the chunk ID alongside the embedding — exactly what HNSW insert_batch needs.

Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

pub fn chunk_count(&self) -> Result<u64, StoreError>

Get the number of chunks in the index

pub fn stats(&self) -> Result<IndexStats, StoreError>

Get index statistics

Uses batched queries to minimize database round trips:

- Single query for counts with GROUP BY using CTEs
- Single query for all metadata keys

pub fn get_chunks_by_origin(&self, origin: &str) -> Result<Vec<ChunkSummary>, StoreError>

Get all chunks for a given file (origin).

Returns chunks sorted by line_start. Used by cqs context to list all functions/types in a file.

pub fn get_chunks_by_origins_batch(&self, origins: &[&str]) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch chunks by multiple origin paths.

Returns a map of origin -> Vec<ChunkSummary> for all found origins. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999). Used by cqs where to avoid N+1 get_chunks_by_origin calls.

pub fn get_chunks_by_names_batch(&self, names: &[&str]) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch chunks by multiple function names.

Returns a map of name -> Vec<ChunkSummary> for all found names. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999). Used by cqs related to avoid N+1 get_chunks_by_name calls.

pub fn get_chunk_with_embedding(&self, id: &str) -> Result<Option<(ChunkSummary, Embedding)>, StoreError>

Get a chunk with its embedding vector.

Returns Ok(None) if the chunk doesn’t exist or has a corrupt embedding. Used by cqs similar and cqs explain to search by example.

pub fn get_chunks_by_ids(&self, ids: &[&str]) -> Result<HashMap<String, ChunkSummary>, StoreError>

Batch-fetch chunks by IDs.

Returns a map of chunk ID → ChunkSummary for all found IDs. Used by --expand to fetch parent chunks for small-to-big retrieval.

pub fn get_embeddings_by_ids(&self, ids: &[&str]) -> Result<HashMap<String, Embedding>, StoreError>

Batch-fetch embeddings by chunk IDs.

Returns a map of chunk ID → Embedding for all found IDs. Skips chunks with corrupt embeddings. Batches queries in groups of 500 to stay within SQLite’s parameter limit (~999).

Used by semantic_diff to avoid N+1 queries when comparing matched pairs.

pub fn search_by_names_batch(&self, names: &[&str], limit_per_name: usize) -> Result<HashMap<String, Vec<SearchResult>>, StoreError>

Batch name search: look up multiple names in a single call.

For each name, returns up to limit_per_name matching chunks. Batches names into groups of 20 and issues a combined FTS OR query per batch, then post-filters results to assign to matching names.

Used by gather BFS expansion to avoid N+1 query patterns.
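A combined FTS OR query per group of 20 names could be assembled as in the sketch below. The helper is hypothetical; it quotes each name as an FTS5 string (doubling any embedded double quote), and whether the real query also restricts the match to a specific column is not stated here.

```rust
/// Builds one FTS5 MATCH expression per batch of 20 names,
/// e.g. "parse" OR "load".
fn fts_or_queries(names: &[&str]) -> Vec<String> {
    names
        .chunks(20)
        .map(|batch| {
            batch
                .iter()
                .map(|n| format!("\"{}\"", n.replace('"', "\"\"")))
                .collect::<Vec<_>>()
                .join(" OR ")
        })
        .collect()
}
```

Because one query returns hits for up to 20 names at once, the post-filter step described above is what assigns each hit back to the name it matched.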

pub fn all_chunk_identities(&self) -> Result<Vec<ChunkIdentity>, StoreError>

Get identity metadata for all chunks (for diff comparison).

Returns minimal metadata needed to match chunks across stores. Loads all rows but only lightweight columns (no content or embeddings).

pub fn chunks_paged(&self, after_rowid: i64, limit: usize) -> Result<(Vec<ChunkSummary>, i64), StoreError>

Fetch a page of full chunks by rowid cursor.

Returns (chunks, next_cursor). When the returned vec is empty, iteration is complete. Used by the enrichment pass to iterate all chunks without loading everything into memory.
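The iteration contract can be sketched as a loop that stops on the first empty page. Here `fetch` is a stand-in for a call shaped like `chunks_paged(after_rowid, limit)`; starting the cursor at 0 assumes rowids begin at 1, and error handling is left out.

```rust
/// Drains a cursor-paged source into one Vec.
/// `fetch(after_rowid, limit)` returns (items, next_cursor).
fn drain_pages<T>(limit: usize, mut fetch: impl FnMut(i64, usize) -> (Vec<T>, i64)) -> Vec<T> {
    let mut all = Vec::new();
    let mut cursor = 0_i64;
    loop {
        let (page, next) = fetch(cursor, limit);
        if page.is_empty() {
            break; // an empty page signals that iteration is complete
        }
        all.extend(page);
        cursor = next;
    }
    all
}
```

A real caller would process each page and drop it rather than accumulate everything, which is the point of paging; the accumulation here only keeps the sketch easy to check.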

pub fn all_chunk_identities_filtered(&self, language: Option<&str>) -> Result<Vec<ChunkIdentity>, StoreError>

Like all_chunk_identities but with an optional language filter.

When language is Some, only chunks matching that language are returned, avoiding loading all chunks into memory when only one language is needed.

pub fn embedding_batches(&self, batch_size: usize) -> impl Iterator<Item = Result<Vec<(String, Embedding)>, StoreError>> + '_

Stream embeddings in batches for memory-efficient HNSW building.

Uses cursor-based pagination (WHERE rowid > last_seen) for stability under concurrent writes. LIMIT/OFFSET can skip or duplicate rows if the table is modified between batches.

Arguments

- batch_size: number of embeddings per batch (10_000 recommended)

Returns

Iterator yielding Result<Vec<(String, Embedding)>, StoreError>

Panics

Must be called from sync context only. This iterator internally uses block_on() which will panic if called from within an async runtime. This is used for HNSW building which runs in dedicated sync threads.

impl Store

pub fn upsert_notes_batch(&self, notes: &[(Note, Embedding)], source_file: &Path, file_mtime: i64) -> Result<usize, StoreError>

Insert or update notes in batch

pub fn search_notes(&self, query: &Embedding, limit: usize, threshold: f32) -> Result<Vec<NoteSearchResult>, StoreError>

Search notes by embedding similarity

Note: This performs brute-force O(n) similarity search over all notes. For large note collections, prefer using the unified HNSW index which includes notes with note: prefix for efficient ANN search.

The query is limited to MAX_NOTES_SCAN (1000) to prevent OOM on very large collections. If you have more notes, use the unified search.

pub fn replace_notes_for_file(&self, notes: &[(Note, Embedding)], source_file: &Path, file_mtime: i64) -> Result<usize, StoreError>

Replace all notes for a source file in a single transaction.

Atomically deletes existing notes and inserts new ones, preventing data loss if the process crashes mid-operation.

pub fn notes_need_reindex(&self, source_file: &Path) -> Result<Option<i64>, StoreError>

Check if notes file needs reindexing based on mtime.

Returns Ok(Some(mtime)) if reindex needed (with the file’s current mtime), or Ok(None) if no reindex needed. This avoids reading file metadata twice.

pub fn note_count(&self) -> Result<u64, StoreError>

Get note count

pub fn note_stats(&self) -> Result<NoteStats, StoreError>

Get note statistics (total, warnings, patterns).

Uses SENTIMENT_NEGATIVE_THRESHOLD (-0.3) and SENTIMENT_POSITIVE_THRESHOLD (0.3) to classify notes. These thresholds work with discrete sentiment values (-1, -0.5, 0, 0.5, 1) – negative values (-1, -0.5) count as warnings, positive values (0.5, 1) count as patterns.
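The classification can be sketched as two threshold comparisons. The constant values come from the text above; the function is hypothetical, and the use of strict comparisons is an assumption (with the discrete values listed, strict and non-strict comparisons give the same counts).

```rust
const SENTIMENT_NEGATIVE_THRESHOLD: f32 = -0.3;
const SENTIMENT_POSITIVE_THRESHOLD: f32 = 0.3;

/// Returns (total, warnings, patterns) for a slice of sentiment values.
fn classify(sentiments: &[f32]) -> (usize, usize, usize) {
    let warnings = sentiments
        .iter()
        .filter(|s| **s < SENTIMENT_NEGATIVE_THRESHOLD)
        .count();
    let patterns = sentiments
        .iter()
        .filter(|s| **s > SENTIMENT_POSITIVE_THRESHOLD)
        .count();
    (sentiments.len(), warnings, patterns)
}
```

For the five discrete values -1, -0.5, 0, 0.5, 1 this gives a total of 5 with 2 warnings and 2 patterns; the neutral 0 counts toward neither.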

pub fn list_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>

List all notes with metadata (no embeddings).

Returns NoteSummary for each note, useful for mention-based filtering without the cost of loading embeddings.

pub fn note_embeddings(&self) -> Result<Vec<(String, Embedding)>, StoreError>

Get all note embeddings for HNSW index building.

Returns (id, embedding) pairs with note: prefix on IDs to distinguish from chunks.

impl Store

pub fn upsert_type_edges(&self, chunk_id: &str, type_refs: &[TypeRef]) -> Result<(), StoreError>

Upsert type edges for a single chunk.

Deletes existing type edges for the chunk, then batch-inserts new ones. 4 binds per row → 249 rows per batch (996 < 999 SQLite limit).

pub fn upsert_type_edges_for_file(&self, file: &Path, chunk_type_refs: &[ChunkTypeRefs]) -> Result<(), StoreError>

Upsert type edges for all chunks in a file.

Resolves chunk names to chunk IDs via the chunks table, then deletes old type edges and batch-inserts new ones. Chunks not found in the database are warned and skipped (not an error).

For windowed chunks, associates type edges with the first window (window_idx IS NULL or window_idx = 0).

pub fn get_type_users(&self, type_name: &str) -> Result<Vec<ChunkSummary>, StoreError>

Get chunks that reference a given type name.

Reverse query (type_name -> chunks, matching the reverse direction of get_type_graph): “who uses Config?” Returns chunks that have type edges pointing to the given type name.

pub fn get_types_used_by(&self, chunk_name: &str) -> Result<Vec<TypeUsage>, StoreError>

Get types used by a given chunk (by function name).

Forward query (chunk_name -> types, matching the forward direction of get_type_graph): “what types does parse_config use?” Returns TypeUsage structs where edge_kind is “” for catch-all types.

pub fn get_type_users_batch(&self, type_names: &[&str]) -> Result<HashMap<String, Vec<ChunkSummary>>, StoreError>

Batch-fetch type users for multiple type names.

Returns type_name -> Vec<ChunkSummary>. Uses WHERE IN with 200 names per batch.

pub fn get_types_used_by_batch(&self, chunk_names: &[&str]) -> Result<HashMap<String, Vec<(String, String)>>, StoreError>

Batch-fetch types used by multiple chunk names.

Returns chunk_name -> Vec<(type_name, edge_kind)>. Uses WHERE IN with 200 names per batch.

pub fn type_edge_stats(&self) -> Result<TypeEdgeStats, StoreError>

Get type edge statistics.

pub fn get_type_graph(&self) -> Result<TypeGraph, StoreError>

Load the type graph as forward + reverse adjacency lists.

Single SQL scan of type_edges joined with chunks, capped at 500K edges. Forward: chunk_name -> Vec<type_name>, Reverse: type_name -> Vec<chunk_name>.
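Building both directions in one pass over the edge rows might look like the following sketch. The `Adjacency` struct and `build` function are hypothetical stand-ins for `TypeGraph` and its loader; only the 500K cap and the two map shapes come from the text.

```rust
use std::collections::HashMap;

/// Edge cap quoted in the text above.
const MAX_EDGES: usize = 500_000;

/// Hypothetical stand-in for the real TypeGraph.
#[derive(Default)]
struct Adjacency {
    forward: HashMap<String, Vec<String>>, // chunk_name -> type names
    reverse: HashMap<String, Vec<String>>, // type_name -> chunk names
}

/// Consumes (chunk_name, type_name) rows once, filling both maps.
fn build(edges: impl Iterator<Item = (String, String)>) -> Adjacency {
    let mut g = Adjacency::default();
    for (chunk, ty) in edges.take(MAX_EDGES) {
        g.forward.entry(chunk.clone()).or_default().push(ty.clone());
        g.reverse.entry(ty).or_default().push(chunk);
    }
    g
}
```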

pub fn find_shared_type_users(&self, target_type: &str, limit: usize) -> Result<Vec<(String, u32)>, StoreError>

Find types that share users with target (co-occurrence).

“Types commonly used alongside Config” → Vec<(type_name, overlap_count)>. Uses self-join: find other types referenced by the same chunks that reference target.

pub fn prune_stale_type_edges(&self) -> Result<u64, StoreError>

Delete type_edges for chunks no longer in the chunks table (GC).

Returns the number of pruned rows.

impl Store

pub fn open(path: &Path) -> Result<Self, StoreError>

Open an existing index with connection pooling

pub fn open_readonly(path: &Path) -> Result<Self, StoreError>

Open an existing index in read-only mode with reduced resources.

Uses minimal connection pool, smaller cache, and single-threaded runtime. Suitable for reference stores and background builds that only read data.

pub fn init(&self, model_info: &ModelInfo) -> Result<(), StoreError>

Create a new index

Wraps all DDL and metadata inserts in a single transaction so a crash mid-init cannot leave a partial schema.

pub fn search_fts(&self, query: &str, limit: usize) -> Result<Vec<String>, StoreError>

Search FTS5 index for keyword matches.

Search Method Overview

The Store provides several search methods with different characteristics:

- search_fts: Full-text keyword search using SQLite FTS5. Returns chunk IDs. Best for: Exact keyword matches, symbol lookup by name fragment.
- search_by_name: Definition search by function/struct name. Uses FTS5 with heavy weighting on the name column. Returns full SearchResult with scores. Best for: “Where is X defined?” queries.
- search_filtered (in search.rs): Semantic search with optional language/path filters. Can use RRF hybrid search combining semantic + FTS scores. Best for: Natural language queries like “retry with exponential backoff”.
- search_filtered_with_index (in search.rs): Like search_filtered but uses HNSW/CAGRA vector index for O(log n) candidate retrieval instead of brute force. Best for: Large indexes (>5k chunks) where brute force is slow.
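For readers unfamiliar with RRF (reciprocal rank fusion), the sketch below shows the general technique of merging a semantic ranking with an FTS ranking. It is not taken from search.rs: the function is hypothetical, and the constant k (60 is a commonly used value) and the tie-breaking rule are assumptions.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each list contributes 1 / (k + rank) to an item's
/// score, with rank starting at 1. Items present in several lists accumulate.
fn rrf(lists: &[Vec<&str>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<&str, f32> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f32 + 1.0));
        }
    }
    let mut out: Vec<(String, f32)> =
        scores.into_iter().map(|(id, s)| (id.to_string(), s)).collect();
    // Highest fused score first; ties broken by id for determinism.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

An item ranked second semantically and first by keyword outranks an item that is first in only one list, which is the property that makes the hybrid useful for mixed natural-language and identifier queries.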

pub fn search_by_name(&self, name: &str, limit: usize) -> Result<Vec<SearchResult>, StoreError>

Search for chunks by name (definition search).

Searches the FTS5 name column for exact or prefix matches. Use this for “where is X defined?” queries instead of semantic search.

pub fn touch_updated_at(&self) -> Result<(), StoreError>

Update the updated_at metadata timestamp to now.

Call after indexing operations complete (pipeline, watch reindex, note sync) to track when the index was last modified.

pub fn cached_notes_summaries(&self) -> Result<Vec<NoteSummary>, StoreError>

Get cached notes summaries (loaded on first call, invalidated on mutation).

Returns a cloned Vec rather than a slice reference to avoid holding the RwLock read guard across caller code. The clone cost is negligible — notes are typically <100 entries with small strings.
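The clone-and-release pattern can be sketched with an `RwLock<Option<Vec<_>>>`. The `NotesCache` type is hypothetical, `String` stands in for `NoteSummary`, and the sketch panics on a poisoned lock where real code would return an error.

```rust
use std::sync::RwLock;

/// Hypothetical cache holder; `String` stands in for NoteSummary.
struct NotesCache {
    slot: RwLock<Option<Vec<String>>>,
}

impl NotesCache {
    fn new() -> Self {
        Self { slot: RwLock::new(None) }
    }

    /// Returns a clone, so no lock guard outlives this call.
    fn get(&self, load: impl FnOnce() -> Vec<String>) -> Vec<String> {
        // The read guard is dropped at the end of this statement.
        let cached = self.slot.read().unwrap().clone();
        if let Some(notes) = cached {
            return notes;
        }
        let fresh = load();
        *self.slot.write().unwrap() = Some(fresh.clone());
        fresh
    }

    /// Called after any mutation so the next `get` reloads.
    fn invalidate(&self) {
        *self.slot.write().unwrap() = None;
    }
}
```

Two threads that miss at the same moment may both run `load` in this sketch; the result is still correct, just loaded twice.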

pub fn close(self) -> Result<(), StoreError>

Gracefully close the store, performing WAL checkpoint.

This ensures all WAL changes are written to the main database file, reducing startup time for subsequent opens and freeing disk space used by WAL files.

Safe to skip (pool will close connections on drop), but recommended for clean shutdown in long-running processes.

impl Store

pub fn search_embedding_only(&self, query: &Embedding, limit: usize, threshold: f32) -> Result<Vec<SearchResult>, StoreError>

Raw embedding-only cosine similarity search (no RRF, no keyword matching).

You almost certainly want search_filtered() instead. This method skips hybrid RRF ranking, name boosting, and all filters. It exists for tests and internal building blocks only. Two production bugs came from calling this directly (PR #305).

pub fn search_filtered(&self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32) -> Result<Vec<SearchResult>, StoreError>

Search with filters

pub fn search_filtered_with_index(&self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>) -> Result<Vec<SearchResult>, StoreError>

Search with optional vector index for O(log n) candidate retrieval

pub fn search_by_candidate_ids(&self, candidate_ids: &[&str], query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32) -> Result<Vec<SearchResult>, StoreError>

Search within a set of candidate IDs (for HNSW-guided filtered search)

pub fn search_unified_with_index(&self, query: &Embedding, filter: &SearchFilter, limit: usize, threshold: f32, index: Option<&dyn VectorIndex>) -> Result<Vec<UnifiedResult>, StoreError>

Unified search with optional vector index

When an HNSW index is provided, uses O(log n) search for both chunks and notes. Note IDs in HNSW are prefixed with note: to distinguish from chunk IDs.
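Telling the two kinds of ID apart comes down to a prefix check. A minimal sketch follows; the `Hit` enum and `classify_id` function are hypothetical, and the chunk ID used in the usage note is made up for illustration.

```rust
/// Hypothetical classification of an ID returned by the vector index.
#[derive(Debug, PartialEq)]
enum Hit<'a> {
    /// A note; carries the ID with the `note:` prefix removed.
    Note(&'a str),
    /// A chunk; carries the ID unchanged.
    Chunk(&'a str),
}

fn classify_id(id: &str) -> Hit<'_> {
    match id.strip_prefix("note:") {
        Some(rest) => Hit::Note(rest),
        None => Hit::Chunk(id),
    }
}
```

For example, `classify_id("note:42")` yields `Hit::Note("42")`, while `classify_id("chunk-1")` yields `Hit::Chunk("chunk-1")`.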

Trait Implementations

impl Drop for Store

fn drop(&mut self)

Executes the destructor for this type.

Auto Trait Implementations

impl !UnwindSafe for Store

Struct Store Copy item path

§Memory-mapped I/O

§Example

Implementations§
