pub struct SemanticIndex { /* private fields */ }
Three-index semantic search over the stage store.
Implementations
impl SemanticIndex
pub fn from_stages(
    stages: Vec<Stage>,
    provider: Box<dyn EmbeddingProvider>,
    config: IndexConfig,
) -> Result<Self, EmbeddingError>
Build the index from an owned list of stages (useful in async contexts
where holding a &dyn StageStore across .await is not possible).
pub fn build(
    store: &dyn StageStore,
    provider: Box<dyn EmbeddingProvider>,
    config: IndexConfig,
) -> Result<Self, EmbeddingError>
Build the index from all non-tombstoned stages in a store.
pub fn from_stages_batched(
    stages: Vec<Stage>,
    cached_provider: CachedEmbeddingProvider,
    config: IndexConfig,
    chunk_size: usize,
) -> Result<Self, EmbeddingError>
Build the index in a single pass: collect every signature/description/
example text upfront, dispatch all cache misses through
inner.embed_batch in chunks of chunk_size, then assemble the three
sub-indexes. Used by noether-cloud’s registry on cold start so that
486 stages × 3 texts = 1458 individual API calls collapse into ~46
batch calls of 32 texts each — well within typical rate limits.
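The batch arithmetic above can be checked with a short sketch. The helpers `batch_calls` and `plan_batches` are hypothetical names used only for illustration, not part of this crate, and the 46-call figure assumes a cold cache in which every text is a miss.

```rust
/// Number of batch calls needed to embed `total_texts` texts in chunks of
/// `chunk_size` (ceiling division). Hypothetical helper, not a crate API.
pub fn batch_calls(total_texts: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (total_texts + chunk_size - 1) / chunk_size
}

/// Split the collected texts into the slices that would each become one
/// batch request. The last slice may be shorter than `chunk_size`.
pub fn plan_batches(texts: &[String], chunk_size: usize) -> Vec<&[String]> {
    texts.chunks(chunk_size).collect()
}
```

With 486 stages and three texts per stage, `batch_calls(486 * 3, 32)` gives 46: 45 full batches of 32 texts plus one final batch of 18. Warm cache entries reduce the number of texts that reach the provider, so the real count can be lower.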
pub fn from_stages_batched_paced(
    stages: Vec<Stage>,
    cached_provider: CachedEmbeddingProvider,
    config: IndexConfig,
    chunk_size: usize,
    inter_batch_delay: Duration,
) -> Result<Self, EmbeddingError>
Like from_stages_batched, but waits inter_batch_delay between
successive batch calls and commits cache entries to disk after each
batch. Use this with rate-limited remote providers (e.g. Mistral
free tier ≈ 1 req/s → pass ~1100 ms).
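A minimal sketch of the pacing pattern, under stated assumptions: `embed_paced` and `min_added_wait` are hypothetical helpers, the closure stands in for the provider's batch call, and the per-batch cache commit is represented only by a comment. It illustrates the idea and is not this crate's implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Embed `texts` in chunks of `chunk_size`, sleeping `inter_batch_delay`
/// between successive batch calls (no sleep before the first call).
/// `embed_batch` is a stand-in for a real provider's batch endpoint.
pub fn embed_paced<F, E>(
    texts: &[String],
    chunk_size: usize,
    inter_batch_delay: Duration,
    mut embed_batch: F,
) -> Result<Vec<Vec<f32>>, E>
where
    F: FnMut(&[String]) -> Result<Vec<Vec<f32>>, E>,
{
    let mut vectors = Vec::with_capacity(texts.len());
    for (i, chunk) in texts.chunks(chunk_size).enumerate() {
        if i > 0 {
            sleep(inter_batch_delay);
        }
        vectors.extend(embed_batch(chunk)?);
        // A real implementation would persist the new cache entries here,
        // so that a failure in a later batch does not lose earlier work.
    }
    Ok(vectors)
}

/// Lower bound on the extra wall-clock time introduced by pacing:
/// one delay between each pair of consecutive batches.
pub fn min_added_wait(batches: usize, inter_batch_delay: Duration) -> Duration {
    inter_batch_delay * batches.saturating_sub(1) as u32
}
```

For the 46-batch cold start described for from_stages_batched, a 1100 ms delay adds at least 45 × 1.1 s = 49.5 s on top of the request time itself.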
pub fn build_cached(
    store: &dyn StageStore,
    cached_provider: CachedEmbeddingProvider,
    config: IndexConfig,
) -> Result<Self, EmbeddingError>
Build the index from a store, using a CachedEmbeddingProvider as a persistent embedding cache.
pub fn add_stage(&mut self, stage: &Stage) -> Result<(), EmbeddingError>
Add a single stage to all three indexes.
pub fn remove_stage(&mut self, stage_id: &StageId)
Remove a stage from all three indexes.
pub fn is_empty(&self) -> bool
pub fn search(
    &self,
    query: &str,
    top_k: usize,
) -> Result<Vec<SearchResult>, EmbeddingError>
Search across all three indexes and return ranked results.
pub fn search_filtered(
    &self,
    query: &str,
    top_k: usize,
    tag: Option<&str>,
) -> Result<Vec<SearchResult>, EmbeddingError>
Like search, but restricts candidates to stages carrying tag (exact match).
Passing tag: None is equivalent to search.
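The `Option<&str>` contract can be sketched as a candidate filter. `TaggedStage`, `matches_tag` and `filter_candidates` are hypothetical, simplified stand-ins for the real stage type and index internals. The sketch also assumes that "exact match" means byte-for-byte string equality (case-sensitive, no prefix matching).

```rust
/// Hypothetical, simplified stage record used only for this illustration.
pub struct TaggedStage {
    pub id: String,
    pub tags: Vec<String>,
}

/// `None` accepts every stage; `Some(tag)` accepts only stages carrying
/// exactly that tag.
pub fn matches_tag(stage: &TaggedStage, tag: Option<&str>) -> bool {
    match tag {
        None => true,
        Some(wanted) => stage.tags.iter().any(|have| have.as_str() == wanted),
    }
}

/// Restrict the candidate set before any similarity ranking takes place.
pub fn filter_candidates<'a>(
    stages: &'a [TaggedStage],
    tag: Option<&str>,
) -> Vec<&'a TaggedStage> {
    stages.iter().filter(|s| matches_tag(s, tag)).collect()
}
```

Under these assumptions a stage tagged only "http-client" is not returned for the tag "http", and passing `None` leaves the candidate set unchanged, which matches the stated equivalence with search.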
pub fn search_by_tag(&self, tag: &str) -> Vec<StageId>
Return all stage IDs that carry tag (exact match).
Return the set of all known tags across indexed stages.
pub fn check_duplicate_before_insert(
    &self,
    description: &str,
    threshold: f32,
) -> Result<Option<(StageId, f32)>, EmbeddingError>
Check whether a candidate description is a near-duplicate of an existing stage.
Returns Some((stage_id, similarity)) if the description's similarity to any existing stage's semantic embedding
exceeds threshold (the documented default is 0.92; the parameter itself is required). Returns None if the description is novel enough.
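A minimal sketch of a threshold-based near-duplicate check over precomputed embeddings. It assumes cosine similarity as the metric and returns the best match above the threshold. Neither choice is confirmed by this page, and `cosine_similarity` and `best_duplicate` are hypothetical names with plain `String` ids standing in for StageId.

```rust
/// Cosine similarity of two vectors; returns 0.0 if either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the id and similarity of the most similar existing embedding
/// whose similarity to `candidate` strictly exceeds `threshold`, if any.
pub fn best_duplicate(
    candidate: &[f32],
    existing: &[(String, Vec<f32>)],
    threshold: f32,
) -> Option<(String, f32)> {
    existing
        .iter()
        .map(|(id, emb)| (id.clone(), cosine_similarity(candidate, emb)))
        .filter(|(_, sim)| *sim > threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

A caller would typically treat `Some` as a signal to reject or merge the new stage and `None` as permission to insert it. Raising the threshold makes the check stricter about what counts as a duplicate.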