pub struct RipvecIndex { /* private fields */ }
Combined orchestrator for the ripvec retrieval pipeline.
Constructed via RipvecIndex::from_root which walks files,
chunks them with ripvec’s chunker, embeds with the static encoder,
and builds the BM25 index.
Implementations§
impl RipvecIndex
pub fn from_root(
root: &Path,
encoder: StaticEncoder,
cfg: &SearchConfig,
profiler: &Profiler,
pagerank_lookup: Option<HashMap<String, f32>>,
pagerank_alpha: f32,
) -> Result<Self>
Build a RipvecIndex by walking root and indexing every
supported file. Uses encoder.embed_root (ripvec’s chunker +
model2vec encode) and builds a fresh BM25 index over the
resulting chunks.
pagerank_lookup is the optional structural-prior map (file
path → normalized PageRank) used by the final ranking layer;
pass None to disable. pagerank_alpha is the corresponding
boost strength.
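As an illustration of how a structural prior of this shape could enter a ranking layer, the sketch below scales a chunk's score by `1 + alpha * pagerank`. The multiplicative form and the `boost_score` helper are assumptions made for the example; the page does not state the exact formula ripvec applies.

```rust
use std::collections::HashMap;

/// Hypothetical boost: scale `score` by (1 + alpha * pagerank) when the
/// chunk's file has an entry in the lookup map. The multiplicative form
/// is an assumption for illustration, not ripvec's confirmed formula.
fn boost_score(
    score: f32,
    file_path: &str,
    lookup: Option<&HashMap<String, f32>>,
    alpha: f32,
) -> f32 {
    match lookup.and_then(|map| map.get(file_path)) {
        // File has a normalized PageRank value: apply the boost.
        Some(&pagerank) => score * (1.0 + alpha * pagerank),
        // No map (prior disabled) or no entry for this file: unchanged.
        None => score,
    }
}
```

With this shape, passing no lookup map leaves every score untouched, which matches the "pass None to disable" behavior described above.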
§Errors
Returns the underlying error if embed_root fails.
pub fn apply_diff(&self, diff: &Diff, profiler: &Profiler) -> Result<Self>
Build a new index by incrementally applying diff against
self.
This is the selective-rebuild path that v3.1.0 deferred. It re-embeds only the dirty and new files, splices them into the existing chunks and embeddings, drops deleted files’ chunks, rebuilds BM25 and the per-file/per-language mappings from the new chunk set, reclassifies the corpus, and refreshes the manifest entries for the affected files.
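The splice step can be sketched roughly as follows. `Chunk`, `FileDiff`, and `splice_chunks` are hypothetical stand-ins (the real `Diff` and chunk types are not shown on this page), and the `rechunk` callback stands in for the chunk + embed work on a single file.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    file: String,
    text: String,
}

/// Hypothetical stand-in for the real `Diff` type.
struct FileDiff {
    dirty: Vec<String>,
    new: Vec<String>,
    deleted: Vec<String>,
}

/// Keep chunks from untouched files, drop chunks from dirty and deleted
/// files, then append freshly produced chunks for dirty and new files.
fn splice_chunks(
    old: &[Chunk],
    diff: &FileDiff,
    rechunk: impl Fn(&str) -> Vec<Chunk>,
) -> Vec<Chunk> {
    // Files whose existing chunks are no longer valid.
    let stale: HashSet<&str> = diff
        .dirty
        .iter()
        .chain(diff.deleted.iter())
        .map(String::as_str)
        .collect();

    let mut out: Vec<Chunk> = old
        .iter()
        .filter(|c| !stale.contains(c.file.as_str()))
        .cloned()
        .collect();

    // Only dirty and new files pay the chunking cost again.
    for path in diff.dirty.iter().chain(diff.new.iter()) {
        out.extend(rechunk(path.as_str()));
    }
    out
}
```

In this sketch the work done by `rechunk` scales with the number of dirty and new files, while any structure derived from the full chunk list (such as BM25) would still be rebuilt over `out`.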
§Cost shape
Roughly O(|diff.dirty| + |diff.new|) chunk + embed work plus
O(|self.chunks|) BM25 rebuild. On a 5000-chunk corpus with
one file changed: ~5-10 ms (embed one file) + ~50 ms (BM25
rebuild) = ~60 ms — vs. ~270 ms-1 s for a full
Self::from_root rebuild. The full-build cost is paid only
at cold start.
§BM25
BM25 is rebuilt from scratch over the new chunks vec rather than incrementally updated. An incremental inverted-postings update would also be correct but adds significant code; a full rebuild at our chunk counts is fast enough that the simpler path wins.
§Errors
Returns the underlying error if StaticEncoder::embed_paths
fails or if the embedding matrix shape is invalid.
pub fn diff_against_filesystem(&self) -> Diff
Compare the manifest captured at build time against the current
filesystem state under Self::root, using the same
WalkOptions used for the original index build.
Returns a Diff enumerating dirty, new, and deleted files.
An empty result (Diff::is_empty) means the index is
up to date and no rebuild is needed.
§Cost
Walk + per-file stat() for the cheap-path files (typically all
of them between successive queries). Blake3 verification is paid
only on the rare files where the stat tuple mismatches. On a
200-file repo with no changes: sub-millisecond. On a 92k-file
repo with no changes: ~100-130 ms (the walk dominates).
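The stat-first, hash-on-mismatch strategy described above can be sketched as below. All type and function names here are hypothetical, the stat tuple is reduced to size and mtime, and a caller-supplied `hash_file` callback returning a `u64` stands in for the Blake3 read so the example stays dependency-free.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct StatTuple {
    size: u64,
    mtime_ns: u128,
}

#[derive(Clone, Debug)]
struct ManifestEntry {
    stat: StatTuple,
    content_hash: u64, // stand-in for a Blake3 digest
}

#[derive(Debug, Default, PartialEq)]
struct FileDiff {
    dirty: Vec<String>,
    new: Vec<String>,
    deleted: Vec<String>,
}

/// Compare a build-time manifest against the current walk results.
/// `hash_file` is only called when the stat tuple mismatches.
fn diff_manifest(
    manifest: &BTreeMap<String, ManifestEntry>,
    current: &BTreeMap<String, StatTuple>,
    hash_file: impl Fn(&str) -> u64,
) -> FileDiff {
    let mut diff = FileDiff::default();
    for (path, stat) in current {
        match manifest.get(path) {
            None => diff.new.push(path.clone()),
            // Cheap path: identical stat tuple, no content read needed.
            Some(entry) if entry.stat == *stat => {}
            // Stat mismatch: verify content before declaring the file dirty.
            Some(entry) => {
                if hash_file(path.as_str()) != entry.content_hash {
                    diff.dirty.push(path.clone());
                }
            }
        }
    }
    for path in manifest.keys() {
        if !current.contains_key(path) {
            diff.deleted.push(path.clone());
        }
    }
    diff
}
```

A file that was touched without a content change takes the hash branch but does not end up in `dirty`, which is the one-hash-per-reconcile case that the Mutation section discusses.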
§Mutation
This method takes &self and works on a clone of the manifest,
so the optimization of “refresh touched-but-unchanged stat
tuples” from diff_against_walk is discarded here. In
practice that means a file repeatedly touched without content
change pays one blake3 read per reconcile rather than zero —
negligible at our file sizes.
pub fn walk_options(&self) -> &WalkOptions
Walk options captured at build time.
pub fn corpus_class(&self) -> CorpusClass
The index’s corpus classification, computed at build time.
Used by the MCP rerank gate to decide whether the L-12 cross-encoder fires on a given query.
pub fn embeddings(&self) -> &Array2<f32>
Indexed embeddings (read-only access).
Array2<f32> of shape [n_chunks, hidden_dim], row-major. Row
i is the L2-normalized embedding of chunk i, so cosine
similarity reduces to a dot product. Callers that need their
own similarity arithmetic (find_similar, find_duplicates)
should use embeddings.row(i) for a single-row view or
embeddings.dot(&query) for a one-call BLAS GEMV.
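To see why L2-normalized rows let cosine similarity reduce to a dot product, here is a minimal dependency-free sketch. It uses plain `Vec<f32>` rows rather than the `Array2<f32>` the method actually returns, and the helper names are illustrative only.

```rust
/// Scale a vector to unit L2 norm (zero vectors are returned unchanged).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Plain dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every row against a unit-norm query. Because both sides have
/// norm 1, each dot product equals the cosine similarity.
fn scores(rows: &[Vec<f32>], query: &[f32]) -> Vec<f32> {
    rows.iter().map(|row| dot(row, query)).collect()
}
```

The `scores` loop is the hand-written equivalent of a single matrix-vector product over the embedding matrix.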
pub fn search(
&self,
query: &str,
top_k: usize,
mode: SearchMode,
alpha: Option<f32>,
filter_languages: Option<&[String]>,
filter_paths: Option<&[String]>,
) -> Vec<(usize, f32)>
Search the index and return ranked (chunk_index, score) pairs.
mode = SearchMode::Hybrid (default) fuses the semantic and BM25
rankings via reciprocal rank fusion (RRF); Semantic and Keyword use one signal each.
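For readers unfamiliar with RRF, the textbook form sums `1 / (k + rank)` over each input ranking, with 1-based ranks. The sketch below implements that textbook form; the `rrf_fuse` name, the constant `k`, and the tie-breaking rule are assumptions, and ripvec's actual fusion (including how `alpha` weights the two signals) may differ.

```rust
use std::collections::HashMap;

/// Textbook reciprocal rank fusion: each ranking contributes
/// 1 / (k + rank) to a chunk's fused score, with ranks starting at 1.
fn rrf_fuse(rankings: &[Vec<usize>], k: f32, top_k: usize) -> Vec<(usize, f32)> {
    let mut scores: HashMap<usize, f32> = HashMap::new();
    for ranking in rankings {
        for (pos, &chunk) in ranking.iter().enumerate() {
            *scores.entry(chunk).or_insert(0.0) += 1.0 / (k + pos as f32 + 1.0);
        }
    }
    let mut fused: Vec<(usize, f32)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by chunk index for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

A chunk that appears near the top of both rankings outranks one that tops only a single list, which is the property that makes rank fusion useful when the two score scales are not comparable.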
filter_languages and filter_paths build a selector mask
that restricts retrieval to chunks in the named files /
languages.
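One plausible shape for such a selector mask is a boolean per chunk, as sketched below. `ChunkMeta` and `selector_mask` are hypothetical, and both the exact-match comparison and the AND combination of the two filters are assumptions; the page does not specify how paths are matched or how the filters combine.

```rust
/// Hypothetical per-chunk metadata used only for this sketch.
struct ChunkMeta {
    path: String,
    language: String,
}

/// Build one bool per chunk: true when the chunk passes every filter
/// that was supplied. A filter of `None` accepts all chunks.
fn selector_mask(
    chunks: &[ChunkMeta],
    languages: Option<&[String]>,
    paths: Option<&[String]>,
) -> Vec<bool> {
    chunks
        .iter()
        .map(|c| {
            let lang_ok = languages.map_or(true, |ls| ls.iter().any(|l| l == &c.language));
            let path_ok = paths.map_or(true, |ps| ps.iter().any(|p| p == &c.path));
            lang_ok && path_ok
        })
        .collect()
}
```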
Trait Implementations§
impl SearchableIndex for RipvecIndex
fn search(
&self,
query_text: &str,
top_k: usize,
mode: SearchMode,
) -> Vec<(usize, f32)>
Trait-shape search: text-only, no engine-specific knobs.
The trait surface is the LSP-callers’ common ground. Filters (language, path) and the alpha auto-detect override are not surfaced through the trait because no LSP module uses them.
fn search_from_chunk(
&self,
chunk_idx: usize,
query_text: &str,
top_k: usize,
mode: SearchMode,
) -> Vec<(usize, f32)>
Use chunk chunk_idx’s own embedding as the query vector and
rank everything else by cosine similarity (semantic-only) or
blend with BM25 (hybrid). Falls back to text-only keyword
search when the chunk index is out of range.
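The semantic-only branch can be pictured with the sketch below, which assumes unit-norm rows stored as plain `Vec<f32>`. The `similar_to_chunk` helper is hypothetical: it returns `None` for an out-of-range index so that a caller could take the text-only fallback, and it leaves out the hybrid blend with BM25.

```rust
/// Rank every other row by dot product against row `chunk_idx`.
/// Returns `None` when `chunk_idx` is out of range, signalling that the
/// caller should fall back to a text-only search.
fn similar_to_chunk(
    rows: &[Vec<f32>],
    chunk_idx: usize,
    top_k: usize,
) -> Option<Vec<(usize, f32)>> {
    let query = rows.get(chunk_idx)?;
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        // The query chunk itself is excluded from its own results.
        .filter(|(i, _)| *i != chunk_idx)
        .map(|(i, row)| {
            let score = row.iter().zip(query).map(|(a, b)| a * b).sum::<f32>();
            (i, score)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    Some(scored)
}
```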
Mirrors the HybridIndex equivalent so goto_definition
and goto_implementation work identically across engines.
Auto Trait Implementations§
impl !Freeze for RipvecIndex
impl !RefUnwindSafe for RipvecIndex
impl Send for RipvecIndex
impl Sync for RipvecIndex
impl Unpin for RipvecIndex
impl UnsafeUnpin for RipvecIndex
impl UnwindSafe for RipvecIndex
Blanket Implementations§
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata(_: &<T as ArchivePointee>::ArchivedMetadata) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T
where
T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be
downcast into Box<dyn ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further
downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot
generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Converts &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot
generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
Writes data to out indicating that a T is niched.