pub struct Bm25Index { /* private fields */ }
Hand-rolled Okapi BM25 index over a set of enriched documents.
Built once via Bm25Index::build; queried repeatedly via
Bm25Index::score. Document order matches the chunk-index
convention used elsewhere in the ripvec port.
Implementations§
impl Bm25Index
pub fn build(chunks: &[CodeChunk]) -> Self
Build an index over enriched chunks. Tokenization uses
crate::encoder::ripvec::tokens::tokenize.
Three-pass build:
- par_iter (tokenize + intern + TF): each chunk is enriched,
  tokenized, and its tokens interned into a shared
  ThreadedRodeo. The per-doc TF map keys on the Spur ID instead of
  String, eliminating the duplicated-string storage that dominated
  memory + hashing in the previous version.
- serial DF merge: walk per-doc TF maps and increment a global
  Spur-keyed counter. With Spur keys (4-byte NonZeroU32), FxHash
  lookups are a single multiply.
- serial IDF compute: produce the final df_idf map.
On a 92K-file linux corpus (~250K chunks): bm25_build drops from 35s serial → ~14s parallel without interning → ~7s with interning.
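The three passes above can be sketched with the standard library alone. This is a minimal single-threaded illustration, not the crate's implementation: `Interner` and `TermId` are hypothetical stand-ins for `ThreadedRodeo` and `Spur`, the tokenizer is an assumed lowercase/alphanumeric splitter, `build_stats` is an invented name, and the IDF formula is the common `ln(1 + (N - df + 0.5) / (df + 0.5))` variant, which may differ from what the index actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical interned-term key; plays the role of `Spur` in this sketch.
type TermId = u32;

/// Minimal single-threaded interner, a stand-in for `ThreadedRodeo`.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, TermId>,
}

impl Interner {
    /// Return the existing id for `term`, or assign the next free one.
    fn get_or_intern(&mut self, term: &str) -> TermId {
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        let id = self.ids.len() as TermId;
        self.ids.insert(term.to_owned(), id);
        id
    }
}

/// Assumed tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Returns (per-document TF maps, term -> IDF).
fn build_stats(
    docs: &[&str],
    interner: &mut Interner,
) -> (Vec<HashMap<TermId, u32>>, HashMap<TermId, f32>) {
    // Pass 1: tokenize + intern + TF. Each per-doc map is keyed by a small
    // integer id, so no term string is stored more than once.
    let tfs: Vec<HashMap<TermId, u32>> = docs
        .iter()
        .map(|doc| {
            let mut tf = HashMap::new();
            for tok in tokenize(doc) {
                *tf.entry(interner.get_or_intern(&tok)).or_insert(0) += 1;
            }
            tf
        })
        .collect();

    // Pass 2: DF merge. A term counts once per document that contains it.
    let mut df: HashMap<TermId, u32> = HashMap::new();
    for tf in &tfs {
        for &term in tf.keys() {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    // Pass 3: IDF compute (assumed formula, see the note above).
    let n = docs.len() as f32;
    let idf = df
        .iter()
        .map(|(&term, &d)| {
            let d = d as f32;
            (term, (1.0 + (n - d + 0.5) / (d + 0.5)).ln())
        })
        .collect();

    (tfs, idf)
}
```

In this sketch only pass 1 does per-document work that is independent across documents, which is why it is the natural candidate for par_iter; it is the shared interner that needs to be thread-safe once that pass runs in parallel.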
pub fn score(&self, query: &str) -> Vec<f32>
Compute BM25 scores for query against every document.
Returns a Vec<f32> of length self.len() (one score per doc).
Zero scores indicate no query terms matched.
Postings-list scoring: walks postings[term] for each query
term (typically <1% of corpus). Per-term work is dispatched via
rayon: each thread accumulates a local scores vector, all
vectors fold-reduce at the end. Parallelism is bounded by the
number of distinct query terms; for the common 1-5-term query
rayon uses 1-5 workers, which is appropriate — the algorithmic
win from inversion dwarfs any further parallel scaling.
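A rough sketch of postings-list scoring, under stated assumptions: the `Postings` struct and its field names are hypothetical, the constants `K1 = 1.2` and `B = 0.75` are the conventional Okapi defaults rather than values confirmed by this crate, and the per-term loop runs serially here where the description above dispatches it through rayon.

```rust
use std::collections::HashMap;

/// One posting: (document index, term frequency in that document).
type Posting = (usize, u32);

/// Hypothetical inverted index; the field names are illustrative only.
struct Postings {
    postings: HashMap<String, Vec<Posting>>,
    idf: HashMap<String, f32>,
    doc_len: Vec<f32>,
    avg_len: f32,
}

// Conventional Okapi BM25 defaults (assumed, not taken from the crate).
const K1: f32 = 1.2;
const B: f32 = 0.75;

impl Postings {
    /// Score every document against already-tokenized query terms.
    /// Only documents that appear in a term's postings list are touched;
    /// all other entries keep their initial score of zero.
    fn score(&self, query_terms: &[&str]) -> Vec<f32> {
        let mut scores = vec![0.0_f32; self.doc_len.len()];
        for term in query_terms {
            // Terms absent from the index contribute nothing.
            let (Some(list), Some(&idf)) =
                (self.postings.get(*term), self.idf.get(*term))
            else {
                continue;
            };
            for &(doc, tf) in list {
                let tf = tf as f32;
                // Length normalization: longer-than-average docs are damped.
                let norm = K1 * (1.0 - B + B * self.doc_len[doc] / self.avg_len);
                scores[doc] += idf * tf * (K1 + 1.0) / (tf + norm);
            }
        }
        scores
    }
}
```

A parallel version would give each worker its own `scores` vector for the terms it handles and sum the vectors element-wise at the end, which is the fold-reduce shape described above.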
Auto Trait Implementations§
impl !Freeze for Bm25Index
impl !RefUnwindSafe for Bm25Index
impl Send for Bm25Index
impl Sync for Bm25Index
impl Unpin for Bm25Index
impl UnsafeUnpin for Bm25Index
impl UnwindSafe for Bm25Index
Blanket Implementations§
impl<T> ArchivePointee for T
type ArchivedMetadata = ()
fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Downcast for T
where
    T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Converts Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be
downcast into Box<dyn ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Converts Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further
downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Converts &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot
generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Converts &mut Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot
generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> LayoutRaw for T
fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>
impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool
fn resolve_niched(out: Place<NichedOption<T, N1>>)
Writes data to out indicating that a T is niched.