pub struct BM25Index { /* private fields */ }
BM25 index implementation
Implementations
impl BM25Index
pub fn to_compressed_bytes(&self, compression: Compression) -> Result<Vec<u8>>
Serializes the index to compressed bytes using the specified compression.
Errors
Returns an error if serialization or compression fails.
pub fn from_compressed_bytes( data: &[u8], compression: Compression, ) -> Result<Self>
impl BM25Index
pub fn with_params(k1: f32, b: f32) -> Self
Creates an index with custom BM25 parameters k1 and b.
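For orientation: in the textbook Okapi BM25 formula, k1 controls how quickly repeated occurrences of a term saturate, and b controls how strongly the score is normalized by document length. The sketch below illustrates that standard formula only; it is not taken from this crate. The function names `idf` and `bm25_term_score` and the smoothed IDF variant are assumptions made for illustration.

```rust
/// Smoothed inverse document frequency (the common "plus one" variant).
/// Assumption: the crate may use a different IDF variant.
fn idf(num_docs: usize, doc_freq: usize) -> f32 {
    let n = num_docs as f32;
    let df = doc_freq as f32;
    ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// Textbook BM25 contribution of one query term to one document.
fn bm25_term_score(
    tf: f32,
    doc_len: f32,
    avg_doc_len: f32,
    term_idf: f32,
    k1: f32,
    b: f32,
) -> f32 {
    // b = 0 disables length normalization; b = 1 applies it fully.
    let length_norm = 1.0 - b + b * (doc_len / avg_doc_len);
    // k1 bounds the term-frequency factor: it approaches (k1 + 1) as tf grows.
    term_idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)
}

fn main() {
    let term_idf = idf(1000, 10);

    // With b = 0, a short and a long document score the same.
    let short = bm25_term_score(3.0, 50.0, 100.0, term_idf, 1.2, 0.0);
    let long = bm25_term_score(3.0, 500.0, 100.0, term_idf, 1.2, 0.0);
    println!("b = 0.00: short = {:.4}, long = {:.4}", short, long);

    // With b = 0.75, the longer document is penalized.
    let short = bm25_term_score(3.0, 50.0, 100.0, term_idf, 1.2, 0.75);
    let long = bm25_term_score(3.0, 500.0, 100.0, term_idf, 1.2, 0.75);
    println!("b = 0.75: short = {:.4}, long = {:.4}", short, long);
}
```

Under this formula, raising k1 lets high term frequencies keep contributing for longer, while raising b favors shorter documents.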
pub fn with_tokenizer(self, tokenizer: Arc<dyn Tokenizer>) -> Self
HELIX-IDEA-005 Phase 4 (FALSIFY-HYBRID-003): plugs a custom tokenizer into the index so that BM25’s notion of “term” can be shared with other consumers (e.g., an inference path that uses the same lexicon).
When a custom tokenizer is set, the index’s internal tokenize()
delegates to it; the built-in lowercase / stopwords /
min-length rules are bypassed entirely. To revert to the
built-in path, construct a fresh index with BM25Index::new().
The tokenizer is taken as Arc<dyn Tokenizer> because the index is
Clone and may be shared across threads; Box<dyn Tokenizer> would
force each clone to deep-copy the tokenizer state.
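The delegation and sharing behavior described above can be sketched with a stand-in type. Everything in this block is hypothetical: the `Tokenizer` trait shown here, its `tokenize` method signature, `WhitespaceTokenizer`, and `MiniIndex` are illustrative assumptions, not the crate's actual definitions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `Tokenizer` trait; the real
/// trait's method names and signatures may differ.
trait Tokenizer: Send + Sync {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Toy tokenizer: splits on whitespace and preserves case.
struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_owned).collect()
    }
}

/// Hypothetical miniature of an index with an optional tokenizer override.
#[derive(Clone, Default)]
struct MiniIndex {
    tokenizer: Option<Arc<dyn Tokenizer>>,
}

impl MiniIndex {
    fn with_tokenizer(mut self, tokenizer: Arc<dyn Tokenizer>) -> Self {
        self.tokenizer = Some(tokenizer);
        self
    }

    fn has_custom_tokenizer(&self) -> bool {
        self.tokenizer.is_some()
    }

    /// Delegates to the override when present; otherwise falls back to a
    /// simplified built-in path (lowercasing only, for brevity).
    fn tokenize(&self, text: &str) -> Vec<String> {
        match &self.tokenizer {
            Some(t) => t.tokenize(text),
            None => text.split_whitespace().map(str::to_lowercase).collect(),
        }
    }
}

fn main() {
    let shared: Arc<dyn Tokenizer> = Arc::new(WhitespaceTokenizer);
    let index = MiniIndex::default().with_tokenizer(Arc::clone(&shared));
    let copy = index.clone();

    // Cloning the index bumps the reference count instead of copying
    // the tokenizer: `shared`, `index`, and `copy` share one instance.
    println!("custom tokenizer: {}", copy.has_custom_tokenizer());
    println!("strong count: {}", Arc::strong_count(&shared));
    println!("tokens: {:?}", copy.tokenize("Hello World"));
}
```

With a `Box<dyn Tokenizer>` field, deriving `Clone` would instead require the trait to expose some way to duplicate the boxed tokenizer, which is the deep copy the text refers to.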
pub fn has_custom_tokenizer(&self) -> bool
Returns true if and only if a custom tokenizer is plugged in (used by tests and FALSIFY-HYBRID-003 to confirm the override path is active).
pub fn indexed_terms(&self) -> Vec<&str>
All terms currently indexed (i.e., the keys of the
inverted index). Used by FALSIFY-HYBRID-003 to verify the
indexer consulted the injected tokenizer during add() —
the built-in tokenizer and an injected one produce
observably different key sets on the same content.
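To illustrate why the key sets differ, here is a small self-contained sketch. `TinyInvertedIndex`, `builtin_tokenize`, and `injected_tokenize` are hypothetical, and the specific built-in rules (the stopword list and a minimum length of 2) are assumptions chosen for the example rather than the crate's real defaults.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical miniature inverted index: term -> ids of documents containing it.
#[derive(Default)]
struct TinyInvertedIndex {
    postings: BTreeMap<String, Vec<usize>>,
}

impl TinyInvertedIndex {
    /// Indexes `text` using whichever tokenizer the caller supplies.
    fn add(&mut self, doc_id: usize, text: &str, tokenize: &dyn Fn(&str) -> Vec<String>) {
        for term in tokenize(text) {
            let ids = self.postings.entry(term).or_default();
            if !ids.contains(&doc_id) {
                ids.push(doc_id);
            }
        }
    }

    /// The keys of the inverted index (sorted here because of `BTreeMap`).
    fn indexed_terms(&self) -> Vec<&str> {
        self.postings.keys().map(String::as_str).collect()
    }
}

/// Assumed built-in rules: lowercase, drop stopwords, drop tokens shorter than 2.
fn builtin_tokenize(text: &str) -> Vec<String> {
    let stopwords: HashSet<&str> = HashSet::from(["the", "a"]);
    text.split_whitespace()
        .map(str::to_lowercase)
        .filter(|t| t.len() >= 2 && !stopwords.contains(t.as_str()))
        .collect()
}

/// Stand-in for an injected tokenizer: whitespace split, case preserved.
fn injected_tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(str::to_owned).collect()
}

fn main() {
    let text = "The Quick fox";

    let mut builtin = TinyInvertedIndex::default();
    builtin.add(0, text, &builtin_tokenize);

    let mut injected = TinyInvertedIndex::default();
    injected.add(0, text, &injected_tokenize);

    // Same content, different key sets.
    println!("built-in: {:?}", builtin.indexed_terms());
    println!("injected: {:?}", injected.indexed_terms());
}
```

Comparing the two key sets is enough to tell which tokenizer ran during indexing, which is the check the description attributes to FALSIFY-HYBRID-003.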
pub fn with_stopwords(self, stopwords: HashSet<String>) -> Self
Sets the stopwords for the index.
Trait Implementations
impl<'de> Deserialize<'de> for BM25Index
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl SparseIndex for BM25Index
Auto Trait Implementations
impl Freeze for BM25Index
impl !RefUnwindSafe for BM25Index
impl Send for BM25Index
impl Sync for BM25Index
impl Unpin for BM25Index
impl UnsafeUnpin for BM25Index
impl !UnwindSafe for BM25Index
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<F, T> IntoSample<T> for F
where
    T: FromSample<F>,
fn into_sample(self) -> T
impl<T> Pointable for T
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<R, P> ReadPrimitive<R> for P
fn read_from_little_endian(read: &mut R) -> Result<Self, Error>
Read this value from the supplied reader. Same as ReadEndian::read_from_little_endian().