pub struct Bm25Index {
    pub doc_count: usize,
    /* private fields */
}
BM25 index for scoring query-document relevance.
Maintains document frequency statistics incrementally. Documents are identified by string IDs and can be added/removed dynamically.
Supports serialization via serialize()/deserialize() for persistence
across restarts, avoiding the need to rebuild from all stored memories.
Fields
doc_count: usize
Total number of documents.
Implementations
impl Bm25Index
pub fn add_document(&mut self, id: &str, content: &str)
Add a document to the index, updating term frequencies and statistics.
If a document with the same ID already exists, it is replaced.
When the index exceeds max_documents, the oldest document is evicted.
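The replace-and-evict behaviour can be pictured with a minimal sketch. Everything here is hypothetical: `MiniIndex`, its fields, the lowercase/whitespace tokenizer, and the choice that a replaced document counts as the newest are assumptions for illustration, not the crate's private implementation. Only the idea of a `max_documents` cap and oldest-first eviction comes from the description above.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the private state behind add_document.
/// Field names other than max_documents are illustrative.
pub struct MiniIndex {
    pub max_documents: usize,
    doc_terms: HashMap<String, HashMap<String, usize>>, // id -> (term -> count)
    doc_freq: HashMap<String, usize>,                    // term -> docs containing it
    insertion_order: VecDeque<String>,                   // oldest id at the front
}

impl MiniIndex {
    pub fn new(max_documents: usize) -> Self {
        Self {
            max_documents,
            doc_terms: HashMap::new(),
            doc_freq: HashMap::new(),
            insertion_order: VecDeque::new(),
        }
    }

    pub fn doc_count(&self) -> usize {
        self.doc_terms.len()
    }

    pub fn contains(&self, id: &str) -> bool {
        self.doc_terms.contains_key(id)
    }

    pub fn df(&self, term: &str) -> usize {
        self.doc_freq.get(term).copied().unwrap_or(0)
    }

    pub fn add_document(&mut self, id: &str, content: &str) {
        // Same ID: drop the old version first so its terms are not counted twice.
        self.remove_document(id);

        // Assumed tokenizer: lowercase + whitespace split.
        let mut tf: HashMap<String, usize> = HashMap::new();
        for tok in content.split_whitespace() {
            *tf.entry(tok.to_lowercase()).or_insert(0) += 1;
        }
        // Document frequency counts each distinct term once per document.
        for term in tf.keys() {
            *self.doc_freq.entry(term.clone()).or_insert(0) += 1;
        }
        self.doc_terms.insert(id.to_string(), tf);
        self.insertion_order.push_back(id.to_string());

        // Over capacity: evict from the front of the queue (oldest first).
        while self.doc_terms.len() > self.max_documents {
            match self.insertion_order.front().cloned() {
                Some(oldest) => self.remove_document(&oldest),
                None => break,
            }
        }
    }

    pub fn remove_document(&mut self, id: &str) {
        if let Some(tf) = self.doc_terms.remove(id) {
            for term in tf.keys() {
                // Decrement df; drop the entry once no document contains the term.
                let now_zero = match self.doc_freq.get_mut(term) {
                    Some(n) => {
                        *n -= 1;
                        *n == 0
                    }
                    None => false,
                };
                if now_zero {
                    self.doc_freq.remove(term);
                }
            }
        }
        self.insertion_order.retain(|x| x.as_str() != id);
    }
}
```

A FIFO queue keeps eviction O(1) to locate; the `retain` on removal is linear, which a real index might avoid with a different structure.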
pub fn remove_document(&mut self, id: &str)
Remove a document from the index, updating all statistics.
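Besides document frequencies, removal has to keep the length statistics that feed `avgdl` consistent. A minimal sketch, assuming per-document lengths are stored so that removal can subtract exactly what was added; `LenStats` and its methods are hypothetical, and only the `total_doc_len` name comes from this page.

```rust
use std::collections::HashMap;

/// Hypothetical length bookkeeping. `total_doc_len` mirrors the field named
/// in the deserialize() docs; everything else is illustrative.
pub struct LenStats {
    doc_len: HashMap<String, usize>,
    pub total_doc_len: usize,
}

impl LenStats {
    pub fn new() -> Self {
        Self { doc_len: HashMap::new(), total_doc_len: 0 }
    }

    pub fn add(&mut self, id: &str, content: &str) {
        self.remove(id); // replacing an ID must not double-count its length
        let len = content.split_whitespace().count();
        self.doc_len.insert(id.to_string(), len);
        self.total_doc_len += len;
    }

    pub fn remove(&mut self, id: &str) {
        // Subtract exactly what add() contributed; unknown IDs are a no-op.
        if let Some(len) = self.doc_len.remove(id) {
            self.total_doc_len -= len;
        }
    }

    /// Average document length (avgdl in the BM25 formula); 0.0 when empty.
    pub fn avgdl(&self) -> f64 {
        if self.doc_len.is_empty() {
            0.0
        } else {
            self.total_doc_len as f64 / self.doc_len.len() as f64
        }
    }
}
```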
pub fn score(&self, query: &str, doc_id: &str) -> f64
Score a query against a specific document using BM25.
The score is computed as:
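$$\mathrm{score}(q, d) = \sum_{i} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, d)\,(k_1 + 1)}{f(q_i, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$

$$\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)$$

where f(q_i, d) is the frequency of query term q_i in d, |d| is the length of d, avgdl is the average document length, N is the number of indexed documents, n(q_i) is the number of documents containing q_i, and k_1 and b are the BM25 tuning parameters.
The returned score is normalized to [0, 1] by dividing by the maximum possible score (a perfect self-match with all query terms).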
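To make the formula concrete, here is a minimal Rust sketch of its two pieces. The function names are hypothetical, and the k1 = 1.2, b = 0.75 values used in the checks are assumed common defaults; this page does not state the index's actual parameters. The [0, 1] normalization is deliberately left out because how the maximum is computed is not specified beyond the sentence above.

```rust
/// IDF as written above; the "+ 1" inside ln keeps it positive
/// even for terms that appear in every document.
pub fn idf(n_docs: usize, docs_with_term: usize) -> f64 {
    let n = n_docs as f64;
    let nq = docs_with_term as f64;
    ((n - nq + 0.5) / (nq + 0.5) + 1.0).ln()
}

/// One query term's contribution:
/// IDF(q_i) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * |d| / avgdl)).
pub fn term_score(idf: f64, tf: f64, doc_len: f64, avgdl: f64, k1: f64, b: f64) -> f64 {
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avgdl))
}
```

When |d| equals avgdl the length factor is 1, so a term with tf = 1 contributes exactly its IDF; shorter documents score higher for the same term frequency.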
pub fn score_with_tokens_str(&self, query_tokens: &[&str], doc_id: &str) -> f64
Score pre-tokenized query tokens (as &str slices) against a specific indexed document.
Use this when scoring multiple documents against the same query to avoid
re-tokenizing the query each time. Zero-allocation: passes slices directly
to the generic internal helpers.
pub fn score_text_with_tokens(&self, query_tokens: &[String], document: &str) -> f64
Score pre-tokenized query tokens against arbitrary text (not necessarily in the index). Use this when scoring multiple documents against the same query to avoid re-tokenizing the query each time.
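Scoring text that is not in the index means mixing two sources: term frequency and |d| come from the text itself, while N, n(q_i) and avgdl come from the corpus. A minimal sketch under stated assumptions: `CorpusStats` and `raw_score_text` are hypothetical, the tokenizer is lowercase + whitespace, k1 and b are assumed defaults, and the result is the raw (unnormalized) BM25 sum.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot of statistics an index already holds.
pub struct CorpusStats {
    pub n_docs: usize,
    pub avgdl: f64,
    pub doc_freq: HashMap<String, usize>,
}

const K1: f64 = 1.2; // assumed; actual parameters are not documented here
const B: f64 = 0.75; // assumed

impl CorpusStats {
    /// Raw BM25 of `document` against query terms tokenized once by the caller.
    pub fn raw_score_text(&self, query_tokens: &[String], document: &str) -> f64 {
        // Only the document is tokenized here.
        let doc: Vec<String> = document.split_whitespace().map(str::to_lowercase).collect();
        let doc_len = doc.len() as f64;
        if doc_len == 0.0 || self.avgdl == 0.0 {
            return 0.0;
        }
        let mut tf: HashMap<&str, f64> = HashMap::new();
        for t in &doc {
            *tf.entry(t.as_str()).or_insert(0.0) += 1.0;
        }

        let n = self.n_docs as f64;
        query_tokens
            .iter()
            .map(|q| {
                let f = tf.get(q.as_str()).copied().unwrap_or(0.0);
                if f == 0.0 {
                    return 0.0; // term absent from this text
                }
                let nq = self.doc_freq.get(q).copied().unwrap_or(0) as f64;
                let idf = ((n - nq + 0.5) / (nq + 0.5) + 1.0).ln();
                idf * f * (K1 + 1.0) / (f + K1 * (1.0 - B + B * doc_len / self.avgdl))
            })
            .sum()
    }
}
```

Building the query's `Vec<String>` once and calling `raw_score_text` for each candidate is the reuse pattern the description recommends: the per-call cost is then only the candidate text's own tokenization.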
pub fn score_text_with_tokens_str(&self, query_tokens: &[&str], document: &str) -> f64
Score pre-tokenized query tokens (as &str slices) against arbitrary text.
Use this when scoring multiple documents against the same query to avoid
re-tokenizing the query each time. Zero-allocation: generic helpers accept &[&str].
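One way a single helper can serve both the `&[String]` and `&[&str]` entry points without copying strings is a generic bound on `AsRef<str>`. This is a sketch of that general Rust pattern, not the crate's internals: `score_tokens`, `score_owned`, `score_borrowed` and the `term_score` closure are hypothetical.

```rust
/// Hypothetical generic helper: works for any token type that can be
/// viewed as &str, so neither caller has to allocate or convert.
fn score_tokens<S: AsRef<str>>(query_tokens: &[S], term_score: impl Fn(&str) -> f64) -> f64 {
    query_tokens.iter().map(|q| term_score(q.as_ref())).sum()
}

/// Owned-token entry point (like the &[String] variant).
pub fn score_owned(query_tokens: &[String], term_score: impl Fn(&str) -> f64) -> f64 {
    score_tokens(query_tokens, term_score)
}

/// Borrowed-token entry point (like the &[&str] variant).
pub fn score_borrowed(query_tokens: &[&str], term_score: impl Fn(&str) -> f64) -> f64 {
    score_tokens(query_tokens, term_score)
}
```

Monomorphization gives each entry point its own compiled copy of the helper, so the `&str` path pays no conversion cost at runtime.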
pub fn score_text(&self, query: &str, document: &str) -> f64
Score a query against arbitrary text (not necessarily in the index). Useful for scoring documents that haven’t been indexed yet.
impl Bm25Index
pub fn serialize(&self) -> Vec<u8>
Serialize the BM25 index to a byte vector for persistence.
Uses bincode for compact binary representation. The serialized format includes all document frequency statistics, term frequencies, and parameters, enabling fast startup without re-indexing all memories.
pub fn deserialize(data: &[u8]) -> Result<Self, String>
Deserialize a BM25 index from bytes previously produced by serialize().
Returns Err if the data is corrupt or from an incompatible version.
Reconstructs total_doc_len and insertion_order if they were missing
from older serialized data (via #[serde(default)]).
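With `#[serde(default)]`, a field absent from older data decodes to its `Default` value (0 for an integer, an empty `Vec` for a vector), so the loader can detect that and rebuild it. A std-only sketch of that repair step; `Decoded` and `repair` are hypothetical, and sorting IDs to rebuild `insertion_order` is an assumption, since the original order cannot be recovered from older data.

```rust
use std::collections::HashMap;

/// Hypothetical decoded state. Fields missing from older data arrive as
/// their Default values: 0 and an empty Vec.
pub struct Decoded {
    pub doc_len: HashMap<String, usize>,
    pub total_doc_len: usize,
    pub insertion_order: Vec<String>,
}

/// Rebuild derived fields that an older format may not have stored.
pub fn repair(d: &mut Decoded) {
    if d.total_doc_len == 0 {
        // Recomputing is harmless even if the true total really is 0.
        d.total_doc_len = d.doc_len.values().sum();
    }
    if d.insertion_order.is_empty() && !d.doc_len.is_empty() {
        // Original order is unrecoverable; sort IDs so the result is deterministic.
        let mut ids: Vec<String> = d.doc_len.keys().cloned().collect();
        ids.sort();
        d.insertion_order = ids;
    }
}
```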
pub fn needs_save(&self) -> bool
Returns true if the index contains any documents and may therefore need saving.
Useful for batch operations: call persist_memory_no_save() in a loop,
then check needs_save() before writing the vector index to disk once
at the end. This avoids O(N) disk writes during bulk inserts.
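The batching pattern reduces N disk writes to one. A sketch with a hypothetical `Store` whose `save` only counts writes; `persist_no_save` stands in for `persist_memory_no_save()`, whose real signature is not shown on this page, and `needs_save` follows the documented rule (non-empty means "may need saving").

```rust
/// Hypothetical store: persist_no_save updates memory only, save writes.
pub struct Store {
    pub docs: Vec<String>,
    pub disk_writes: usize,
}

impl Store {
    pub fn persist_no_save(&mut self, content: &str) {
        self.docs.push(content.to_string());
    }

    /// Mirrors the documented rule: may need saving iff non-empty.
    pub fn needs_save(&self) -> bool {
        !self.docs.is_empty()
    }

    pub fn save(&mut self) {
        self.disk_writes += 1; // stand-in for serialize() plus one file write
    }
}

pub fn bulk_insert(store: &mut Store, items: &[&str]) {
    for item in items {
        store.persist_no_save(item);
    }
    if store.needs_save() {
        store.save(); // one write for the whole batch instead of one per item
    }
}
```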
Trait Implementations
impl<'de> Deserialize<'de> for Bm25Index
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Auto Trait Implementations
impl Freeze for Bm25Index
impl RefUnwindSafe for Bm25Index
impl Send for Bm25Index
impl Sync for Bm25Index
impl Unpin for Bm25Index
impl UnsafeUnpin for Bm25Index
impl UnwindSafe for Bm25Index
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.