pub struct TextIndex { /* private fields */ }
Text index for keyword search with pluggable tokenization and phrase support
Implementations
impl TextIndex
pub fn with_tokenizer(tokenizer: Box<dyn Tokenizer>) -> Self
Create a new TextIndex with a custom tokenizer
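As a rough illustration of what a pluggable tokenizer might look like, the sketch below defines a stand-in trait and one implementation. The trait shape here (`name` and `tokenize`), the `LowercaseAlnum` type, and the `boxed_tokenizer` helper are all assumptions made for this example; the crate's actual `Tokenizer` trait is not shown on this page and may differ.

```rust
// Hypothetical stand-in for the crate's `Tokenizer` trait. The real trait's
// methods are not documented on this page, so this shape is an assumption.
trait Tokenizer {
    /// Short static identifier, in the spirit of `tokenizer_name`.
    fn name(&self) -> &'static str;
    /// Splits text into normalized terms.
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Illustrative tokenizer: splits on non-alphanumeric characters and lowercases.
struct LowercaseAlnum;

impl Tokenizer for LowercaseAlnum {
    fn name(&self) -> &'static str {
        "lowercase_alnum"
    }

    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|s| !s.is_empty())
            .map(|s| s.to_lowercase())
            .collect()
    }
}

/// Boxes the tokenizer as a trait object, the form `with_tokenizer` accepts.
fn boxed_tokenizer() -> Box<dyn Tokenizer> {
    Box::new(LowercaseAlnum)
}
```

With the real trait in scope, a value like the one returned by `boxed_tokenizer()` would presumably be passed straight to `TextIndex::with_tokenizer`.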
pub fn tokenizer_name(&self) -> &'static str
Get the name of the current tokenizer
pub fn index_document(&mut self, id: Id, text: String)
Index a document’s text
pub fn remove_document(&mut self, id: &str)
Remove a document from the index
pub fn bm25_scores(&self, query: &str) -> HashMap<Id, f32>
Compute BM25 scores for query terms
BM25 formula: score(D, Q) = Σ IDF(qi) * (f(qi, D) * (k1 + 1)) / (f(qi, D) + k1 * (1 - b + b * |D| / avgdl))
where:
- IDF(qi) = log((N - df(qi) + 0.5) / (df(qi) + 0.5))
- f(qi, D) = frequency of qi in document D
- |D| = length of document D
- avgdl = average document length
- k1 = 1.2 (term frequency saturation)
- b = 0.75 (length normalization)
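The formula above can be sketched as a small standalone function. This is not the crate's implementation: `bm25_scores_sketch` is a hypothetical helper, it keys results by `String` rather than `Id`, and it assumes a lowercase whitespace tokenizer instead of the index's configured one.

```rust
use std::collections::HashMap;

const K1: f32 = 1.2; // term frequency saturation
const B: f32 = 0.75; // length normalization

/// Hypothetical helper: scores `(id, text)` pairs against a query using the
/// BM25 formula, with a lowercase whitespace tokenizer (an assumption).
fn bm25_scores_sketch(docs: &[(&str, &str)], query: &str) -> HashMap<String, f32> {
    let tokenized: Vec<(&str, Vec<String>)> = docs
        .iter()
        .map(|(id, text)| {
            let tokens = text.to_lowercase().split_whitespace().map(String::from).collect();
            (*id, tokens)
        })
        .collect();

    let mut scores = HashMap::new();
    if tokenized.is_empty() {
        return scores;
    }
    let n = tokenized.len() as f32;
    let avgdl = tokenized.iter().map(|(_, t)| t.len() as f32).sum::<f32>() / n;

    for term in query.to_lowercase().split_whitespace() {
        // df(qi): number of documents containing the term.
        let df = tokenized
            .iter()
            .filter(|(_, t)| t.iter().any(|w| w.as_str() == term))
            .count() as f32;
        if df == 0.0 {
            continue;
        }
        let idf = ((n - df + 0.5) / (df + 0.5)).ln();

        for (id, tokens) in &tokenized {
            // f(qi, D): occurrences of the term in this document.
            let f = tokens.iter().filter(|w| w.as_str() == term).count() as f32;
            if f == 0.0 {
                continue;
            }
            let norm = 1.0 - B + B * tokens.len() as f32 / avgdl;
            *scores.entry(id.to_string()).or_insert(0.0) += idf * (f * (K1 + 1.0)) / (f + K1 * norm);
        }
    }
    scores
}
```

Note that with this IDF definition a term that appears in more than half of the documents gets a negative IDF, so such terms lower a document's score rather than raise it.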
pub fn phrase_search(&self, phrase: &str) -> HashMap<Id, f32>
Search for an exact phrase and return matching document IDs with scores
Uses position information to verify that terms appear consecutively. Returns BM25 scores with a phrase boost for exact matches.
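The consecutive-position check can be sketched for a single document as follows. The helpers `positions` and `contains_phrase` are hypothetical names, the tokenizer is assumed to be a lowercase whitespace split, and scoring and the phrase boost are left out.

```rust
use std::collections::HashMap;

/// Records, for each term, the token positions at which it occurs.
fn positions(text: &str) -> HashMap<String, Vec<usize>> {
    let mut map: HashMap<String, Vec<usize>> = HashMap::new();
    for (pos, tok) in text.to_lowercase().split_whitespace().enumerate() {
        map.entry(tok.to_string()).or_default().push(pos);
    }
    map
}

/// Returns true when the phrase terms occur at consecutive positions.
fn contains_phrase(text: &str, phrase: &str) -> bool {
    let index = positions(text);
    let terms: Vec<String> = phrase
        .to_lowercase()
        .split_whitespace()
        .map(String::from)
        .collect();

    // An empty phrase, or a first term absent from the document, cannot match.
    let Some(first) = terms.first() else { return false };
    let Some(starts) = index.get(first) else { return false };

    // For each occurrence of the first term, require term k at position start + k.
    starts.iter().any(|&start| {
        terms.iter().enumerate().skip(1).all(|(offset, term)| {
            index
                .get(term)
                .map_or(false, |ps| ps.contains(&(start + offset)))
        })
    })
}
```

Keeping positions per term means the check only inspects the postings of the phrase terms and does not rescan the document text.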
pub fn has_text(&self, id: &str) -> bool
pub fn get_text(&self, id: &str) -> Option<&str>
pub fn export_texts(&self) -> &HashMap<Id, String>
Export text data for persistence (Major Issue #6 fix)
Returns the texts HashMap which can be serialized and saved to disk. The inverted index, doc_lengths, and statistics are not exported since they can be rebuilt by re-tokenizing the texts on load.
pub fn import_texts(&mut self, texts: HashMap<Id, String>)
Import text data from disk and rebuild the index (Major Issue #6 fix)
Takes a HashMap of texts loaded from disk and rebuilds the inverted index, doc_lengths, and statistics by re-tokenizing all texts. This allows the text index to survive store reopens and snapshots.
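The export and rebuild round trip can be sketched with a toy index. `MiniIndex` and its fields are hypothetical and only mirror the structures named above; it uses `String` ids and a lowercase whitespace tokenizer, and it does not handle re-indexing an id that already exists.

```rust
use std::collections::HashMap;

/// Toy index (hypothetical): only `texts` is persisted; the rest is derived.
#[derive(Default)]
struct MiniIndex {
    texts: HashMap<String, String>,
    postings: HashMap<String, Vec<String>>, // term -> ids of documents containing it
    doc_lengths: HashMap<String, usize>,    // id -> token count
}

impl MiniIndex {
    fn index_document(&mut self, id: String, text: String) {
        let tokens: Vec<String> = text
            .to_lowercase()
            .split_whitespace()
            .map(String::from)
            .collect();
        self.doc_lengths.insert(id.clone(), tokens.len());
        for tok in tokens {
            let ids = self.postings.entry(tok).or_default();
            if !ids.contains(&id) {
                ids.push(id.clone());
            }
        }
        self.texts.insert(id, text);
    }

    /// Only the raw texts need to be serialized.
    fn export_texts(&self) -> &HashMap<String, String> {
        &self.texts
    }

    /// Discards derived state and rebuilds it by re-tokenizing every text.
    fn import_texts(&mut self, texts: HashMap<String, String>) {
        *self = MiniIndex::default();
        for (id, text) in texts {
            self.index_document(id, text);
        }
    }
}
```

Persisting only the texts keeps the on-disk format independent of the tokenizer, at the cost of re-tokenizing everything on load.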
Trait Implementations
Auto Trait Implementations
impl Freeze for TextIndex
impl !RefUnwindSafe for TextIndex
impl Send for TextIndex
impl Sync for TextIndex
impl Unpin for TextIndex
impl !UnwindSafe for TextIndex
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.