pub struct PostingList { /* private fields */ }
In-memory inverted index. See module-level doc.
Implementations
impl PostingList
pub fn is_empty(&self) -> bool
True iff no document has been inserted (or all have been removed).
pub fn avg_doc_len(&self) -> f64
Average document length in tokens. Returns 0.0 when the index
is empty instead of dividing by zero, so BM25 can detect that case and guard against it.
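A minimal sketch of the computation, assuming document lengths live in a rowid → doc_len BTreeMap (a hypothetical layout, since PostingList's fields are private). Rows recorded with doc_len = 0 still count toward the denominator, and the empty case short-circuits to 0.0 rather than producing 0/0 = NaN.

```rust
use std::collections::BTreeMap;

/// Sketch: mean token count over a hypothetical rowid -> doc_len map.
/// Returns 0.0 for an empty map instead of computing 0.0 / 0.0 (NaN).
fn avg_doc_len(doc_lengths: &BTreeMap<i64, u32>) -> f64 {
    if doc_lengths.is_empty() {
        return 0.0;
    }
    // Accumulate in u64 so many long documents cannot overflow a u32 sum.
    let total: u64 = doc_lengths.values().map(|&len| u64::from(len)).sum();
    total as f64 / doc_lengths.len() as f64
}
```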
pub fn serialize_doc_lengths(&self) -> Vec<(i64, u32)>
Phase 8c: emit (rowid, doc_len) pairs for every indexed doc,
in ascending rowid order. The pager writes these into the FTS
index’s doc-lengths sidecar cell; on reload they are fed back to
Self::from_persisted_postings.
pub fn serialize_postings(&self) -> Vec<(String, Vec<(i64, u32)>)>
Phase 8c: emit (term, [(rowid, term_freq)]) pairs in
lexicographic term order; per-term entries are in ascending
rowid order (the underlying BTreeMap already guarantees this).
The result has one element per unique indexed term, and the pager
writes one cell per element.
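A sketch of both serializers, assuming (hypothetically) that doc lengths are stored as BTreeMap<i64, u32> and postings as BTreeMap<String, BTreeMap<i64, u32>>. Because BTreeMap iterates in ascending key order, both outputs come out sorted with no explicit sort step.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the private term -> (rowid -> tf) map.
type Postings = BTreeMap<String, BTreeMap<i64, u32>>;

/// (rowid, doc_len) pairs; BTreeMap iteration yields ascending rowids.
fn serialize_doc_lengths(doc_lengths: &BTreeMap<i64, u32>) -> Vec<(i64, u32)> {
    doc_lengths.iter().map(|(&rowid, &len)| (rowid, len)).collect()
}

/// One (term, [(rowid, tf)]) element per term. Terms come out in String's
/// Ord order (byte-wise lexicographic); each inner list is in ascending rowid order.
fn serialize_postings(postings: &Postings) -> Vec<(String, Vec<(i64, u32)>)> {
    postings
        .iter()
        .map(|(term, docs)| {
            let entries: Vec<(i64, u32)> =
                docs.iter().map(|(&rowid, &tf)| (rowid, tf)).collect();
            (term.clone(), entries)
        })
        .collect()
}
```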
pub fn from_persisted_postings<I, J>(doc_lengths: I, postings: J) -> Self
Phase 8c: rebuild a PostingList directly from the persisted
doc-lengths sidecar + per-term postings. No tokenization runs;
the resulting index is byte-equivalent to what was saved
(assuming the input came from serialize_*).
doc_lengths is the full (rowid, doc_len) map written into
the sidecar cell. postings is one (term, [(rowid, tf)])
element per term cell.
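A sketch of the rebuild path on a hypothetical MiniIndex stand-in. The generic bounds on I and J are assumptions (this page omits the real where-clause); the point is that both maps are filled straight from the persisted cells without any tokenizer call.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for PostingList's private state.
#[derive(Debug)]
struct MiniIndex {
    /// rowid -> doc_len
    doc_lengths: BTreeMap<i64, u32>,
    /// term -> (rowid -> term frequency)
    postings: BTreeMap<String, BTreeMap<i64, u32>>,
}

impl MiniIndex {
    /// Rebuild from persisted cells. The bounds below are assumed.
    fn from_persisted_postings<I, J>(doc_lengths: I, postings: J) -> Self
    where
        I: IntoIterator<Item = (i64, u32)>,
        J: IntoIterator<Item = (String, Vec<(i64, u32)>)>,
    {
        MiniIndex {
            // Sidecar cell: the full rowid -> doc_len map.
            doc_lengths: doc_lengths.into_iter().collect(),
            // One term cell per element, each holding its (rowid, tf) entries.
            postings: postings
                .into_iter()
                .map(|(term, entries)| {
                    (term, entries.into_iter().collect::<BTreeMap<i64, u32>>())
                })
                .collect(),
        }
    }
}
```

Row 2 in the test below illustrates an "indexed but empty" row: it appears in the sidecar with doc_len = 0 but in no term cell.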
pub fn insert(&mut self, rowid: i64, text: &str)
Tokenize text and add its postings under rowid. If rowid is
already indexed, its previous postings are removed first — i.e.
insert is idempotent for re-indexing the same row.
A row whose tokenization yields zero tokens is still recorded
(with doc_len = 0 and no posting entries). This keeps len()
honest for “indexed but empty” rows; BM25 returns 0.0 for them.
pub fn remove(&mut self, rowid: i64)
Remove all postings for rowid. No-op if rowid was never
inserted. Empty per-term posting lists left behind by the last
referencing row are pruned to keep the BTreeMap tight.
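A minimal sketch of the insert/remove contract described above, using a hypothetical MiniIndex with the nested-BTreeMap layout and a stand-in tokenizer (lowercase, split on non-alphanumeric characters); the real tokenizer is not documented on this page. For simplicity, remove scans every term with BTreeMap::retain; an index that tracks each row's terms could avoid that scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for PostingList's private state.
#[derive(Default)]
struct MiniIndex {
    /// rowid -> doc_len (token count)
    doc_lengths: BTreeMap<i64, u32>,
    /// term -> (rowid -> term frequency)
    postings: BTreeMap<String, BTreeMap<i64, u32>>,
}

/// Stand-in tokenizer (assumed): lowercase, split on non-alphanumeric chars.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect()
}

impl MiniIndex {
    /// Drop every posting for `rowid`; no-op if it was never inserted.
    fn remove(&mut self, rowid: i64) {
        if self.doc_lengths.remove(&rowid).is_none() {
            return;
        }
        // Remove the row from each term and prune lists it leaves empty.
        self.postings.retain(|_, docs| {
            docs.remove(&rowid);
            !docs.is_empty()
        });
    }

    /// Re-indexing a row first removes its old postings.
    fn insert(&mut self, rowid: i64, text: &str) {
        self.remove(rowid);
        let tokens = tokenize(text);
        // The row is recorded even when it yields zero tokens (doc_len = 0).
        self.doc_lengths.insert(rowid, tokens.len() as u32);
        for tok in tokens {
            *self.postings.entry(tok).or_default().entry(rowid).or_insert(0) += 1;
        }
    }
}
```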
pub fn matches(&self, rowid: i64, query: &str) -> bool
True iff rowid is indexed and at least one of its terms is in
the (tokenized) query. Powers fts_match(col, 'q') in 8b
without going through scoring.
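A sketch of the boolean match check over the same hypothetical maps. It stops at the first query term whose posting list contains the row and never computes a score. The tokenizer is again an assumed stand-in.

```rust
use std::collections::BTreeMap;

/// Stand-in tokenizer (assumed): lowercase, split on non-alphanumeric chars.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_lowercase)
        .collect()
}

/// True iff `rowid` is indexed and shares at least one term with `query`.
fn matches(
    doc_lengths: &BTreeMap<i64, u32>,
    postings: &BTreeMap<String, BTreeMap<i64, u32>>,
    rowid: i64,
    query: &str,
) -> bool {
    if !doc_lengths.contains_key(&rowid) {
        return false;
    }
    // `any` short-circuits on the first hit.
    tokenize(query).iter().any(|term| {
        postings
            .get(term.as_str())
            .map_or(false, |docs| docs.contains_key(&rowid))
    })
}
```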
pub fn score(&self, rowid: i64, query: &str, params: &Bm25Params) -> f64
BM25 score for a single (rowid, query) pair. Returns 0.0 if
rowid is unknown or no query terms hit.
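For reference, a sketch of the standard Okapi BM25 per-term contribution, which score presumably sums over the distinct query terms a row contains. The Bm25Params fields k1 and b, the helper name bm25_term, and the Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)) are all assumptions: this page shows neither Bm25Params's fields nor which IDF variant PostingList uses.

```rust
/// Hypothetical parameter struct; field names are assumed.
struct Bm25Params {
    k1: f64, // term-frequency saturation
    b: f64,  // length-normalization strength (0 = none, 1 = full)
}

/// One query term's BM25 contribution for one document.
/// tf: term frequency in the doc; doc_len: |D|; avg_len: average doc length;
/// n_docs: N indexed docs; doc_freq: docs containing the term.
fn bm25_term(tf: u32, doc_len: u32, avg_len: f64, n_docs: usize, doc_freq: usize, p: &Bm25Params) -> f64 {
    // A miss contributes nothing; avg_len == 0.0 means the index is empty.
    if tf == 0 || avg_len == 0.0 {
        return 0.0;
    }
    let n = n_docs as f64;
    let df = doc_freq as f64;
    // Lucene-style IDF (assumed variant); stays positive even for common terms.
    let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
    let tf = f64::from(tf);
    let norm = p.k1 * (1.0 - p.b + p.b * f64::from(doc_len) / avg_len);
    idf * tf * (p.k1 + 1.0) / (tf + norm)
}
```

With tf = 1 and doc_len equal to the average, the length factor cancels and the contribution reduces to the IDF alone; shorter-than-average documents score higher for the same tf.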
pub fn query(&self, query: &str, params: &Bm25Params) -> Vec<(i64, f64)>
Score every doc that contains at least one query term and return
(rowid, score) sorted by score descending, ties broken by
rowid ascending. Powers the bulk path used by 8b’s
try_fts_probe optimizer hook.
Empty query → empty result. Empty index → empty result. Rows that don’t match any query term are not scored at all (they would score 0.0 — including them just bloats the result).
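A sketch of the two steps that bound the bulk path's work, over the same hypothetical nested-BTreeMap layout: collect only rowids that appear under at least one query term, then order the scored results. The helper names candidates and sort_results are illustrative, not part of the documented API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the private term -> (rowid -> tf) map.
type Postings = BTreeMap<String, BTreeMap<i64, u32>>;

/// Union of rowids containing at least one query term; only these get scored.
fn candidates(postings: &Postings, terms: &[&str]) -> BTreeSet<i64> {
    terms
        .iter()
        .filter_map(|t| postings.get(*t))
        .flat_map(|docs| docs.keys().copied())
        .collect()
}

/// Score descending, ties broken by rowid ascending.
fn sort_results(results: &mut [(i64, f64)]) {
    results.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
}
```

Using f64::total_cmp instead of partial_cmp gives the comparator a total order even if a NaN score slipped through, so sort_by never sees inconsistent comparisons.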
Trait Implementations
impl Clone for PostingList
fn clone(&self) -> PostingList
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.