Skip to main content

Module posting_list

Module posting_list 

Source
Expand description

In-memory inverted index for FTS — term -> { rowid -> term_freq }, plus per-document length cache. Wraps the super::tokenizer + super::bm25 primitives into a usable index. Pure data structure; no SQL coupling.

Mirrors the role of crate::sql::hnsw::HnswIndex in 7d.1: this is the in-memory state that 8b will hang off Table (via a future fts_indexes: Vec<FtsIndex> field) and that 8c will serialize into KIND_FTS_POSTING cells.

§Identity choices

  • Rowids are i64 (matches HNSW’s node_id and SQLRite’s row-id convention; see crate::sql::hnsw::HnswIndex::insert).
  • The map structure is BTreeMap<String, BTreeMap<i64, u32>> rather than HashMap so that (1) persistence (8c) gets a deterministic on-disk byte order for free — postings are emitted in lexicographic term order, each posting list in ascending rowid order — and (2) tests get stable ordering without sorting. HashMap is faster on a per-op basis but the lookups in the FTS hot path are bounded by query-term count (single digits in practice), so the BTreeMap log-N factor is negligible.

§What it does NOT do (yet)

  • No persistence. State lives entirely in memory. 8c wires it into the page format under cell-kind 0x06.
  • No transaction integration. 8b is responsible for batching updates inside a BEGIN; ... COMMIT; block.
  • No phrase / boolean queries. Single-token any-term match only for the MVP per the plan’s “Out of scope” section. Multi-token queries OR the per-term hits — no AND, NOT, or positional info.

Structs§

PostingList
In-memory inverted index. See module-level doc.