Expand description
In-memory inverted index for FTS — term -> { rowid -> term_freq },
plus per-document length cache. Wraps the super::tokenizer +
super::bm25 primitives into a usable index. Pure data structure;
no SQL coupling.
Mirrors the role of crate::sql::hnsw::HnswIndex in 7d.1: this is
the in-memory state that 8b will hang off Table (via a future
fts_indexes: Vec<FtsIndex> field) and that 8c will serialize into
KIND_FTS_POSTING cells.
§Identity choices
- Rowids are
i64(matches HNSW’snode_idand SQLRite’s row-id convention; seecrate::sql::hnsw::HnswIndex::insert). - The map structure is
BTreeMap<String, BTreeMap<i64, u32>>rather thanHashMapso that (1) persistence (8c) gets a deterministic on-disk byte order for free — postings are emitted in lexicographic term order, each posting list in ascending rowid order — and (2) tests get stable ordering without sorting.HashMapis faster on a per-op basis but the lookups in the FTS hot path are bounded by query-term count (single digits in practice), so the BTreeMap log-N factor is negligible.
§What it does NOT do (yet)
- No persistence. State lives entirely in memory. 8c wires it
into the page format under cell-kind
0x06. - No transaction integration. 8b is responsible for batching
updates inside a
BEGIN; ... COMMIT;block. - No phrase / boolean queries. Single-token any-term match only for the MVP per the plan’s “Out of scope” section. Multi-token queries OR the per-term hits — no AND, NOT, or positional info.
Structs§
- Posting
List - In-memory inverted index. See module-level doc.