Module fts


Full-text search (FTS) — inverted-index keyword retrieval with BM25 ranking. Pure algorithms; no SQL integration in this module.

Phase 8 of the SQLRite roadmap; see docs/phase-8-plan.md.

  • tokenizer — split text into terms (ASCII MVP per Q3).
  • bm25 — BM25 relevance scoring (k1 = 1.5, b = 0.75, fixed per Q4 + Q5; no stemming, no stop list).
  • posting_list — in-memory inverted index keyed by term, holding per-document term frequencies + lengths. Insert / remove / query.
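
To make the scoring parameters above concrete, here is a minimal sketch of a per-term BM25 score with k1 = 1.5 and b = 0.75. It is not the module's actual code: `SketchParams`, `idf`, and `term_score` are hypothetical names, the argument list is invented for illustration, and the IDF variant (the non-negative "plus one" form) is an assumption, since this page does not say which one `bm25::score` uses.

```rust
/// Hypothetical parameter bundle for illustration; the real `Bm25Params` may differ.
#[derive(Clone, Copy, Debug)]
struct SketchParams {
    k1: f64, // term-frequency saturation
    b: f64,  // strength of document-length normalization
}

impl Default for SketchParams {
    fn default() -> Self {
        // The fixed values quoted in the module description.
        SketchParams { k1: 1.5, b: 0.75 }
    }
}

/// Inverse document frequency. Assumption: the variant ln(1 + (N - df + 0.5) / (df + 0.5)),
/// which stays positive even when a term appears in every document.
fn idf(n_docs: usize, doc_freq: usize) -> f64 {
    let n = n_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of a single query term to a single document.
/// A full query score would sum this over all query terms.
fn term_score(
    p: SketchParams,
    tf: u32,          // occurrences of the term in this document
    doc_len: u32,     // number of tokens in this document
    avg_doc_len: f64, // mean token count across the index
    n_docs: usize,    // total documents in the index
    doc_freq: usize,  // documents containing the term
) -> f64 {
    if tf == 0 || avg_doc_len <= 0.0 {
        return 0.0;
    }
    let tf = tf as f64;
    // Longer-than-average documents get a larger denominator, hence a lower score.
    let norm = 1.0 - p.b + p.b * (doc_len as f64 / avg_doc_len);
    idf(n_docs, doc_freq) * tf * (p.k1 + 1.0) / (tf + p.k1 * norm)
}
```

With these parameters, a term that occurs once in a document of exactly average length scores its IDF; a higher term frequency raises the score with diminishing returns, and a longer document lowers it.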

Phase 8a shipped these standalone algorithms; Phase 8b wires them into the SQL surface (CREATE INDEX … USING fts(<col>), fts_match, bm25_score, the try_fts_probe optimizer hook). Persistence of the posting lists themselves arrives with Phase 8c (KIND_FTS_POSTING cell encoding).

Re-exports

pub use bm25::Bm25Params;
pub use bm25::score as bm25_score;
pub use posting_list::PostingList;
pub use tokenizer::tokenize;

Modules

bm25
BM25 relevance scoring — the standard ranking function for keyword retrieval. Pure math; no SQL coupling.
posting_list
In-memory inverted index for FTS — term -> { rowid -> term_freq }, plus per-document length cache. Wraps the super::tokenizer + super::bm25 primitives into a usable index. Pure data structure; no SQL coupling.
tokenizer
ASCII tokenizer for FTS — splits on [^A-Za-z0-9]+ and lowercases.
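
The tokenizer and posting-list descriptions above can be sketched together as follows. This is an illustration under stated assumptions, not the crate's implementation: `tokenize_sketch` and `IndexSketch` are hypothetical names, the rowid type is assumed to be `i64`, and the method signatures are invented. The real `tokenize` and `PostingList` may differ.

```rust
use std::collections::HashMap;

/// Split on runs of characters outside [A-Za-z0-9] and lowercase each term.
/// Non-ASCII characters act as separators, matching the ASCII-only description.
fn tokenize_sketch(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_ascii_lowercase())
        .collect()
}

/// Hypothetical in-memory inverted index: term -> { rowid -> term_freq },
/// plus a per-document length cache.
#[derive(Default)]
struct IndexSketch {
    postings: HashMap<String, HashMap<i64, u32>>,
    doc_lens: HashMap<i64, u32>,
}

impl IndexSketch {
    /// Index a document; re-inserting an existing rowid replaces its old entry.
    fn insert(&mut self, rowid: i64, text: &str) {
        self.remove(rowid);
        let terms = tokenize_sketch(text);
        self.doc_lens.insert(rowid, terms.len() as u32);
        for term in terms {
            *self.postings.entry(term).or_default().entry(rowid).or_insert(0) += 1;
        }
    }

    /// Drop a document and prune any term whose posting map becomes empty.
    fn remove(&mut self, rowid: i64) {
        if self.doc_lens.remove(&rowid).is_none() {
            return; // unknown rowid: nothing to do
        }
        self.postings.retain(|_, docs| {
            docs.remove(&rowid);
            !docs.is_empty()
        });
    }

    /// Occurrences of `term` in document `rowid` (0 if absent).
    fn term_freq(&self, term: &str, rowid: i64) -> u32 {
        self.postings
            .get(term)
            .and_then(|docs| docs.get(&rowid))
            .copied()
            .unwrap_or(0)
    }

    /// Number of documents containing `term`.
    fn doc_freq(&self, term: &str) -> usize {
        self.postings.get(term).map_or(0, |docs| docs.len())
    }
}
```

The term frequencies, document frequencies, and cached lengths held here are exactly the inputs a BM25 scorer needs. Callers should pass query terms through the same tokenizer, because lookups are case-sensitive on the already-lowercased keys.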