Full-text search (FTS) — inverted-index keyword retrieval with BM25 ranking. Pure algorithms; no SQL integration in this module.
Phase 8 of the SQLRite roadmap; see docs/phase-8-plan.md.
- tokenizer: split text into terms (ASCII MVP, per Q3).
- bm25: BM25 relevance scoring (k1 = 1.5, b = 0.75, fixed per Q4 and Q5; no stemming, no stop list).
- posting_list: in-memory inverted index keyed by term, holding per-document term frequencies and lengths. Supports insert, remove, and query.
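The BM25 parameters above plug into the usual Okapi BM25 term score. Below is a minimal standalone sketch of that formula, assuming a Lucene-style smoothed IDF, ln(1 + (N - df + 0.5) / (df + 0.5)). The source fixes only k1 and b, so the IDF variant, the `Params` struct, and the `idf` / `term_score` helpers are hypothetical illustrations, not the signature of this crate's `bm25::score`.

```rust
/// Hypothetical parameter struct for this sketch (not the crate's `Bm25Params`).
#[derive(Debug, Clone, Copy)]
pub struct Params {
    pub k1: f64,
    pub b: f64,
}

impl Default for Params {
    fn default() -> Self {
        // The fixed values stated in the module docs.
        Params { k1: 1.5, b: 0.75 }
    }
}

/// Inverse document frequency. The smoothed `ln(1 + ...)` form is an
/// assumption; it stays positive even when every document has the term.
fn idf(num_docs: usize, doc_freq: usize) -> f64 {
    let n = num_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document.
/// `tf`: occurrences of the term in the document.
/// `doc_len`: document length in tokens; `avg_doc_len`: corpus mean length.
/// `doc_freq`: number of documents (out of `num_docs`) containing the term.
fn term_score(
    p: Params,
    tf: u32,
    doc_len: u32,
    avg_doc_len: f64,
    num_docs: usize,
    doc_freq: usize,
) -> f64 {
    if tf == 0 {
        return 0.0;
    }
    let tf = tf as f64;
    // Length normalization: b = 0 ignores length, b = 1 fully normalizes.
    let norm = 1.0 - p.b + p.b * (doc_len as f64 / avg_doc_len);
    // k1 caps how much repeated occurrences can raise the score.
    idf(num_docs, doc_freq) * (tf * (p.k1 + 1.0)) / (tf + p.k1 * norm)
}

fn main() {
    let p = Params::default();
    // Average-length document containing the term twice, in a 10-document
    // corpus where 2 documents contain the term.
    let s = term_score(p, 2, 100, 100.0, 10, 2);
    println!("{s:.4}");
}
```

Because the query score is a sum of per-term contributions, a multi-term query would add `term_score` over each query term the document contains.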
Phase 8a shipped these standalone algorithms; Phase 8b wires them
into the SQL surface (CREATE INDEX … USING fts(<col>),
fts_match, bm25_score, the try_fts_probe optimizer hook).
Persistence of the posting lists themselves arrives with Phase 8c
(KIND_FTS_POSTING cell encoding).
Re-exports
pub use bm25::Bm25Params;
pub use bm25::score as bm25_score;
pub use posting_list::PostingList;
pub use tokenizer::tokenize;
Modules
- bm25: BM25 relevance scoring, the standard ranking function for keyword retrieval. Pure math; no SQL coupling.
- posting_list: In-memory inverted index for FTS, mapping term -> { rowid -> term_freq }, plus a per-document length cache. Wraps the super::tokenizer and super::bm25 primitives into a usable index. Pure data structure; no SQL coupling.
- tokenizer: ASCII tokenizer for FTS. Splits on [^A-Za-z0-9]+ and lowercases.
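The tokenizer and posting-list descriptions above can be sketched together as follows. This is a standalone illustration under stated assumptions: `PostingIndex`, its method names, `i64` rowids, and the AND semantics of `query` are hypothetical choices, not the API of this crate's `PostingList` or `tokenize`. It keeps the documented shape: term -> { rowid -> term_freq } plus a per-document length cache.

```rust
use std::collections::HashMap;

/// ASCII tokenizer: split on runs of [^A-Za-z0-9] and lowercase.
/// Non-ASCII characters are not alphanumeric here, so they act as separators.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_ascii_lowercase())
        .collect()
}

/// Hypothetical in-memory inverted index (not the crate's `PostingList`).
#[derive(Default)]
struct PostingIndex {
    // term -> { rowid -> term frequency }
    postings: HashMap<String, HashMap<i64, u32>>,
    // rowid -> document length in tokens (needed for BM25 normalization)
    doc_lens: HashMap<i64, u32>,
}

impl PostingIndex {
    fn insert(&mut self, rowid: i64, text: &str) {
        // Re-inserting a rowid replaces its previous postings.
        self.remove(rowid);
        let tokens = tokenize(text);
        self.doc_lens.insert(rowid, tokens.len() as u32);
        for t in tokens {
            *self.postings.entry(t).or_default().entry(rowid).or_insert(0) += 1;
        }
    }

    fn remove(&mut self, rowid: i64) {
        if self.doc_lens.remove(&rowid).is_none() {
            return;
        }
        // Simple but linear in vocabulary size; drops terms left with no docs.
        self.postings.retain(|_, docs| {
            docs.remove(&rowid);
            !docs.is_empty()
        });
    }

    /// Sorted rowids containing every query term (AND semantics).
    fn query(&self, q: &str) -> Vec<i64> {
        let terms = tokenize(q);
        if terms.is_empty() {
            return Vec::new();
        }
        let mut hits: Vec<i64> = match self.postings.get(&terms[0]) {
            Some(docs) => docs.keys().copied().collect(),
            None => return Vec::new(),
        };
        hits.retain(|id| {
            terms[1..]
                .iter()
                .all(|t| self.postings.get(t).map_or(false, |d| d.contains_key(id)))
        });
        hits.sort_unstable();
        hits
    }
}

fn main() {
    let mut idx = PostingIndex::default();
    idx.insert(1, "The quick brown fox");
    idx.insert(2, "A quick-witted FOX, jumping");
    idx.insert(3, "Lazy dogs sleep");
    println!("{:?}", idx.query("quick fox")); // [1, 2]
    idx.remove(1);
    println!("{:?}", idx.query("quick fox")); // [2]
}
```

Ranking is deliberately left out: the per-rowid term frequencies and `doc_lens` are exactly the inputs a BM25 scorer needs, which keeps the index a pure data structure.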