lexir
Lexical IR (BM25, TF‑IDF, Query Likelihood) on top of postings lists.
Status: experimental. This repository is public as a reference implementation; it is not
currently packaged for crates.io.
Feature Selection
- default: Includes persistence.
- In-memory only: disable default features.
What it is
lexir is the scoring/ranking layer. Candidate generation and storage live in postings.
Building
lexir is not on crates.io yet; use a git dependency:
[dependencies]
lexir = { git = "https://github.com/arclabs561/lexir" }
Notes:
- postings, rankfns, and durability are pulled as git dependencies.
- gramdex and textprep are pulled from crates.io (and are only used when their features are enabled).
Usage (library)
BM25 (default):
use ;
let mut idx = new;
idx.add_document;
let hits = idx.retrieve.unwrap;
assert_eq!;
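To make the scoring step concrete, here is a minimal, self-contained sketch of BM25 over a toy postings map. It does not use the lexir or postings APIs: `ToyIndex`, `Bm25`, and the method names are hypothetical, and the parameters (k1 = 1.2, b = 0.75) and the Lucene-style IDF are common textbook choices assumed here, not confirmed lexir defaults.

```rust
use std::collections::HashMap;

/// Hypothetical parameter struct; k1 and b are textbook values,
/// not necessarily what lexir ships with.
struct Bm25 {
    k1: f64,
    b: f64,
}

/// Toy index: term -> (doc id -> term frequency), plus document lengths.
#[derive(Default)]
struct ToyIndex {
    postings: HashMap<String, HashMap<u32, u32>>,
    doc_len: HashMap<u32, u32>,
}

impl ToyIndex {
    fn add_document(&mut self, id: u32, tokens: &[&str]) {
        self.doc_len.insert(id, tokens.len() as u32);
        for t in tokens {
            *self
                .postings
                .entry(t.to_string())
                .or_default()
                .entry(id)
                .or_insert(0) += 1;
        }
    }

    /// Scores every document that contains at least one query term.
    fn bm25(&self, query: &[&str], p: &Bm25) -> Vec<(u32, f64)> {
        let n = self.doc_len.len() as f64;
        if n == 0.0 {
            return Vec::new();
        }
        let avgdl = self.doc_len.values().map(|&l| l as f64).sum::<f64>() / n;
        let mut scores: HashMap<u32, f64> = HashMap::new();
        for term in query {
            let Some(plist) = self.postings.get(*term) else { continue };
            let df = plist.len() as f64;
            // Assumed IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)), always positive.
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            for (&doc, &tf) in plist {
                let tf = tf as f64;
                let dl = self.doc_len[&doc] as f64;
                let norm = tf * (p.k1 + 1.0) / (tf + p.k1 * (1.0 - p.b + p.b * dl / avgdl));
                *scores.entry(doc).or_insert(0.0) += idf * norm;
            }
        }
        let mut hits: Vec<(u32, f64)> = scores.into_iter().collect();
        // Highest score first; ties broken by ascending doc id.
        hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        hits
    }
}

fn main() {
    let mut idx = ToyIndex::default();
    idx.add_document(0, &["hello", "world"]);
    idx.add_document(1, &["goodbye", "world", "again"]);
    let hits = idx.bm25(&["hello"], &Bm25 { k1: 1.2, b: 0.75 });
    println!("{hits:?}");
}
```

Only documents that appear in a query term's postings list are scored, which mirrors the split described above: postings supply candidates, the ranking layer assigns scores.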
TF-IDF (requires multiple docs for non-zero IDF):
use ;
use InvertedIndex;
let mut idx = new;
idx.add_document;
idx.add_document; // IDF(hello) > 0
let hits = retrieve_tfidf.unwrap;
assert_eq!;
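The reason a second document is needed can be seen from the classic IDF formula, ln(N / df). The sketch below assumes that unsmoothed form for illustration; the `idf` function is hypothetical and lexir may use a different variant.

```rust
/// Classic inverse document frequency: ln(N / df).
/// Assumption: unsmoothed textbook form, not necessarily lexir's formula.
fn idf(num_docs: usize, doc_freq: usize) -> f64 {
    if doc_freq == 0 {
        return 0.0; // term absent from the index: contributes nothing
    }
    (num_docs as f64 / doc_freq as f64).ln()
}

fn main() {
    // One document containing the term: N = df = 1, so IDF = ln(1) = 0.
    println!("single doc: {}", idf(1, 1));
    // A second document without the term: N = 2, df = 1, so IDF = ln(2) > 0.
    println!("two docs:   {}", idf(2, 1));
}
```

With a single document every indexed term appears in all documents, so under this formula every TF-IDF score collapses to zero.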
Query Likelihood (Dirichlet smoothing with μ = 1000 via QueryLikelihoodParams::default()):
use ;
use InvertedIndex;
let mut idx = new;
idx.add_document;
let hits = retrieve_query_likelihood.unwrap;
assert!;
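For reference, Dirichlet-smoothed query likelihood scores a document as the sum over query terms of ln((tf + μ · p(w|C)) / (|d| + μ)), where p(w|C) is the term's probability in the whole collection. The following is a standalone sketch of that formula; `ql_dirichlet` and its argument layout are hypothetical and not part of the lexir API.

```rust
/// Log query likelihood with Dirichlet smoothing.
/// `term_stats` holds, per query term, (tf in this document, p(w | collection)).
/// Assumption: every query term has p(w | collection) > 0; otherwise a term
/// with tf = 0 yields ln(0) = -inf.
fn ql_dirichlet(term_stats: &[(u32, f64)], doc_len: u32, mu: f64) -> f64 {
    term_stats
        .iter()
        .map(|&(tf, p_c)| ((tf as f64 + mu * p_c) / (doc_len as f64 + mu)).ln())
        .sum()
}

fn main() {
    let mu = 1000.0;
    // Same length and collection probability; only the term frequency differs.
    let with_term = ql_dirichlet(&[(1, 0.01)], 10, mu);
    let without_term = ql_dirichlet(&[(0, 0.01)], 10, mu);
    println!("with term: {with_term:.4}, without: {without_term:.4}");
}
```

Scores are log-probabilities, so they are negative; a document that contains the query term still ranks above one of the same length that does not.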
Features
- persistence (default): save/load via durability + postings/persistence
- recordlog: append-only operation logs for rebuildable indexes (CLI uses this)
- cli: enables the lexir CLI (debugging + end-to-end validation)
- fuzzy: fuzzy query expansion via gramdex; expands only OOV terms (terms not in the index), while in-vocabulary terms are used as-is
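The OOV-only rule for fuzzy expansion can be illustrated with a small sketch. It does not call gramdex: character-trigram Jaccard similarity is used here as a stand-in matching technique, and `expand_query`, `trigrams`, `jaccard`, and the `min_sim` threshold are all hypothetical names chosen for this example.

```rust
use std::collections::{BTreeSet, HashSet};

/// Character trigrams of a term, padded so prefixes and suffixes count.
fn trigrams(s: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {s} ").chars().collect();
    padded
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    if union == 0.0 { 0.0 } else { inter / union }
}

/// In-vocabulary terms pass through unchanged; only out-of-vocabulary terms
/// are replaced by similar vocabulary terms (possibly none).
fn expand_query(query: &[&str], vocab: &BTreeSet<String>, min_sim: f64) -> Vec<String> {
    let mut out = Vec::new();
    for &term in query {
        if vocab.contains(term) {
            out.push(term.to_string());
            continue;
        }
        let grams = trigrams(term);
        out.extend(
            vocab
                .iter()
                .filter(|v| jaccard(&grams, &trigrams(v)) >= min_sim)
                .cloned(),
        );
    }
    out
}

fn main() {
    let vocab: BTreeSet<String> = ["hello", "help", "world"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // "helo" is OOV and gets expanded; "world" is in the index and is kept as-is.
    println!("{:?}", expand_query(&["helo", "world"], &vocab, 0.5));
}
```

Leaving in-vocabulary terms untouched keeps exact matches from being diluted by near-miss neighbours such as "help" for "hello".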
CLI (with --features cli)
Indexing & search: index, search-index, search (build/search from corpus or saved index).
Record-log operations (append-only ops + checkpoint):
- log-add, log-delete, log-search: incremental updates and search over the log
- log-checkpoint, log-compact, log-status: checkpoint management
- log-doctor --root <dir> [--fix]: repair missing meta files
- log-prune --root <dir>: prune redundant checkpoints
- log-scan --root <dir> [--strict]: validate record log integrity
- log-validate --root <dir>: verify checkpoint + log consistency
- log-serve: serve search over a log-backed index
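The idea behind an append-only log with checkpoints can be sketched as follows: state is rebuilt by replaying operations, and a checkpoint lets the replay start from a snapshot instead of from the beginning. The `Op`, `Checkpoint`, and `rebuild` names and the in-memory layout are hypothetical; the actual on-disk format used by the recordlog feature is not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical log operations, loosely mirroring log-add and log-delete.
#[derive(Clone, Debug)]
enum Op {
    Add { id: u32, text: String },
    Delete { id: u32 },
}

type Docs = BTreeMap<u32, String>;

fn apply(docs: &mut Docs, op: &Op) {
    match op {
        Op::Add { id, text } => {
            docs.insert(*id, text.clone());
        }
        Op::Delete { id } => {
            docs.remove(id);
        }
    }
}

/// Snapshot of the state plus the number of log entries it already covers.
struct Checkpoint {
    docs: Docs,
    applied: usize,
}

/// Rebuilds state from an optional checkpoint and the tail of the log.
fn rebuild(checkpoint: Option<&Checkpoint>, log: &[Op]) -> Docs {
    let (mut docs, start) = match checkpoint {
        Some(c) => (c.docs.clone(), c.applied),
        None => (Docs::new(), 0),
    };
    // Replay only the entries the checkpoint has not seen yet.
    for op in &log[start.min(log.len())..] {
        apply(&mut docs, op);
    }
    docs
}

fn main() {
    let log = vec![
        Op::Add { id: 0, text: "hello world".into() },
        Op::Add { id: 1, text: "goodbye world".into() },
        Op::Delete { id: 0 },
    ];
    let cp = Checkpoint { docs: rebuild(None, &log[..2]), applied: 2 };
    // Replaying from the checkpoint must give the same state as a full replay.
    assert_eq!(rebuild(Some(&cp), &log), rebuild(None, &log));
    println!("{:?}", rebuild(Some(&cp), &log));
}
```

The equality check in `main` is the kind of invariant a checkpoint-versus-log consistency check would rely on: a checkpoint plus the remaining log tail must reproduce the full replay.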