lexir
Lexical IR on top of postings lists.
Status: experimental. This repository is public as a reference implementation; it is not
currently packaged for crates.io.
Feature Selection
- default: includes persistence.
- In-memory only: disable default features.
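As a sketch of what the two options look like in a Cargo.toml, assuming the git dependency this README uses because the crate is not on crates.io. The `default-features` and `features` keys are standard Cargo syntax, and the feature names are the ones listed under Features:

```toml
[dependencies]
# In-memory only: turn off the default `persistence` feature.
lexir = { git = "https://github.com/arclabs561/lexir", default-features = false }

# Alternatively, keep the defaults and opt into an extra feature by name:
# lexir = { git = "https://github.com/arclabs561/lexir", features = ["fuzzy"] }
```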
What it is
lexir is the scoring/ranking layer. Candidate generation and storage live in postings.
Building
lexir is not on crates.io yet; use a git dependency:
[dependencies]
lexir = { git = "https://github.com/arclabs561/lexir" }
Notes:
- postings, rankfns, and durability are pulled as git dependencies.
- gramdex and textprep are pulled from crates.io (and are only used when their features are enabled).
Usage (library)
BM25 (default):
use ;
let mut idx = new;
idx.add_document;
let hits = idx.retrieve.unwrap;
assert_eq!;
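For readers who want to see what a BM25 ranker computes per term, here is a minimal self-contained sketch of the textbook formula. It is not lexir's implementation: the function name, the IDF variant (the "+1 inside the log" form that never goes negative), and the parameter values `k1 = 1.2`, `b = 0.75` used in the note below are all assumptions for illustration.

```rust
/// Textbook BM25 contribution of a single query term to one document's score.
/// Hypothetical helper for illustration; not part of the lexir API.
fn bm25_term_score(
    tf: f64,          // term frequency in the document
    doc_len: f64,     // document length in tokens
    avg_doc_len: f64, // average document length in the collection
    n_docs: f64,      // number of documents in the collection
    doc_freq: f64,    // number of documents containing the term
    k1: f64,          // term-frequency saturation parameter
    b: f64,           // length-normalization strength in [0, 1]
) -> f64 {
    // IDF with +1 inside the log, so the value is never negative.
    let idf = ((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0).ln();
    // Length normalization: documents longer than average are penalized.
    let norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    // Saturating term-frequency component scaled by IDF.
    idf * tf * (k1 + 1.0) / (tf + norm)
}
```

With `k1 = 1.2` and `b = 0.75`, a term that occurs once in an average-length document scores exactly its IDF, and the score saturates as `tf` grows rather than increasing linearly.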
TF-IDF (requires multiple docs for non-zero IDF):
use ;
use InvertedIndex;
let mut idx = new;
idx.add_document;
idx.add_document; // IDF(hello) > 0
let hits = retrieve_tfidf.unwrap;
assert_eq!;
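The need for a second document follows from the IDF term. The sketch below uses the plain form ln(N / df), which is an assumption: lexir's exact IDF variant is not stated here, and the helper names are hypothetical. With a single document that contains the term, N = df = 1 and the IDF is zero, so every TF-IDF score is zero.

```rust
/// Plain inverse document frequency: ln(N / df).
/// Hypothetical helper for illustration; not part of the lexir API.
fn idf(n_docs: usize, doc_freq: usize) -> f64 {
    if doc_freq == 0 {
        // Term does not occur anywhere; treat its weight as zero.
        return 0.0;
    }
    (n_docs as f64 / doc_freq as f64).ln()
}

/// TF-IDF weight of a term in a document: raw term frequency times IDF.
fn tfidf(tf: usize, n_docs: usize, doc_freq: usize) -> f64 {
    tf as f64 * idf(n_docs, doc_freq)
}
```

Under this form, `idf(1, 1)` is zero, while adding a second document that lacks the term gives `idf(2, 1) = ln 2`, which is positive.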
Query Likelihood (Dirichlet smoothing with μ = 1000, via QueryLikelihoodParams::default()):
use ;
use InvertedIndex;
let mut idx = new;
idx.add_document;
let hits = retrieve_query_likelihood.unwrap;
assert!;
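As background on what μ controls, here is a minimal sketch of the standard Dirichlet-smoothed query log-likelihood, log P(q | d) = Σ ln((tf(t, d) + μ · P(t | C)) / (|d| + μ)). The function name and input layout are assumptions for illustration and do not reflect lexir's internals.

```rust
/// Dirichlet-smoothed query log-likelihood for one document.
/// Each entry of `terms` is (tf of the term in the document,
/// probability of the term in the whole collection).
/// Hypothetical helper for illustration; not part of the lexir API.
fn ql_dirichlet(terms: &[(f64, f64)], doc_len: f64, mu: f64) -> f64 {
    terms
        .iter()
        .map(|&(tf, p_collection)| {
            // Smoothed probability: document evidence blended with the
            // collection model; larger mu trusts the collection more.
            ((tf + mu * p_collection) / (doc_len + mu)).ln()
        })
        .sum()
}
```

Because each term contributes the log of a probability, scores are typically negative; a document that contains the query term scores higher (closer to zero) than an equally long one that does not.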
Features
- persistence (default): save/load via durability + postings/persistence
- recordlog: append-only operation logs for rebuildable indexes (CLI uses this)
- cli: enables the lexir CLI (debugging + end-to-end validation)
- fuzzy: fuzzy query expansion via gramdex; expands only OOV terms (terms not in the index), while in-vocabulary terms are used as-is
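The OOV-only rule for the fuzzy feature can be sketched as follows. This is an illustration of the policy only: `expand_query` and the `expand` closure are hypothetical stand-ins, and no gramdex API is used or implied.

```rust
use std::collections::HashSet;

/// Expand only out-of-vocabulary query terms.
/// `expand` is a hypothetical stand-in for a fuzzy lookup that maps an
/// unknown term to candidate in-vocabulary terms.
fn expand_query<F>(terms: &[&str], vocab: &HashSet<String>, expand: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut out = Vec::new();
    for &term in terms {
        if vocab.contains(term) {
            // In-vocabulary: keep the term exactly as written.
            out.push(term.to_string());
        } else {
            // Out-of-vocabulary: replace it with its fuzzy candidates.
            out.extend(expand(term));
        }
    }
    out
}
```

Restricting expansion to OOV terms keeps exact matches from being diluted by near-miss candidates, while a misspelled term can still retrieve something.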
CLI (with --features cli)
Subcommands: index, search-index, search for indexing and search. Record-log operations: log-add, log-delete, log-search, log-checkpoint, log-compact, log-status, log-doctor, log-prune, log-scan, log-validate, log-serve.