Trigram + bloom-filter text index.
dyntext is the algorithmic core of the dynomite text-search
surface. It ports the inverted-index pipeline from
pg_tre -- a
PostgreSQL access method for approximate-regex matching --
into pure Rust, so the same trigram + bloom funnel can sit
behind dynomite's Redis FT.* command surface.
Phase 1 scope
This crate currently implements:
- Three-byte n-gram extraction with padding so the input's
boundary bytes get full coverage (see [
trigram]). - A roaring-bitmap-backed inverted index keyed by trigram
hash (see [
postings::Postings]). - A standard bloom filter with configurable bit count and
hash count (see [
bloom::BloomFilter]). - A combined [
index::TextIndex] that ties the three together and serves exact-substring queries through the four-tier filter funnel from the design doc.
Phase 2 + 3 scope
Phase 2 adds regex-driven search on top of the existing exact-substring path:
- A small internal regex AST built from
[
regex_syntax]'s HIR (see [regex_ast]). - A required-trigram extractor that walks the AST and
computes the trigrams every matching string must contain
(see [
prefix_extract]). - [
index::TextIndex::search_regex], which uses the extractor to prune the postings lists before running the actual matcher (currently [regex::bytes::Regex]).
Phase 3 adds the approximate-regex recheck:
- Safe FFI wrapper around the TRE C library for
approximate-regex matching with up to k typos
(see [
tre]). The wrapper is the optional recheck step; the trigram + bloom funnel is reused unchanged. - Phase 4: Redis FT.SEARCH / FT.REGEX command parser integration on top of the dynvec fold.
Optional features
noxu-- enables the [persist] module that serialises a [TextIndex] to an embedded Noxu DB environment so the trigram postings, per-doc bloom filters, and raw text survive a process restart. The feature pulls innoxu-dbandbincodeas workspace path dependencies.
Quick start
use TextIndex;
let mut idx = new;
let id_a = idx.insert;
let id_b = idx.insert;
let id_c = idx.insert;
let hits = idx.search_substring;
assert!;
assert!;
assert!;