Content-addressed BM25 + HNSW indexes on top of triblespace
piles. See docs/DESIGN.md for the full design rationale.
Two canonical blob types, loaded zero-copy via anybytes
with bit-packed bodies via jerky:
- succinct::SuccinctBM25Index (schema: succinct::SuccinctBM25Blob): term → doc retrieval, where terms are 32-byte triblespace Inlines (text tokens, entity ids, tags, or anything else).
- succinct::SuccinctHNSWIndex (schema: succinct::SuccinctHNSWBlob): approximate k-nearest-neighbour search over caller-supplied embeddings.
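To make "bit-packed bodies" concrete, here is a minimal sketch of fixed-width bit-packing: every value is stored in exactly as many bits as the largest one needs, laid out back to back across u64 words. It illustrates the general technique only; it is not jerky's API or on-disk layout, and the names bit_width, pack, and get are hypothetical.

```rust
/// Smallest bit width that can hold `max` (at least 1 bit).
fn bit_width(max: u64) -> u32 {
    (64 - max.leading_zeros()).max(1)
}

/// Store each value in exactly `width` bits, back to back inside u64 words.
fn pack(values: &[u64], width: u32) -> Vec<u64> {
    let total_bits = values.len() as u64 * width as u64;
    let mut words = vec![0u64; ((total_bits + 63) / 64) as usize];
    for (i, &v) in values.iter().enumerate() {
        debug_assert!(width == 64 || v >> width == 0, "value does not fit in width");
        let bit = i as u64 * width as u64;
        let (word, offset) = ((bit / 64) as usize, (bit % 64) as u32);
        words[word] |= v << offset;
        // A value that straddles a word boundary spills its high bits into the next word.
        if offset + width > 64 {
            words[word + 1] |= v >> (64 - offset);
        }
    }
    words
}

/// Read back the `i`-th packed value.
fn get(words: &[u64], width: u32, i: usize) -> u64 {
    let bit = i as u64 * width as u64;
    let (word, offset) = ((bit / 64) as usize, (bit % 64) as u32);
    let mask = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    let mut v = words[word] >> offset;
    if offset + width > 64 {
        v |= words[word + 1] << (64 - offset);
    }
    v & mask
}

fn main() {
    let postings = [3u64, 17, 42, 1000, 5];
    let width = bit_width(*postings.iter().max().unwrap()); // 1000 needs 10 bits
    let packed = pack(&postings, width);
    // 5 values * 10 bits = 50 bits, so one u64 word instead of five.
    assert_eq!(packed.len(), 1);
    for (i, &v) in postings.iter().enumerate() {
        assert_eq!(get(&packed, width, i), v);
    }
}
```

The price is a shift and a mask per read; in exchange the body is a flat slice of words that can be used in place, which is what makes zero-copy loading practical.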
bm25::BM25Builder::build goes directly to the succinct form: it first sorts the keys into a CompressedUniverse, then accumulates per-term postings in universe-code order, so no remap pass is needed. hnsw::HNSWBuilder::build also returns the succinct form directly, currently by delegating internally to SuccinctHNSWIndex::from_naive. The naive intermediate is a necessary buffer because HNSW levels are only revealed incrementally.
Naive reference implementations live under testing: testing::BM25Index, testing::HNSWIndex, and testing::FlatIndex serve as oracles and benchmark baselines. Reach them via BM25Builder::build_naive(), HNSWBuilder::build_naive(), and FlatBuilder::build().
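The two-pass BM25 build described above can be sketched as follows, under stated assumptions: a sorted, deduplicated Vec<[u8; 32]> searched with binary search stands in for CompressedUniverse, documents get hypothetical u32 ids, and postings carry raw term frequencies rather than BM25 weights. What it shows is that once codes are fixed by the sorted universe, each posting can be written straight into its final slot.

```rust
use std::collections::BTreeSet;

type Term = [u8; 32]; // stand-in for a 32-byte triblespace Inline
type DocId = u32; // hypothetical dense document id

/// Postings keyed by dense term code: `lists[c]` belongs to `universe[c]`.
struct Postings {
    universe: Vec<Term>, // sorted, deduplicated terms
    lists: Vec<Vec<(DocId, u32)>>, // per code: (doc, term frequency)
}

fn build(docs: &[(DocId, Vec<Term>)]) -> Postings {
    // Pass 1: gather every distinct term in sorted order. A term's position
    // in this sorted universe is its final code.
    let universe: Vec<Term> = docs
        .iter()
        .flat_map(|(_, terms)| terms.iter().copied())
        .collect::<BTreeSet<Term>>()
        .into_iter()
        .collect();

    // Pass 2: file each posting directly under its final code, so nothing
    // has to be remapped from insertion order to sorted order afterwards.
    let mut lists: Vec<Vec<(DocId, u32)>> = vec![Vec::new(); universe.len()];
    for (doc, terms) in docs {
        let mut codes: Vec<usize> = terms
            .iter()
            .map(|t| universe.binary_search(t).expect("term is in the universe"))
            .collect();
        codes.sort_unstable();
        // Count runs of equal codes to get per-document term frequencies.
        let mut i = 0;
        while i < codes.len() {
            let mut j = i;
            while j < codes.len() && codes[j] == codes[i] {
                j += 1;
            }
            lists[codes[i]].push((*doc, (j - i) as u32));
            i = j;
        }
    }
    Postings { universe, lists }
}

fn main() {
    let t = |b: u8| -> Term { [b; 32] };
    let docs = vec![(1, vec![t(9), t(2), t(9)]), (2, vec![t(2), t(5)])];
    let p = build(&docs);
    assert_eq!(p.universe, vec![t(2), t(5), t(9)]); // sorted universe
    assert_eq!(p.lists[0], vec![(1, 1), (2, 1)]); // postings for t(2)
    assert_eq!(p.lists[2], vec![(1, 2)]); // t(9) occurs twice in doc 1
}
```

Fixing codes up front costs one extra pass over the input, but it avoids building postings under provisional ids and renumbering them once the sorted order is finally known.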
Both indexes are rebuilt-and-replaced (no mutation); the caller persists the resulting handle wherever appropriate (branch metadata, commit metadata, a plain trible, or an in-memory cache).
Query surface
Two constraint shapes plug into find! / and! /
pattern!. Both follow the same rule: the score is not a
bound variable. Each constraint filters against a fixed
threshold (score_floor for BM25, a cosine threshold for
similarity); callers recompute the precise score afterwards
if they need it for ranking.
- BM25Index::matches: multi-term BM25 filter. Binds doc to documents whose summed BM25 score across the query terms is >= score_floor. Pass 0.0 for “any matching doc”. The same method exists on SuccinctBM25Index. Pair with BM25Index::score for ranking.
- AttachedHNSWIndex::similar: symmetric binary similarity relation over two EmbHandle-typed variables with a fixed cosine threshold. The same method exists on AttachedFlatIndex and AttachedSuccinctHNSWIndex.
- AttachedHNSWIndex::similar_to: unary convenience for the common “search from a known handle” case; pins the probe on the call.
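The filter-then-rank rule can be sketched without the query engine. This toy version assumes textbook Okapi BM25 (k1 = 1.2, b = 0.75, with a ln(1 + ...) IDF), which may not match the crate's constants, and scores plain string tokens instead of 32-byte Inlines. The similarity half is a brute-force cosine scan in the spirit of a flat index, not HNSW. matches, bm25, and similar_to here are hypothetical helpers, not the crate's methods.

```rust
// Assumed Okapi BM25 defaults; the crate's constants and IDF variant may differ.
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Summed BM25 score of `docs[doc]` across all `query` terms.
fn bm25(docs: &[Vec<&str>], doc: usize, query: &[&str]) -> f32 {
    let n = docs.len() as f32;
    let avgdl = docs.iter().map(|d| d.len()).sum::<usize>() as f32 / n;
    let dl = docs[doc].len() as f32;
    query
        .iter()
        .map(|q| {
            let tf = docs[doc].iter().filter(|t| *t == q).count() as f32;
            let df = docs.iter().filter(|d| d.contains(q)).count() as f32;
            let idf = (1.0 + (n - df + 0.5) / (df + 0.5)).ln();
            idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * dl / avgdl))
        })
        .sum()
}

/// Filter step: docs containing at least one query term whose summed score
/// reaches `score_floor` (0.0 means "any matching doc"). No score is returned.
fn matches(docs: &[Vec<&str>], query: &[&str], score_floor: f32) -> Vec<usize> {
    (0..docs.len())
        .filter(|&d| query.iter().any(|q| docs[d].contains(q)))
        .filter(|&d| bm25(docs, d, query) >= score_floor)
        .collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Brute-force analogue of a pinned-probe similarity filter.
fn similar_to(embs: &[Vec<f32>], probe: &[f32], threshold: f32) -> Vec<usize> {
    (0..embs.len())
        .filter(|&j| cosine(probe, &embs[j]) >= threshold)
        .collect()
}

fn main() {
    let corpus = ["the quick brown fox", "the lazy brown dog", "quick silver fox"];
    let docs: Vec<Vec<&str>> = corpus
        .into_iter()
        .map(|s| s.split_whitespace().collect())
        .collect();
    let query = ["fox"];

    // 1. Filter at a fixed floor; the score itself is never bound.
    let hits = matches(&docs, &query, 0.0);
    assert_eq!(hits, vec![0, 2]);

    // 2. Recompute scores only for the survivors, then rank.
    let mut ranked: Vec<(usize, f32)> =
        hits.iter().map(|&d| (d, bm25(&docs, d, &query))).collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    assert_eq!(ranked[0].0, 2); // the shorter doc wins via length normalization

    // Similarity: keep embeddings within a fixed cosine threshold of the probe.
    let embs = vec![vec![1.0, 0.0], vec![0.9, 0.1], vec![0.0, 1.0]];
    assert_eq!(similar_to(&embs, &[1.0, 0.0], 0.8), vec![0, 1]);
}
```

Keeping the score out of the bindings means the filter only has to answer yes or no per document; the ranking cost is paid once, on the final survivors, after every other constraint in the query has narrowed them down.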
Quickstart
use triblespace_core::find;
use triblespace_core::id::Id;
use triblespace_search::bm25::BM25Builder;
use triblespace_search::succinct::SuccinctBM25Index;
use triblespace_search::tokens::hash_tokens;
// 1. Collect documents in an in-memory builder.
let mut b: BM25Builder = BM25Builder::new();
b.insert(Id::new([1; 16]).unwrap(), hash_tokens("the quick brown fox"));
b.insert(Id::new([2; 16]).unwrap(), hash_tokens("the lazy brown dog"));
b.insert(Id::new([3; 16]).unwrap(), hash_tokens("quick silver fox"));
// 2. Build a succinct BM25 index in a single pass.
let idx: SuccinctBM25Index = b.build();
// 3. Filter through the engine — constraint binds `doc`
// only; `score_floor = 0.0` means "any matching doc".
let terms = hash_tokens("fox");
let docs: Vec<(Id,)> = find!(
(doc: Id),
idx.matches(doc, &terms, 0.0)
).collect();
assert_eq!(docs.len(), 2); // docs 1 and 3 contain "fox"
See the examples/ directory for runnable walkthroughs:
compose_bm25_and_pattern and multi_term_bm25_search
(BM25 + pattern joins), compose_hnsw_and_pattern
(vector similarity + pattern), hybrid_search (BM25,
vector similarity, and pattern constraints composed in one
find!), and phrase_search for the typed-tokenizer pattern.
Modules
- bm25: BM25-style lexical / associative retrieval.
- constraint: Triblespace query-engine integration.
- hnsw: Approximate nearest-neighbour search over caller-supplied embeddings.
- ring: Fixed-predicate 2-ring for unlabeled graphs.
- schemas: Inline and blob encodings minted for triblespace-search.
- succinct: Jerky-backed succinct building blocks for the index blobs.
- testing: Reference implementations for tests and benchmarks.
- tokens: Opt-in helpers for turning strings into the 32-byte triblespace Inlines that bm25::BM25Index uses as term ids.
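As an illustration of what the tokens helpers produce, here is a toy tokenizer that lowercases, splits on whitespace, and maps each token to a 32-byte id. Both the lowercasing and the hashing scheme (four seeded 64-bit FNV-1a lanes, chosen only to stay within the standard library) are assumptions: this is not cryptographic and is not what hash_tokens actually computes. The property it demonstrates is determinism: equal tokens always map to equal ids, which is what lets a query token find the postings built from document tokens.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a over `bytes`, seeded with `lane` so each 8-byte lane differs.
fn fnv1a(lane: u8, bytes: &[u8]) -> u64 {
    let mut h = FNV_OFFSET;
    for &b in std::iter::once(&lane).chain(bytes) {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Toy 32-byte term id: four FNV-1a lanes concatenated (hypothetical scheme).
fn term_id(token: &str) -> [u8; 32] {
    let mut out = [0u8; 32];
    for lane in 0..4u8 {
        let h = fnv1a(lane, token.as_bytes());
        out[lane as usize * 8..][..8].copy_from_slice(&h.to_le_bytes());
    }
    out
}

/// Hypothetical stand-in for a tokenizing helper: lowercase, split, hash.
fn hash_tokens_sketch(text: &str) -> Vec<[u8; 32]> {
    text.split_whitespace()
        .map(|w| term_id(&w.to_lowercase()))
        .collect()
}

fn main() {
    let ids = hash_tokens_sketch("The quick fox the");
    assert_eq!(ids.len(), 4);
    assert_eq!(ids[0], ids[3]); // "The" and "the" collapse to one term id
    assert_ne!(ids[1], ids[2]); // distinct words get distinct ids
}
```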