//! Content-addressed BM25 + HNSW indexes on top of triblespace
//! piles. See `docs/DESIGN.md` for the full design rationale.
//!
//! Two canonical blob types, loaded zero-copy via [`anybytes`]
//! with bit-packed bodies via [`jerky`]:
//! - [`succinct::SuccinctBM25Index`] (schema
//! [`succinct::SuccinctBM25Blob`]) — term → doc retrieval
//! where terms are 32-byte triblespace `Inline`s (text tokens,
//! entity ids, tags, anything).
//! - [`succinct::SuccinctHNSWIndex`] (schema
//! [`succinct::SuccinctHNSWBlob`]) — approximate
//! k-nearest-neighbour over caller-supplied embeddings.
//!
//! [`bm25::BM25Builder::build`] goes direct-to-succinct
//! (sorts keys into a `CompressedUniverse` first, then
//! accumulates per-term postings in universe-code order — no
//! remap pass). [`hnsw::HNSWBuilder::build`] also returns the
//! succinct form directly (delegating through today's
//! `SuccinctHNSWIndex::from_naive` internally — the naive
//! intermediate is a necessary buffer because HNSW levels are
//! only revealed incrementally). Naive reference
//! implementations live under [`testing`]: see
//! [`testing::BM25Index`], [`testing::HNSWIndex`], and
//! [`testing::FlatIndex`], which serve as oracles and
//! benchmark baselines. Reach them via
//! `BM25Builder::build_naive()` / `HNSWBuilder::build_naive()`
//! / `FlatBuilder::build()`.
//!
//! Both indexes are rebuilt-and-replaced (no mutation); the
//! caller persists the resulting handle wherever appropriate
//! (branch metadata, commit metadata, a plain trible, or an
//! in-memory cache).
//!
//! # Query surface
//!
//! Two constraint shapes plug into `find!` / `and!` /
//! `pattern!`: a BM25 term filter and a vector-similarity
//! relation, the latter with a unary convenience form. All
//! follow the same rule: scoring is *not* a bound variable.
//! The constraint filters on a fixed `score_floor`
//! parameter; callers recompute the precise score afterwards
//! if they need it for ranking.
//!
//! - [`BM25Index::matches`][m] — multi-term BM25 filter.
//! Binds `doc` to documents whose summed BM25 score across
//! the query terms is `>= score_floor`. Pass `0.0` for
//! "any matching doc". Same method on [`SuccinctBM25Index`][sbm25].
//! Pair with [`BM25Index::score`][s] for ranking.
//! - [`AttachedHNSWIndex::similar`][sh] — symmetric binary
//! similarity relation over two
//! [`EmbHandle`][emb]-typed variables with a fixed cosine
//! threshold. Same method on
//! [`AttachedFlatIndex`][sf] and
//! [`AttachedSuccinctHNSWIndex`][ssh].
//! - [`AttachedHNSWIndex::similar_to`][sth] — unary
//! convenience for the common "search from a known handle"
//! case; pins the probe on the call.
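//!
//! The filter-then-rank rule can be sketched with plain
//! standard-library Rust. This is an illustration under
//! stated assumptions, not this crate's implementation:
//! `bm25_score` and `filter_then_rank` are hypothetical
//! helpers, documents are pre-split `&str` tokens rather than
//! 32-byte `Inline`s, and `k1 = 1.2`, `b = 0.75` are assumed
//! textbook defaults.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Hypothetical helper (not part of this crate): textbook BM25
//! // score of one document, summed across the query terms.
//! fn bm25_score(
//!     doc: &[&str],
//!     query: &[&str],
//!     df: &HashMap<&str, usize>,
//!     n_docs: usize,
//!     avg_len: f64,
//! ) -> f64 {
//!     let (k1, b) = (1.2_f64, 0.75_f64); // assumed defaults
//!     let len = doc.len() as f64;
//!     query
//!         .iter()
//!         .map(|term| {
//!             let tf = doc.iter().filter(|t| *t == term).count() as f64;
//!             if tf == 0.0 {
//!                 return 0.0;
//!             }
//!             let d = df.get(term).copied().unwrap_or(0) as f64;
//!             // The `+ 1.0` keeps idf positive, so score > 0 iff a term matched.
//!             let idf = ((n_docs as f64 - d + 0.5) / (d + 0.5) + 1.0).ln();
//!             idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / avg_len))
//!         })
//!         .sum()
//! }
//!
//! // Hypothetical helper: keep matching docs whose score clears the
//! // floor (the filter step), then sort by score (the ranking step).
//! fn filter_then_rank(
//!     docs: &[Vec<&str>],
//!     query: &[&str],
//!     score_floor: f64,
//! ) -> Vec<(usize, f64)> {
//!     // Document frequency: number of docs containing each term.
//!     let mut df: HashMap<&str, usize> = HashMap::new();
//!     for doc in docs {
//!         let mut terms = doc.clone();
//!         terms.sort_unstable();
//!         terms.dedup();
//!         for t in terms {
//!             *df.entry(t).or_insert(0) += 1;
//!         }
//!     }
//!     let total_len = docs.iter().map(Vec::len).sum::<usize>() as f64;
//!     let avg_len = total_len / docs.len().max(1) as f64;
//!
//!     let mut hits: Vec<(usize, f64)> = docs
//!         .iter()
//!         .enumerate()
//!         .map(|(i, d)| (i, bm25_score(d, query, &df, docs.len(), avg_len)))
//!         .filter(|&(_, s)| s > 0.0 && s >= score_floor)
//!         .collect();
//!     // Ranking happens after the filter, highest score first.
//!     hits.sort_by(|a, b| b.1.total_cmp(&a.1));
//!     hits
//! }
//! ```
//!
//! With the three Quickstart sentences split on whitespace,
//! `filter_then_rank(&docs, &["fox"], 0.0)` returns two hits,
//! and the shorter document ranks first.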
//!
//! [m]: bm25::BM25Index::matches
//! [s]: bm25::BM25Index::score
//! [sbm25]: succinct::SuccinctBM25Index
//! [sh]: hnsw::AttachedHNSWIndex::similar
//! [sth]: hnsw::AttachedHNSWIndex::similar_to
//! [sf]: hnsw::AttachedFlatIndex::similar
//! [ssh]: succinct::AttachedSuccinctHNSWIndex::similar
//! [emb]: schemas::EmbHandle
//!
//! # Quickstart
//!
//! ```
//! use triblespace_core::find;
//! use triblespace_core::id::Id;
//!
//! use triblespace_search::bm25::BM25Builder;
//! use triblespace_search::succinct::SuccinctBM25Index;
//! use triblespace_search::tokens::hash_tokens;
//!
//! // 1. Populate a builder with three documents.
//! let mut b: BM25Builder = BM25Builder::new();
//! b.insert(Id::new([1; 16]).unwrap(), hash_tokens("the quick brown fox"));
//! b.insert(Id::new([2; 16]).unwrap(), hash_tokens("the lazy brown dog"));
//! b.insert(Id::new([3; 16]).unwrap(), hash_tokens("quick silver fox"));
//!
//! // 2. Build a succinct BM25 index in a single pass.
//! let idx: SuccinctBM25Index = b.build();
//!
//! // 3. Filter through the engine — constraint binds `doc`
//! // only; `score_floor = 0.0` means "any matching doc".
//! let terms = hash_tokens("fox");
//! let docs: Vec<(Id,)> = find!(
//! (doc: Id),
//! idx.matches(doc, &terms, 0.0)
//! ).collect();
//! assert_eq!(docs.len(), 2);
//! ```
//!
//! See the `examples/` directory for runnable walkthroughs:
//! `compose_bm25_and_pattern` / `multi_term_bm25_search`
//! (BM25 + pattern joins), `compose_hnsw_and_pattern`
//! (vector similarity + pattern), `hybrid_search` (all
//! three composed in one `find!`), and `phrase_search` for
//! the typed-tokenizer pattern.
//!
//! [`jerky`]: https://docs.rs/jerky
/// Reference implementations for tests and benchmarks.
///
/// The types re-exported here are naive (insertion-order,
/// non-packed) forms that exist only to validate the succinct
/// builds and to measure "how much does jerky packing actually
/// save at this scale." They are not a production persistence
/// path — persistence always goes through the succinct forms
/// in [`succinct`].
///
/// - [`BM25Index`][testing::BM25Index] — reference BM25 scoring
/// and query implementation. Produced by
/// [`bm25::BM25Builder::build_naive`].
/// - [`HNSWIndex`][testing::HNSWIndex] — node-major HNSW graph
/// with inline neighbour lists. Produced by
/// [`hnsw::HNSWBuilder::build_naive`]; also the input to
/// [`succinct::SuccinctHNSWIndex::from_naive`] for callers
/// who want to hold the naive form.
/// - [`FlatIndex`][testing::FlatIndex] /
/// [`FlatBuilder`][testing::FlatBuilder] — brute-force exact
/// k-NN baseline, used as HNSW's recall oracle.
// Versioning policy: breaking byte-layout changes mint a new
// `BlobEncoding` id (see `SuccinctBM25Blob` / `SuccinctHNSWBlob`
// in `succinct.rs`). The type system then rules out
// mismatched-layout deserialization — there's no single
// global version number. `git log docs/DESIGN.md` has the
// progression of layout decisions; the blob encoding id in
// `succinct.rs` is authoritative for what any given binary
// can load.