# triblespace-search — design
Two content-addressed index blobs that sit on top of a triblespace
pile: one for BM25-style lexical / associative retrieval, one for
approximate nearest-neighbour search over embeddings. Both follow
the same invariants:
1. **Content-addressed.** Same corpus → same blob hash. Rebuilds
are free when nothing has changed; same content embedded with
the same model yields the same blob everywhere in the pile.
2. **Rebuild-and-replace, no mutation.** `build(corpus) -> Handle`
returns a fresh blob. The caller persists the handle wherever
it belongs (branch metadata, commit metadata, a plain trible).
Old index blobs stop being referenced on rebuild; squash
reclaims them eventually.
3. **Zero-copy views via jerky.** The blob is a self-contained
byte buffer; a `try_from_blob` produces a view that holds an
`anybytes::Bytes` backing and answers queries without copying.
4. **Unordered-query shape.** Both indexes expose their query
primitive as a triblespace constraint, and both follow the
same rule: doc/handle is the only bound variable, score is
a fixed parameter (no quantisation bookkeeping in the
engine path).
   - `bm25.matches(?doc, &terms, score_floor: f32)` — multi-term
     BM25 filter. Binds `doc` to documents whose summed BM25
     across `terms` is `>= score_floor`. Recompute exact scores
     via `idx.score(&doc, &terms)` for ranking.
   - `hnsw.similar(?a, ?b, score_floor: f32)` — symmetric binary
     relation: `a` and `b` are `Variable<Handle<Embedding>>`,
     `score_floor` is a fixed cosine threshold. At least one of
     `a` / `b` must be bound for the engine to walk.

   Callers combine with `and!` / `or!` / filters in the normal
   query engine; ordering is done in Rust after `.collect()`.
## Term is an `Inline`
BM25 in `triblespace-search` is not text-specific. Callers supply
terms as 32-byte `Inline`s; the library provides a
`hash_tokens(&str) -> Vec<Inline>` helper that Blake3-hashes
tokenized words, but nothing forces that choice on the schema.
Downstream uses:
| term bytes     | use case                         |
| :------------- | :------------------------------- |
| `hash(word)`   | Classic text search.             |
| entity `Id`    | "Docs mentioning this person."   |
| tag `Id`       | Tag-weighted search.             |
| `hash(n-gram)` | Phrase search via query rewrite. |
| fragment `Id`  | "Docs citing this fragment."     |
The BM25 index is therefore a general `(doc: Id, term: Inline, score)`
relation with IDF and length-normalized scoring baked in at build
time.
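A minimal sketch of the convention, with local stand-ins for the
triblespace types so the widening rule is concrete. `hash_tokens`
mirrors the helper named above; the zero-padded entity-id term is a
hypothetical caller convention, not an API this crate ships.
```rust
// Local stand-in for triblespace's 32-byte inline value.
type RawInline = [u8; 32];

// Mirrors the `hash_tokens` helper described above (the real helper
// returns Vec<Inline>; this sketch stays on raw bytes).
fn hash_tokens(text: &str) -> Vec<RawInline> {
    text.split_whitespace()
        .map(|w| *blake3::hash(w.as_bytes()).as_bytes()) // 32-byte digest
        .collect()
}

// Hypothetical caller convention: zero-pad a 16-byte entity Id into
// a 32-byte term, so "docs mentioning this person" rides the same
// index as text terms.
fn id_term(id: [u8; 16]) -> RawInline {
    let mut term = [0u8; 32];
    term[16..].copy_from_slice(&id);
    term
}
```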
## `SuccinctBM25Index` — SB25 blob layout
Self-contained blob, zero-copy via `anybytes::Bytes`, bit-packed
via jerky. Schema id: `5A1EF3FFD638B15E3EBEAA1E92660441` (see
`succinct::SuccinctBM25Blob`). The typed `BlobEncoding` handle is
the identity — no magic bytes, no version field in the blob.
A breaking format change mints a new schema id.
```
[header ] 264 B (fixed)
avg_doc_len f32 ; for length normalization
k1 f32 ; BM25 tuning (default 1.5)
b f32 ; BM25 tuning (default 0.75)
max_score f32 ; u16-quantization scale
n_docs u64
n_terms u64
doc_lens_meta 32 B ; CompactVectorMetaOnDisk
postings_doc_idx_meta 32 B ; CompactVectorMetaOnDisk
postings_offsets_meta 32 B ; CompactVectorMetaOnDisk
postings_scores_meta 32 B ; CompactVectorMetaOnDisk
keys_meta 40 B ; CompressedUniverseMetaOnDisk
(section_offset, section_len) × 4 = 64 B
[keys ] variable ; CompressedUniverse view:
; 4-byte fragment dictionary
; (sorted, deduped) + DACs-byte
; codes, one per unique key.
; `keys.access(code)` decodes
; the 32-byte RawInline.
[terms ] n_terms × 32 B ; sorted RawInline table
[doc_lens ] variable ; jerky CompactVector body
; width = ceil(log2(max_len + 1))
; indexed by universe-code order
[postings ] variable ; three jerky CompactVectors in
; one ByteArea:
; doc_idx (width log2(n_docs+1),
; stores universe codes, not
; insertion indexes)
; offsets (width log2(total+1))
; scores (width 16, u16-quantized)
```
Every body section starts on an 8-byte boundary; the header's
264-byte length is already a multiple of 8, so no tail padding
is needed. `CompactVector` reinterprets its backing buffer as
`[u64]`, so misalignment would panic at load time.
The four `CompactVectorMetaOnDisk` structs are a
[`zerocopy::IntoBytes`]-deriveable mirror of jerky's
`CompactVectorMeta` (jerky's upstream derives only `FromBytes`).
Each struct is four `u64` fields, 32 bytes, `#[repr(C)]`. The
`CompressedUniverseMetaOnDisk` is the same trick for
`triblespace::core::blob::encodings::succinctarchive::CompressedUniverseMeta`
— five `u64` fields, 40 bytes, `#[repr(C)]`. Static asserts in
`succinct.rs` lock the size and layout equivalence.
Lookup algorithm:
1. Binary-search the term in `terms` (typed
`View<[[u8; 32]]>` over the canonical bytes — slice's
`binary_search`) → term index *t*.
2. Read `(offsets[t], offsets[t+1])` from the postings offsets
CompactVector.
3. For each *i* in that range, read `doc_idx[i]` (a
`CompressedUniverse` code) from the postings doc_idx
CompactVector and `score[i]` from the quantized score
section; decode the external key via
`keys.access(doc_idx)`.
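A minimal sketch of the three steps, with plain slices standing in
for the jerky views; the accessor shapes and the linear
dequantization scale are assumptions, not the crate's API.
```rust
// terms: the sorted 32-byte table; offsets/doc_idx/scores: the three
// postings CompactVectors; keys: the CompressedUniverse accessor.
fn postings_for(
    terms: &[[u8; 32]],
    offsets: &[u64],
    doc_idx: &[u64],
    scores: &[u16],
    keys: impl Fn(u64) -> [u8; 32], // stands in for keys.access(code)
    max_score: f32,
    term: &[u8; 32],
) -> Vec<([u8; 32], f32)> {
    // 1. Binary-search the sorted 32-byte term table.
    let Ok(t) = terms.binary_search(term) else { return Vec::new() };
    // 2. offsets[t]..offsets[t+1] brackets this term's posting run.
    let (start, end) = (offsets[t] as usize, offsets[t + 1] as usize);
    // 3. Decode each posting: universe code -> external key,
    //    u16 bucket -> f32 via the global max_score scale.
    (start..end)
        .map(|i| (keys(doc_idx[i]), scores[i] as f32 * max_score / 65535.0))
        .collect()
}
```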
### What's already compressed (as of the current impl)
- `doc_lens` → bit-packed to `ceil(log2(max_len + 1))` bits.
At 100k docs with avg_doc_len ≈ 180 and max ≈ 1024, ~10 bits
instead of 32 — 3.2× savings on that section.
- `postings.doc_idx` → bit-packed to `ceil(log2(n_docs + 1))`.
At 100k docs, 17 bits instead of 32 — 1.9× savings.
- `postings.offsets` → bit-packed likewise.
- `postings.scores` → u16-quantized via global `max_score`
  scale. Half-bucket error bound = `max_score / (2 × 65535)`;
`score_tolerance` on the index returns the full bucket
`max_score / 65534` for constraint equality. 2× savings
vs. f32 on the score section, top-10 preservation verified
at 1k scale.
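The quantization arithmetic spelled out, as a sketch assuming the
linear global scale described above (names are illustrative):
```rust
fn quantize(score: f32, max_score: f32) -> u16 {
    ((score / max_score) * 65535.0).round() as u16
}

fn dequantize(q: u16, max_score: f32) -> f32 {
    q as f32 * max_score / 65535.0
}

// Rounding moves any score by at most half a bucket:
//   |s - dequantize(quantize(s, m), m)| <= m / (2 * 65535)
// so score_tolerance's full bucket (max_score / 65534) covers the
// round-trip error with a little slack.
```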
### What's still flat (deliberately)
- `terms` — 32 bytes each (Blake3 hash). We tried fragment-
dictionary compression here (phase 2a) and it **grew** the
section: Blake3 hashes have maximum entropy, so the 4-byte
fragment dictionary overhead exceeded any code-length win.
See `tests/scale_smoke.rs` and the phase 2a revert in git
history for the actual numbers.
### What keys-side compression bought us
- `keys` is now a `CompressedUniverse` (Phase 2b). Measured via
`cargo run --release --example blob_sizes_at_scale`:
| corpus | keys section vs 32 B flat |
| :--------------- | :------------------------: |
| scattered GenIds | 0.74×–0.81× |
| 11-byte-prefix | 0.29×–0.32× |
"Scattered" is the pseudo-random `id_from_u64` with 16 trailing
random bytes (worst-ish case — only the leading 16 zero bytes
are shared). "Correlated" shares an 11-byte prefix and varies
only the last 5 — simulates "one session of entity ids minted
from a shared namespace seed."
- Whole-blob ratio moves too, but modestly: 0.48×→0.42× at 1 k
docs with correlated keys; ~0.01×–0.02× improvement at 50 k
because postings dominate the denominator.
- The architectural win is type-level: `keys.access(code)` goes
through the same universe plumbing as every other `Inline`
table in the stack; range / prefix / membership queries over
the keys universe compose for free.
### Open compression directions
- **Delta-encoded posting doc_idx** — posting lists are now
  universe-code-sorted (Phase 2b), so consecutive deltas
  compress further via Simple16 / Elias-Fano / VByte (see the
  sketch after this list). Roughly halves the `doc_idx` section
  at Heaps-law corpora. This is the next-biggest win to chase —
  postings dominate the blob.
- **Non-uniform score quantization** — current u16 quantization
uses a linear global `max_score` scale. A log-space or per-
term scale would preserve more precision in the high-df
(common-term) tail at the cost of a bigger header. Only
worth it if ranking drift bites at larger corpora.
- **Wavelet matrix on the term table** — would let rank/select
queries hit terms without a linear-compare binary search.
For identification-only lookups the current
`View<[[u8; 32]]>` slice-binary-search is competitive; the
wavelet matrix would unlock range queries over terms (useful
for n-gram prefix scans).
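For the delta direction, a minimal sketch of the VByte flavour over
a sorted run of universe codes (not shipped; the Simple16 and
Elias-Fano variants differ only in how the deltas are packed):
```rust
fn delta_vbyte(sorted_codes: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &code in sorted_codes {
        let mut delta = code - prev; // sorted => non-negative
        prev = code;
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte); // clear high bit terminates the value
                break;
            }
            out.push(byte | 0x80); // continuation bit
        }
    }
    out
}
```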
## `SuccinctHNSWIndex` — SH25 blob layout
Self-contained blob, zero-copy via `anybytes::Bytes`. Schema id:
`A96890DE5F85A4F2285C365549B21BC2` (see
`succinct::SuccinctHNSWBlob`; rotated from
`27D71A473EF22DA4D916F61810AC5D86` when the keys section was
dropped). As with SB25, the typed handle is the identity — no
in-blob magic or version.
```
[header ] 144 B (fixed)
dim u32
m u16 ; max neighbours on non-zero layers
m0 u16 ; max neighbours on layer 0
max_level u8
reserved u8
has_entry_point u8
reserved u8
entry_point u32
n_nodes u64
n_layers u64
graph_neighbours_meta 32 B ; CompactVectorMetaOnDisk
graph_offsets_meta 32 B ; CompactVectorMetaOnDisk
(section_offset, section_len) × 3 = 48 B
[handles ] n_nodes × 32 B ; Inline<Handle<Embedding>>
; — the node IS the handle;
; no separate doc-key table.
[graph_bytes ] variable ; two CompactVectors in one
; ByteArea:
; neighbours (width log2(n+1))
; offsets (width log2(E+1))
```
`graph_bytes` packs neighbour lists across all `(layer, node)`
pairs into a flat CSR: `offsets[L·(n+1) + i]` gives the start of
node *i*'s neighbour list on layer *L* inside `neighbours`. Nodes
absent from layer *L* encode as empty slices — search walks stay
correct because an empty neighbour list is a dead end, and the
search always enters from the top-level entry point.
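The indexing rule as a sketch, with a plain slice standing in for
the offsets CompactVector (accessor names are illustrative):
```rust
use std::ops::Range;

// Returns the slice of `neighbours` holding node i's list on layer L.
// An empty range means the node is absent from this layer.
fn neighbour_range(offsets: &[u64], n_nodes: u64, layer: u64, node: u64) -> Range<u64> {
    let row = (layer * (n_nodes + 1) + node) as usize; // (L, i) row in the flat CSR
    offsets[row]..offsets[row + 1]
}
```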
Query algorithm (standard Malkov-Yashunin search, threshold-gated):
1. Start at `entry_point` on `max_level`.
2. Greedy-descend layer-by-layer down to 1.
3. On layer 0, ef-width beam search; keep every candidate whose
cosine similarity clears `score_floor`.
The succinct path re-implements the greedy + ef-search against
the bit-packed graph; see
`AttachedSuccinctHNSWIndex::candidates_above` in
`src/succinct.rs`.
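A compact sketch of step 3, assuming closure-shaped graph accessors
(the upper-layer descent of steps 1–2 is the usual best-neighbour
hop); the real walk in `src/succinct.rs` reads the CompactVector
views directly and differs in detail.
```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

// f32 has no total order; a small wrapper gives the heaps one.
#[derive(PartialEq)]
struct Scored(f32, u64);
impl Eq for Scored {}
impl PartialOrd for Scored {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Scored {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// ef-width layer-0 search, gated on `score_floor` instead of
/// trimmed to a top-k. `neighbours` reads a CSR row of the
/// bit-packed graph; `sim` is cosine against the query.
fn ef_search_above(
    entry: u64,
    ef: usize,
    score_floor: f32,
    neighbours: impl Fn(u64) -> Vec<u64>,
    sim: impl Fn(u64) -> f32,
) -> Vec<(u64, f32)> {
    let mut visited = HashSet::from([entry]);
    let mut candidates = BinaryHeap::new(); // max-heap: best open node first
    let mut best = BinaryHeap::new(); // min-heap of the ef best seen
    candidates.push(Scored(sim(entry), entry));
    best.push(Reverse(Scored(sim(entry), entry)));
    let mut out = Vec::new();
    while let Some(Scored(s, node)) = candidates.pop() {
        let worst = best.peek().map(|Reverse(Scored(w, _))| *w).unwrap();
        if best.len() == ef && s < worst {
            break; // no open candidate can improve the beam
        }
        if s >= score_floor {
            out.push((node, s)); // threshold gate, no top-k trim
        }
        for n in neighbours(node) {
            if !visited.insert(n) {
                continue;
            }
            let sn = sim(n);
            let worst = best.peek().map(|Reverse(Scored(w, _))| *w).unwrap();
            if best.len() < ef || sn > worst {
                candidates.push(Scored(sn, n));
                best.push(Reverse(Scored(sn, n)));
                if best.len() > ef {
                    best.pop(); // drop the weakest of the ef best
                }
            }
        }
    }
    out
}
```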
### What's already compressed
- Graph `neighbours` → `ceil(log2(n_nodes + 1))` bits per
neighbour index (17 bits at 100k nodes vs. 32 bits raw).
- Graph `offsets` → `ceil(log2(total_edges + 1))` bits per
offset, which for `M=16` / `M0=32` averages similar savings.
### What's still flat
- `handles` — 32-byte natural size (one `Inline<Handle<Embedding>>`
  per node; see the next section).
- vectors — raw f32 `Embedding` blobs in the pile. Caller-owned
  data; compression is the caller's decision via their embedding
  schema choice (the crate itself stays agnostic).
### Handle-keyed storage (shipped for both FlatIndex and HNSW)
Both `FlatIndex` and `SuccinctHNSWIndex` store a flat table of
`Inline<Handle<Embedding>>` (32 B per handle). There is
no separate "doc key" table — the node IS the handle. Callers
who want a book-id → embedding-handle mapping keep it as a
trible attribute they own (`book_embedding` in the examples),
not as shadow data inside the index. Embeddings live in the
pile's blob store, content-addressed, dedup'd across indexes:
```
pile blob store:
Handle<Embedding> h_a → blob_a (one copy of vector A)
Handle<Embedding> h_b → blob_b (one copy of vector B)
FlatIndex: [h_a, h_b, h_c, ...] ← 32 B per entry
SuccinctHNSWIndex: [h_a, h_b, h_c, ...] + graph ← 32 B per entry + bits
```
The `Embedding` blob encoding id is
`EEC5DFDEA2FFCED70850DF83B03CB62B` (minted via `trible genid`).
At query time the walk resolves each handle through
`BlobStoreGet`, and the `BlobCache` wrapper in
`triblespace::core::blob` collapses repeat visits into a
single fetch per view lifetime.
For 100 k × 384-dim MiniLM: the HNSW blob is handles + graph =
~3.2 MiB + bit-packed CSR (a few more MiB); embedding blobs
(~147 MiB total) are dedup'd across every index that references
them.
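The distance eval itself, once a handle resolves to a contiguous
`&[f32]` view (a sketch; the real views come back through
`BlobStoreGet` as described above):
```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let (mut dot, mut na, mut nb) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y; // accumulate dot product
        na += x * x; // and both squared norms in one pass
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}
```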
### Open compression directions
- **2-ring graph encoding** — built and benchmarked in
`src/ring.rs` + `examples/ring_vs_csr*.rs`. The fixed-
predicate sub-ring from Arroyuelo et al. *The Ring* (TODS
2024 §4.4) halves the graph blob vs CSR at every scale
tested. We *didn't* adopt it as the default because it
costs ~3× end-to-end query latency on in-memory / warm-
cache workloads, and at 1B corpus scale the graph is only
~4 % of total storage (embeddings dominate). `RingGraph`
stays as an opt-in primitive for disk-backed or
branch-metadata-heavy workloads. See
[`docs/HNSW_GRAPH_ENCODING.md`](HNSW_GRAPH_ENCODING.md)
for the full measurements and when-to-use-which.
- **Vector quantization** — the biggest lever at scale.
The caller owns the embedding schema; we could ship
`EmbeddingI8` / `EmbeddingPQ` alongside `Embedding` and
let the distance function branch on the schema. At
dim=384+ the embeddings are 90 %+ of total storage, so
4–16× quantization shrinks wins far more than any graph
encoding.
## Query engine integration
Both indexes expose their query as a `triblespace::Constraint`.
Callers load the blob once (cheap — mmap-backed
`anybytes::Bytes`) and produce a constraint by binding the
variables they want:
```rust
let bm25: SuccinctBM25Index = reader.get(bm25_handle)?;
let hnsw: SuccinctHNSWIndex = reader.get(hnsw_handle)?;
let hnsw_view = hnsw.attach(&reader);
let rows: Vec<(Id,)> = find!(
(doc: Id),
and!(
pattern!(&kb, [{ ?doc @ wiki::content: _ }]),
bm25.matches(doc, &terms, 0.0),
),
)
.collect();
```
BM25 binds `doc` only — `matches(doc, &terms, score_floor)` is
a single-variable filter; ranking happens in Rust via
`idx.score(&doc, &terms)` after `.collect()`. HNSW similarity
is a binary `similar(a, b, score_floor)` relation over
`Inline<Handle<Embedding>>` variables (see
`docs/QUERY_ENGINE_INTEGRATION.md`). Ordering is operational —
callers collect the iterator and slice.
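Continuing the example, the caller-side ordering step looks like
this (a sketch; `bm25.score` is the exact-scoring call named above):
```rust
// Recompute exact f32 scores (not the quantized ones) and rank.
let mut ranked: Vec<(Id, f32)> = rows
    .into_iter()
    .map(|(doc,)| {
        let score = bm25.score(&doc, &terms);
        (doc, score)
    })
    .collect();
ranked.sort_by(|a, b| b.1.total_cmp(&a.1)); // best first
ranked.truncate(10); // caller-side top-k
```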
## What lives where
| concern                     | crate                  |
| :-------------------------- | :--------------------- |
| `Inline`, `Id`, `TribleSet` | triblespace            |
| Blob byte buffers (mmap)    | anybytes               |
| Succinct primitives         | jerky                  |
| BlobEncoding + constraints  | **triblespace-search** |
| Tokenizers (opt-in helpers) | **triblespace-search** |
| Caller-supplied embeddings  | downstream             |
`triblespace-search` does not depend on any embedding library.
Callers bring their own embeddings (local MiniLM via fastembed,
API-based Voyage/OpenAI, or anything that produces `f32` vectors
of a fixed dimensionality) and insert them into the pile under
an `Embedding<const D: usize>` schema they control.
## Non-goals (v1)
- Mutable updates. Rebuild is the only update path.
- Distributed/sharded indexes. Single-node first; sharding lives
above the index API if/when it matters.
- Language-aware tokenization. `hash_tokens` is intentionally
minimal; callers with real NLP needs tokenize themselves.
- Linear score combinations across BM25 + HNSW (hybrid search).
Caller composes the boolean combination through `and!` /
`or!` in `find!` (see `examples/hybrid_search.rs`); if they
want to rank on a weighted sum of scores, they do so in Rust
after `.collect()`.
## Worked example: 100 000 wiki fragments
Sizing exercise for the canonical downstream: indexing a Liora
pile of ≈ 100 k typst wiki fragments, average ≈ 180 words each
(≈ 300 raw tokens with punctuation). Numbers are back-of-envelope
for both the naive pre-jerky baseline and the landed SB25 layout.
### BM25 — size estimate
Assume after `hash_tokens`:
- `n_docs = 100 000`
- `avg_doc_len ≈ 180` unique tokens per doc after trim/dedup
- distinct terms across corpus `n_terms ≈ 300 000` (Heaps' law
  with β ≈ 0.5 over the ~18 M-token corpus, i.e. k ≈ 70 for
  English-ish text)
- total postings `≈ 100 000 × 180 = 18 000 000` entries
Two columns: a theoretical "naive flat-array" layout (the
pre-jerky baseline this crate started from — reported by
[`BM25Index::byte_size`], no actual serializer ships) and the
landed SB25 format (`SuccinctBM25Index::to_bytes`) with
bit-packing + score quantization.
| section          | entry size | count      | naive        | SB25         |
| :--------------- | :--------- | ---------: | -----------: | -----------: |
| header           | —          | —          | 20 B         | 264 B        |
| keys             | 32 B       | 100 000    | 3.2 MiB      | ~1.5–3.2 MiB |
| doc_lens         | 4 B        | 100 000    | 0.4 MiB      | ~0.12 MiB    |
| terms (sorted)   | 32 B       | 300 000    | 9.6 MiB      | 9.6 MiB      |
| postings_offsets | 4 B        | 300 001    | 1.2 MiB      | ~0.9 MiB     |
| postings.doc_idx | 4 B        | 18 000 000 | 72 MiB       | ~38 MiB      |
| postings.score   | 4 B        | 18 000 000 | 72 MiB       | 36 MiB       |
| **Total**        |            |            | **~159 MiB** | **~86 MiB**  |
Every row computed the same way: the bit-packed sections use
`ceil(log2(n + 1))` bits per entry (doc_idx → 17 bits ≈ 2.12 B;
doc_lens at max ≈ 1024 → 10 bits ≈ 1.25 B; offsets at 18M max →
25 bits ≈ 3.1 B), and u16-quantized scores drop from 4 B to 2 B.
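The same row arithmetic as one helper (a sketch; `packed_bytes` is
illustrative, not a crate API):
```rust
// Section size for `entries` values packed at ceil(log2(max + 1))
// bits each (the bit length of `max`), rounded up to whole bytes.
fn packed_bytes(entries: u64, max_value: u64) -> u64 {
    let width = 64 - max_value.leading_zeros() as u64; // 17 for 100 000
    (entries * width + 7) / 8
}

// packed_bytes(18_000_000, 100_000)    -> ~38 MB  (doc_idx row)
// packed_bytes(100_000, 1_023)         -> ~0.12 MB (doc_lens row)
// packed_bytes(300_001, 18_000_000)    -> ~0.94 MB (offsets row)
```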
The `keys` range covers the fragment-dictionary compression
spread: near-worst-case (random 32-byte values, no shared 4-byte
fragments) ≈ raw 3.2 MiB plus a small DACs overhead; typical
GenId-keyed corpora with 16 bytes of zero padding and structured
trible bytes compress toward ~1.5 MiB. Neither end moves the
blob total noticeably — keys are ~2 % of the 86 MiB.
The **postings dominate** at 85 %+ of either blob. SB25's bit-
packed `doc_idx` plus u16 scores halves that section — the rest
of the footprint (keys, terms, doc_lens) is already as small as
the data allows without additional structure.
Term table is the second-largest chunk (9.6 MiB of 32-byte
Blake3 hashes). Phase 2a tried fragment-dictionary compression
here and it made the section **bigger** — maximum-entropy hashes
have no shared 4-byte fragments for the dictionary to exploit.
Left uncompressed.
### BM25 — build time
Build is O(total postings) with hashmap bookkeeping: `18 M`
insertions into the `HashMap<RawInline, HashMap<u32, u32>>` tf
table, then a sort over 300 k term hashes (32-byte compare).
On current laptop hardware:
- Hash-tokenize 100 k fragments × 180 tokens ≈ 18 M Blake3 hashes.
Blake3 is ~3 GB/s on short inputs → ~0.5 s.
- Hashmap inserts: ~100 ns each × 18 M ≈ 1.8 s.
- Term sort: 300 k × log₂(300 k) × 32-byte compare ≈ 50 ms.
- Score computation: 18 M FMA-ish float ops ≈ 50 ms.
So **~3 s single-threaded** for the full corpus.
`BM25Builder::build_with_threads(n)` shards docs across `n`
scoped threads (std::thread::scope, no rayon dep) and merges
per-shard tf maps at the end. Observed speedups at 4 threads
on a laptop: ~1.2× at 10 k docs, ~1.3× at 50 k — the merge
cost stays serial and caps the win. Byte-identical output vs.
single-threaded. A term-partitioned variant would push further
but needs a routing hash per insert; filed as future work when
build-time actually bites.
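A shape sketch of the scoped shard-and-merge (names are
illustrative; the shipped `BM25Builder::build_with_threads` differs
in detail):
```rust
use std::collections::HashMap;
use std::thread;

type RawInline = [u8; 32];
type Tf = HashMap<RawInline, HashMap<u32, u32>>; // term -> doc -> tf

fn sharded_tf(docs: &[(u32, Vec<RawInline>)], n: usize) -> Tf {
    let chunk = docs.len().div_ceil(n).max(1);
    // Each shard accumulates its own tf map on a scoped thread.
    let shards: Vec<Tf> = thread::scope(|s| {
        docs.chunks(chunk)
            .map(|shard| {
                s.spawn(move || {
                    let mut tf = Tf::new();
                    for (doc, terms) in shard {
                        for t in terms {
                            *tf.entry(*t).or_default().entry(*doc).or_default() += 1;
                        }
                    }
                    tf
                })
            })
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect()
    });
    // Serial merge — the cap on parallel speedup noted above.
    let mut merged = Tf::new();
    for shard in shards {
        for (term, postings) in shard {
            let entry = merged.entry(term).or_default();
            for (doc, count) in postings {
                *entry.entry(doc).or_insert(0) += count;
            }
        }
    }
    merged
}
```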
### BM25 — query latency
`cargo run --release --example query_latency` on current laptop
hardware (10 k / 50 k docs with Zipf-ish vocab):
| corpus           | path  | p50     | p90     | p99     |
| :--------------- | :---- | ------: | ------: | ------: |
| 10 k × 64 tokens | naive | 125 ns  | 162 ns  | 459 ns  |
| 10 k × 64 tokens | SB25  | 875 ns  | 1.06 µs | 4.04 µs |
| 50 k × 96 tokens | naive | 292 ns  | 417 ns  | 2.29 µs |
| 50 k × 96 tokens | SB25  | 2.00 µs | 2.82 µs | 6.21 µs |
Single-term queries stay comfortably under 3 µs on the succinct
path even at 50 k docs — the original design estimate was a
generous upper bound. The naive path is ~6-7× faster than SB25
(flat `Vec` read + pre-baked `f32` vs. bit-unpacking a
`CompactVector` + dequantizing a u16 score); SB25 trades that
latency for ~2× smaller blobs on disk.
3-term `query_multi` (OR of independent posting lists):
~207 µs p50 at 50 k docs, dominated by the aggregation
`HashMap<Id, f32>`. For latency-critical multi-term queries an
`and!` via the engine is cheaper — merge-join size is
`min(|postings|)`, not `sum(|postings|)`.
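The AND shape that note recommends, sketched as one single-term
`matches` per hash (treat the per-term slicing as illustrative;
`floor` is a caller-chosen threshold):
```rust
// Engine-side merge-join across independent posting lists.
let rows: Vec<(Id,)> = find!(
    (doc: Id),
    and!(
        bm25.matches(doc, &terms[0..1], floor),
        bm25.matches(doc, &terms[1..2], floor),
        bm25.matches(doc, &terms[2..3], floor),
    ),
)
.collect();
```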
### HNSW — size estimate
At `n = 100 000`, `dim = 384` (MiniLM), `M = 16`, `M0 = 32`.
Embeddings are not in the HNSW blob — they live in the pile's
blob store, content-addressed, and shared across every index
that references them. The HNSW blob only carries the handles
table and the graph:
- `handles`: 100 k × 32 B = **3.2 MiB** (one
`Inline<Handle<Embedding>>` per node, the sole
per-node table)
- graph `neighbours`: ~1 M directed edges (average `M`
neighbours per node plus layer-0 fill-in with `M0 = 32`),
each packed at `ceil(log2(n + 1)) = 17` bits ≈ 2.1 MiB
- graph `offsets`: `layers × (n + 1)` entries at
  `ceil(log2(edges + 1)) = 20` bits; layers stay ~4–5 by
  design (`log_M(100k) ≈ 4`), so roughly 500 k entries ≈
  1.25 MiB
- SH25 header: 144 B (negligible)
- **Total HNSW blob ~6.5 MiB.**
Separately, in the pile's blob store:
- `Embedding` blobs: 100 k × 384 × 4 B = **147 MiB** (caller-
owned, dedup'd — two indexes over the same vectors share
the bytes)
The handle-indirected design moves embedding compression out
of this crate's surface: `Embedding` is agnostic to the on-
disk encoding, so callers who care about footprint swap in
`EmbeddingI8` / `EmbeddingPQ` schemas at their level. The
crate's own succinct pass targets the graph — which at this
scale is already ~4 % of the total corpus footprint, so
there's no transformative graph win to chase.
### HNSW — query latency
`query_latency` example on 5 k / 10 k × 32-dim corpora, probes
sampled from the indexed handles, threshold `cos ≥ 0.5`,
`ef_search = 50`:
| corpus    | path  | p50    | p90    | p99    |
| :-------- | :---- | -----: | -----: | -----: |
| 5 k × 32  | naive | 190 µs | 191 µs | 215 µs |
| 5 k × 32  | SH25  | 190 µs | 191 µs | 222 µs |
| 10 k × 32 | naive | 226 µs | 228 µs | 273 µs |
| 10 k × 32 | SH25  | 220 µs | 221 µs | 248 µs |
SH25 tracks naive within noise — both paths fetch every
visited embedding through the same `BlobCache<MemoryBlobStore,
Embedding>` and compute cosine against a
contiguous `&[f32]` view, so the graph-access difference
(pointer hop vs. bit-unpack) is swamped by the O(ef_search ×
dim) distance-eval work. Threshold walks visit more nodes
than the old top-k shape did (no early exit once `k` results
are in hand), which accounts for the absolute-number jump
versus earlier measurements.
### Takeaways
- Naive BM25 blob is ~1.6 KiB per doc — already shippable as a
  scaffold; SB25 halves that.
- Postings are the biggest lever; bit-packing + u16 scores
already claimed it. The next step (delta-encoded `doc_idx`,
wavelet-matrix term table) is incremental, not transformative.
- For HNSW, the interesting compression sits in *caller-owned*
embedding bytes — this crate's pass is about graph compactness
and graph-walk speed, not bulk size.
- At these scales a single-node mmap-backed blob is fine; the
"distributed indexes" non-goal holds even at 1 M docs.