Cross-encoder reranking for search results.
A cross-encoder takes a (query, document) pair and produces a relevance score. This is more accurate than cosine similarity of independent embeddings but slower since it must run for each candidate.
Two implementations:
- CrossEncoder::Lexical: lightweight term-overlap scorer (default).
- CrossEncoder::Neural: BERT-based cross-encoder loaded via candle from cross-encoder/ms-marco-MiniLM-L-6-v2 (~80 MB, ONNX-free).
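To make the term-overlap idea concrete, here is a minimal sketch of a lexical (query, document) scorer. It is not the crate's CrossEncoder::Lexical implementation, whose scoring rule is not documented here; the names tokenize, term_overlap_score and rerank are hypothetical, and the score is assumed to be the fraction of distinct query terms found in the document.

```rust
use std::collections::HashSet;

/// Lowercase the text and split it on non-alphanumeric characters.
fn tokenize(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Assumed scoring rule: fraction of distinct query terms that also
/// occur in the document, in the range [0.0, 1.0].
fn term_overlap_score(query: &str, document: &str) -> f32 {
    let query_terms = tokenize(query);
    if query_terms.is_empty() {
        return 0.0;
    }
    let doc_terms = tokenize(document);
    let shared = query_terms
        .iter()
        .filter(|t| doc_terms.contains(*t))
        .count();
    shared as f32 / query_terms.len() as f32
}

/// Score every candidate against the query (one scorer call per
/// candidate) and return the candidates sorted best-first.
fn rerank<'a>(query: &str, candidates: &[&'a str]) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = candidates
        .iter()
        .map(|doc| (*doc, term_overlap_score(query, *doc)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

The per-candidate loop in rerank has the same shape as the neural path: the scorer sees the query and the document together, which is why cost grows with the number of candidates.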
Structs§
- BatchedReranker - Concurrent rerank coalescer.
- ReflectionBoostConfig - v0.7.0 L2-8: configuration for the reflection-aware reranker boost.
- SessionRecallTracker - v0.7.0 (issue #518): process-global tracker mapping session_id to its FIFO ring buffer of recently-accessed memory ids.
Enums§
- CrossEncoder - Cross-encoder for (query, document) relevance scoring.
- RerankerScoreFloor - v0.7.0 #1319: post-blend score floor applied by BatchedReranker.
Constants§
- BATCHED_RERANK_MIN_CONCURRENCY - #1579 B10: minimum number of in-flight rerank requests (including the current one) before BatchedReranker::rerank routes through the coalescing worker on a neural encoder. Below this threshold there is nothing to coalesce with: the lone caller pays the worker channel round-trip plus up to DEFAULT_MAX_WAIT_MS of flush-window wait for zero amortisation gain.
- CROSS_ENCODER_MAX_SEQ - Model-architecture ceiling on the cross-encoder input sequence. Per-consumer truncation (e.g. the #1604 rerank cap below) may go tighter, never looser: the resolver clamps against this value.
- DEFAULT_MAX_BATCH - Default upper bound on how many requests we coalesce per BERT call.
- DEFAULT_MAX_WAIT_MS - Default flush latency (ms): how long the worker waits for more requests before processing a non-full batch. 5ms keeps single-request latency negligible while still benefiting parallel callers.
- DEFAULT_REFLECTION_BOOST - v0.7.0 L2-8: default multiplicative boost applied to Reflection-kind memories AFTER cross-encoder reranking. Reflections summarise multiple observations, so abstraction-shaped queries (“what patterns…”, “what are recurring themes…”) should preferentially surface them. Default value 1.2 sits in the band where a reflection with a base score equal to its source observations consistently lifts into the top-5 without dragging mediocre reflections above well-matched observations.
- DEFAULT_REFLECTION_MAX_DEPTH_CAP - v0.7.0 L2-8: default depth cap mirrored from GovernancePolicy::effective_max_reflection_depth. Past this depth the per-depth multiplier stops growing; reflections deeper than the cap still receive the cap-evaluated boost (operator policy may refuse the write entirely, but the reranker side never produces an unbounded multiplier).
- DEFAULT_REFLECTION_PER_DEPTH_INCREMENT - v0.7.0 L2-8: default per-depth additional multiplier increment. per_depth_factor = 1.0 + per_depth_increment * reflection_depth. Deeper reflections (reflections-on-reflections) compress more observations, so a small per-depth bump is justified.
- RERANK_MAX_SEQ_DEFAULT - #1604: compiled default for the tokenized length of rerank inputs, applied in CrossEncoder::neural_score_pairs (the #1597 batched-forward path) instead of the architecture-ceiling CROSS_ENCODER_MAX_SEQ.
- RERANK_POOL_MAX - #1597: hard cap on how many candidates receive a cross-encoder score per rerank call.
- SESSION_RECENCY_BOOST - Additive boost applied to a recall candidate that appears in the session’s recently-accessed set. Sits at +0.05: small enough that a low-relevance candidate cannot leapfrog a substantially-better match, large enough to break ties in favour of memories the agent just touched in the same session.
- SESSION_RECENT_CAP - Per-session cap on the recently-accessed ring buffer. When the buffer is at the cap, the oldest entry is evicted (FIFO) before the newest entry is appended. Keeps the substrate memory cost bounded at O(SESSIONS * 50) ids regardless of recall traffic.
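The reflection-boost constants above combine into a bounded multiplier. The following is a rough sketch of that arithmetic, not the crate's ReflectionBoostConfig: the struct BoostSketch, its field names, and the way the base boost and the per-depth factor are combined (multiplied here) are assumptions. Only the 1.2 default and the per_depth_factor formula come from the descriptions above; the increment and cap values are left to the caller because their defaults are not stated.

```rust
/// Default base boost, as stated in the DEFAULT_REFLECTION_BOOST description.
const DEFAULT_REFLECTION_BOOST: f32 = 1.2;

/// Hypothetical stand-in for the boost configuration.
struct BoostSketch {
    boost: f32,
    per_depth_increment: f32,
    max_depth_cap: u32,
}

impl BoostSketch {
    /// per_depth_factor = 1.0 + per_depth_increment * reflection_depth,
    /// with the depth clamped so the factor stops growing past the cap.
    fn per_depth_factor(&self, reflection_depth: u32) -> f32 {
        let depth = reflection_depth.min(self.max_depth_cap);
        1.0 + self.per_depth_increment * depth as f32
    }

    /// Assumption: the base boost and the per-depth factor multiply.
    fn multiplier(&self, reflection_depth: u32) -> f32 {
        self.boost * self.per_depth_factor(reflection_depth)
    }

    /// Boost reflection-kind results only; other kinds keep their
    /// post-rerank score unchanged.
    fn apply(&self, score: f32, is_reflection: bool, reflection_depth: u32) -> f32 {
        if is_reflection {
            score * self.multiplier(reflection_depth)
        } else {
            score
        }
    }
}
```

Because the depth is clamped before the factor is computed, a reflection deeper than the cap receives exactly the multiplier of one at the cap, so the boost stays bounded however deep the reflection chain goes.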
Functions§
- apply_session_recency_boost - v0.7.0 (issue #518): apply the per-session recently-accessed boost to a scored recall result vector AND record the post-boost hit set back into the session’s ring buffer.
- global_session_recall_tracker - Process-global SessionRecallTracker used by every recall hot path. Lazily initialised on first access; never reset within a process lifetime (per-process state by design: operator restart clears every session’s recent set).
- set_rerank_max_seq - Seed the process-wide rerank sequence cap for every subsequent batched rerank forward. Idempotent: first writer wins; later calls are no-ops (matches crate::storage::set_db_mmap_size).
- use_batched_rerank_path - #1579 B10: the auto-select predicate, extracted as a free function so the threshold arithmetic is unit-testable without standing up a worker thread or downloading model weights. true ⇒ route through the coalescing worker; false ⇒ direct encoder call.
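The session recency mechanism described for apply_session_recency_boost and SessionRecallTracker can be sketched as a per-session FIFO ring plus an additive bump. This is an illustration under assumptions, not the crate's code: TrackerSketch and its methods are hypothetical, memory ids are assumed to be u64, the cap of 50 is inferred from the O(SESSIONS * 50) bound, re-sorting after the boost is assumed, and the sketch is a plain struct rather than a process-global.

```rust
use std::collections::{HashMap, VecDeque};

const SESSION_RECENCY_BOOST: f32 = 0.05; // stated in the constant's description
const SESSION_RECENT_CAP: usize = 50; // inferred from the O(SESSIONS * 50) bound

/// Hypothetical tracker: session_id -> FIFO ring of recent memory ids.
#[derive(Default)]
struct TrackerSketch {
    recent: HashMap<String, VecDeque<u64>>,
}

impl TrackerSketch {
    /// Append an id, evicting the oldest entry first when the ring is full.
    fn record(&mut self, session_id: &str, memory_id: u64) {
        let ring = self.recent.entry(session_id.to_string()).or_default();
        if ring.len() == SESSION_RECENT_CAP {
            ring.pop_front();
        }
        ring.push_back(memory_id);
    }

    /// Add the boost to results already in the session's ring, re-sort
    /// best-first (an assumption), then record the post-boost hits.
    fn boost_and_record(&mut self, session_id: &str, results: &mut Vec<(u64, f32)>) {
        if let Some(ring) = self.recent.get(session_id) {
            for (id, score) in results.iter_mut() {
                if ring.contains(&*id) {
                    *score += SESSION_RECENCY_BOOST;
                }
            }
        }
        results.sort_by(|a, b| b.1.total_cmp(&a.1));
        let ids: Vec<u64> = results.iter().map(|(id, _)| *id).collect();
        for id in ids {
            self.record(session_id, id);
        }
    }
}
```

With these values, a recently touched candidate scored 0.50 overtakes an untouched one at 0.52, while a candidate at 0.10 would stay far behind even with the boost, which matches the tie-breaking intent described for SESSION_RECENCY_BOOST.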