Module reranker


Cross-encoder reranking for search results.

A cross-encoder takes a (query, document) pair and produces a relevance score. This is more accurate than cosine similarity between independently computed embeddings, but slower, because the model must run once for every candidate rather than reusing precomputed document vectors.

Two implementations:

CrossEncoder::Lexical — lightweight term-overlap scorer (default).
CrossEncoder::Neural — BERT-based cross-encoder loaded via candle from cross-encoder/ms-marco-MiniLM-L-6-v2 (~80 MB, ONNX-free).
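The page does not spell out the Lexical scoring formula. As a minimal sketch, assuming the simplest form of term overlap (the fraction of distinct query terms that also appear in the document), a lexical scorer and the rerank loop around it could look like this. lexical_overlap_score and rerank are hypothetical names, not this crate's API.

```rust
use std::collections::HashSet;

/// Hypothetical term-overlap scorer: the fraction of distinct query terms
/// that also occur in the document. This is an assumption; the real
/// CrossEncoder::Lexical formula is not documented on this page.
fn lexical_overlap_score(query: &str, document: &str) -> f32 {
    // Split on non-alphanumeric characters and lowercase each term.
    fn terms(text: &str) -> HashSet<String> {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(|t| t.to_lowercase())
            .collect()
    }
    let q = terms(query);
    if q.is_empty() {
        return 0.0; // an empty query matches nothing
    }
    let d = terms(document);
    let shared = q.intersection(&d).count();
    shared as f32 / q.len() as f32
}

/// Score every candidate against the query and sort best-first.
fn rerank<'a>(query: &str, candidates: &[&'a str]) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = candidates
        .iter()
        .map(|doc| (*doc, lexical_overlap_score(query, doc)))
        .collect();
    // Descending by score; scores are finite, so partial_cmp never returns None.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored
}
```

A scorer of this shape needs no model weights and costs one pass over each candidate's text, which makes it a cheap default next to the neural path.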

Structs

BatchedReranker: Concurrent rerank coalescer.
ReflectionBoostConfig: v0.7.0 L2-8 — configuration for the reflection-aware reranker boost.
SessionRecallTracker: v0.7.0 (issue #518) — process-global tracker mapping session_id to its FIFO ring buffer of recently-accessed memory ids.
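SessionRecallTracker, together with SESSION_RECENT_CAP, SESSION_RECENCY_BOOST and apply_session_recency_boost documented below, can be sketched as a map from session id to a bounded VecDeque. This is an illustrative model only: the field layout, the u64 memory ids, and the helper names record, contains and apply_boost are assumptions, while the cap of 50 and the +0.05 boost come from the descriptions on this page. The sketch appends repeat hits rather than de-duplicating them, a detail the page does not specify.

```rust
use std::collections::{HashMap, VecDeque};

// Values taken from the constant descriptions on this page.
const SESSION_RECENT_CAP: usize = 50;
const SESSION_RECENCY_BOOST: f32 = 0.05;

/// Hypothetical tracker: session id -> FIFO ring buffer of memory ids.
#[derive(Default)]
struct SessionRecallTracker {
    recent: HashMap<String, VecDeque<u64>>,
}

impl SessionRecallTracker {
    /// Append `id`, evicting the oldest entry first when the buffer is at the cap.
    fn record(&mut self, session: &str, id: u64) {
        let buf = self.recent.entry(session.to_string()).or_default();
        if buf.len() == SESSION_RECENT_CAP {
            buf.pop_front();
        }
        buf.push_back(id);
    }

    /// True if `id` is in the session's recently-accessed set.
    fn contains(&self, session: &str, id: u64) -> bool {
        self.recent.get(session).map_or(false, |b| b.contains(&id))
    }
}

/// Hypothetical stand-in for apply_session_recency_boost: add the boost to
/// recently touched candidates, re-sort best-first, then record the
/// post-boost hits back into the session's ring buffer.
fn apply_boost(tracker: &mut SessionRecallTracker, session: &str, results: &mut [(u64, f32)]) {
    for (id, score) in results.iter_mut() {
        if tracker.contains(session, *id) {
            *score += SESSION_RECENCY_BOOST;
        }
    }
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    for (id, _) in results.iter() {
        tracker.record(session, *id);
    }
}
```

Because the boost is additive and small, it reorders near ties (0.78 plus 0.05 overtakes 0.80) but cannot lift a candidate past one that leads by more than 0.05.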

Enums

CrossEncoder: Cross-encoder for (query, document) relevance scoring.
RerankerScoreFloor: v0.7.0 #1319 — post-blend score floor applied by BatchedReranker.

Constants

BATCHED_RERANK_MIN_CONCURRENCY: #1579 B10 — minimum number of in-flight rerank requests (including the current one) before BatchedReranker::rerank routes through the coalescing worker on a neural encoder. Below this threshold there is nothing to coalesce WITH: the lone caller pays the worker channel round-trip plus up to DEFAULT_MAX_WAIT_MS of flush-window wait for zero amortisation gain.
CROSS_ENCODER_MAX_SEQ: Model-architecture ceiling on the cross-encoder input sequence. Per-consumer truncation (e.g. the #1604 rerank cap below) may go tighter, never looser — the resolver clamps against this value.
DEFAULT_MAX_BATCH: Default upper bound on how many requests we coalesce per BERT call.
DEFAULT_MAX_WAIT_MS: Default flush latency (ms) — how long the worker waits for more requests before processing a non-full batch. 5ms keeps single-request latency negligible while still benefiting parallel callers.
DEFAULT_REFLECTION_BOOST: v0.7.0 L2-8 — default multiplicative boost applied to Reflection-kind memories AFTER cross-encoder reranking. Reflections summarise multiple observations, so abstraction-shaped queries (“what patterns…”, “what are recurring themes…”) should preferentially surface them. Default value 1.2 sits in the band where a reflection with a base score equal to its source observations consistently lifts into the top-5 without dragging mediocre reflections above well-matched observations.
DEFAULT_REFLECTION_MAX_DEPTH_CAP: v0.7.0 L2-8 — default depth cap mirrored from [GovernancePolicy::effective_max_reflection_depth]. Past this depth the per-depth multiplier stops growing; reflections deeper than the cap still receive the cap-evaluated boost (operator policy may refuse the write entirely, but the reranker side never produces an unbounded multiplier).
DEFAULT_REFLECTION_PER_DEPTH_INCREMENT: v0.7.0 L2-8 — default per-depth additional multiplier increment. per_depth_factor = 1.0 + per_depth_increment * reflection_depth. Deeper reflections (reflections-on-reflections) compress more observations, so a small per-depth bump is justified.
RERANK_MAX_SEQ_DEFAULT: #1604 — compiled default for the tokenized length of rerank inputs, applied in CrossEncoder::neural_score_pairs (the #1597 batched-forward path) instead of the architecture-ceiling CROSS_ENCODER_MAX_SEQ.
RERANK_POOL_MAX: #1597 — hard cap on how many candidates receive a cross-encoder score per rerank call.
SESSION_RECENCY_BOOST: Additive boost applied to a recall candidate that appears in the session’s recently-accessed set. Sits at +0.05 — small enough that a low-relevance candidate cannot leapfrog a substantially-better match, large enough to break ties in favour of memories the agent just touched in the same session.
SESSION_RECENT_CAP: Per-session cap on the recently-accessed ring buffer. When the buffer is at the cap, the oldest entry is evicted (FIFO) before the newest entry is appended. Keeps the substrate memory cost bounded at O(SESSIONS * 50) ids regardless of recall traffic.
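The constants above give the per-depth formula but not how it combines with DEFAULT_REFLECTION_BOOST. Assuming the two factors are multiplied and the depth is clamped to the cap before the per-depth factor is computed, the reflection multiplier could be sketched as follows. reflection_multiplier and boosted_score are hypothetical names, and the increment and cap are passed in as parameters because their default values are not listed on this page.

```rust
// Default taken from DEFAULT_REFLECTION_BOOST on this page.
const DEFAULT_REFLECTION_BOOST: f32 = 1.2;

/// Hypothetical combination of the documented pieces: base boost times
/// per_depth_factor = 1.0 + per_depth_increment * depth, with depth clamped
/// to the cap so the multiplier never grows without bound.
fn reflection_multiplier(depth: u32, per_depth_increment: f32, max_depth_cap: u32) -> f32 {
    // Past the cap the per-depth factor stops growing.
    let effective_depth = depth.min(max_depth_cap);
    let per_depth_factor = 1.0 + per_depth_increment * effective_depth as f32;
    DEFAULT_REFLECTION_BOOST * per_depth_factor
}

/// Apply the multiplier after cross-encoder reranking, only to reflections.
/// `reflection_depth` is None for non-reflection memories (an assumption
/// about how the kind is represented).
fn boosted_score(
    score: f32,
    reflection_depth: Option<u32>,
    per_depth_increment: f32,
    max_depth_cap: u32,
) -> f32 {
    match reflection_depth {
        Some(depth) => score * reflection_multiplier(depth, per_depth_increment, max_depth_cap),
        None => score, // ordinary observations are left unchanged
    }
}
```

With an illustrative increment of 0.1 and cap of 3, a depth-2 reflection scoring 0.5 becomes 0.5 × 1.2 × 1.2 = 0.72, and any depth above 3 receives the same multiplier as depth 3.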

Functions

apply_session_recency_boost: v0.7.0 (issue #518) — apply the per-session recently-accessed boost to a scored recall result vector AND record the post-boost hit set back into the session’s ring buffer.
global_session_recall_tracker: Process-global SessionRecallTracker used by every recall hot path. Lazily initialised on first access; never reset within a process lifetime (per-process state by design — operator restart clears every session’s recent set).
set_rerank_max_seq: Seed the process-wide rerank sequence cap for every subsequent batched rerank forward. Idempotent — first writer wins; later calls are no-ops (matches crate::storage::set_db_mmap_size).
use_batched_rerank_path: #1579 B10 — the auto-select predicate, extracted as a free function so the threshold arithmetic is unit-testable without standing up a worker thread or downloading model weights. true ⇒ route through the coalescing worker; false ⇒ direct encoder call.
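use_batched_rerank_path and set_rerank_max_seq both reduce to small rules that can be exercised without a model. The sketch below assumes the predicate compares the in-flight count (the caller included) against BATCHED_RERANK_MIN_CONCURRENCY and never batches a lexical encoder, and it models the first-writer-wins cap with std::sync::OnceLock plus a clamp against the architecture ceiling. The values 512 and 256 are placeholders, not the crate's actual CROSS_ENCODER_MAX_SEQ or RERANK_MAX_SEQ_DEFAULT, and every function name here is hypothetical.

```rust
use std::sync::OnceLock;

/// Placeholder for CROSS_ENCODER_MAX_SEQ (its real value is not stated here).
const MODEL_MAX_SEQ: usize = 512;
/// Placeholder for RERANK_MAX_SEQ_DEFAULT (its real value is not stated here).
const DEFAULT_RERANK_MAX_SEQ: usize = 256;

/// Auto-select predicate: route through the coalescing worker only for a
/// neural encoder, and only when enough requests are in flight (the caller
/// included) for a shared forward pass to amortise the worker overhead.
fn use_batched_path(is_neural: bool, in_flight_including_self: usize, min_concurrency: usize) -> bool {
    is_neural && in_flight_including_self >= min_concurrency
}

static RERANK_MAX_SEQ: OnceLock<usize> = OnceLock::new();

/// Set-once seeding: the first call stores its value; later calls are no-ops.
fn seed_rerank_max_seq(requested: usize) {
    let _ = RERANK_MAX_SEQ.set(requested); // Err(_) means already seeded
}

/// Effective cap: the seeded value or the default, never above the model ceiling.
fn effective_rerank_max_seq() -> usize {
    RERANK_MAX_SEQ
        .get()
        .copied()
        .unwrap_or(DEFAULT_RERANK_MAX_SEQ)
        .min(MODEL_MAX_SEQ)
}
```

Keeping the predicate a pure function of its inputs is what lets the threshold be tested without a worker thread, which matches the stated reason for extracting use_batched_rerank_path.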
