pub enum CrossEncoder {
    Lexical {
        degraded: bool,
    },
    Neural {
        model: Arc<BertModel>,
        tokenizer: Arc<Tokenizer>,
        classifier_weight: Tensor,
        classifier_bias: Tensor,
        device: Device,
    },
}
Cross-encoder for (query, document) relevance scoring.
Variants§
Lexical
Lightweight lexical cross-encoder using term overlap signals.
degraded is true when this variant exists because a
configured neural cross-encoder failed to initialise (HF Hub
unreachable, model checksum mismatch, etc.) and the runtime
fell back. false is the originally-configured lexical tier
(operator opted in to keyword-tier or smart-tier without
cross-encoder reranking).
v0.7.0 R3-S2: the distinction surfaces in the recall
response’s meta.reranker_used field as
"degraded_lexical" vs "lexical", giving clients (MCP + HTTP)
an in-band signal when their reranker has been downgraded.
The original G8 fix emitted only a tracing::warn!; closing G8
per the playbook required an in-response field, so the prior
implementation’s closure claim was overstated.
Neural
Neural BERT-based cross-encoder (ms-marco-MiniLM-L-6-v2).
v0.7.0 #1084 — model is Arc<BertModel> (no mutex), same
pattern as Embedder::Local. The pre-#1084 design held an
Arc<Mutex<BertModel>> and locked across the full neural
rerank forward pass, serialising every rerank-tier recall on
a single global mutex. Candle’s BertModel::forward takes
&self (inference-only; weights are read-only) so the
mutex was unnecessary.
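The lock-free sharing pattern described above can be sketched as follows. This is a minimal illustration, not the crate's code: `ToyModel` and `parallel_forward` are hypothetical stand-ins, and the only property borrowed from the text is that `forward` takes `&self`, so an `Arc` alone is enough to share the model across threads.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for a read-only inference model (hypothetical; not Candle's BertModel).
struct ToyModel {
    weights: Vec<f32>,
}

impl ToyModel {
    /// Inference borrows the model immutably, so callers need no lock.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

/// Runs one forward pass per input on its own thread, sharing the model via `Arc`.
fn parallel_forward(model: Arc<ToyModel>, inputs: Vec<Vec<f32>>) -> Vec<f32> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let model = Arc::clone(&model);
            thread::spawn(move || model.forward(&input))
        })
        .collect();
    // Join in spawn order so results line up with the inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

With an `Arc<Mutex<_>>` every worker would queue on the lock for the duration of its forward pass; here the passes can overlap because nothing is mutated.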
Implementations§
impl CrossEncoder
pub fn new() -> Self
Create a new lexical cross-encoder (no model download required).
This is the “originally lexical” path — the operator either
chose keyword-/semantic-tier (no cross-encoder reranking) or
explicitly opted into the lexical variant. Use
Self::new_neural to attempt the neural path with
fall-back-to-lexical semantics.
pub fn new_neural() -> Self
Create a neural cross-encoder by downloading ms-marco-MiniLM-L-6-v2.
Falls back to lexical if download or loading fails. The
fallback is marked degraded: true so the recall response
surfaces reranker_used = "degraded_lexical" per R3-S2. v0.7.0
promises this in-band signal; before R3 the fallback was only
logged via tracing::warn!, which is not a per-response field
that operators can branch on.
v0.6.3.1 (P3, G8): when the neural path fails (e.g. HF Hub
unreachable, model checksum mismatch), emit a structured tracing
event reranker.fallback so operators see the silent
neural→lexical degrade. The eprintln remains for backward-compat
startup logs.
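The fall-back-to-lexical behaviour can be sketched like this. It is an illustration under stated assumptions: `Encoder`, `try_load_neural`, and `new_neural_or_fallback` are hypothetical names, the neural payload is reduced to a model name, and `eprintln!` stands in for the structured tracing event.

```rust
/// Simplified stand-in for the real enum: the neural payload is reduced to a name.
#[derive(Debug, PartialEq)]
enum Encoder {
    Lexical { degraded: bool },
    Neural { model_name: String },
}

/// Hypothetical loader; the real one would download and load model weights.
fn try_load_neural(hub_reachable: bool) -> Result<String, String> {
    if hub_reachable {
        Ok("ms-marco-MiniLM-L-6-v2".to_string())
    } else {
        Err("HF Hub unreachable".to_string())
    }
}

/// Attempts the neural path and falls back to a lexical encoder marked as degraded.
fn new_neural_or_fallback(hub_reachable: bool) -> Encoder {
    match try_load_neural(hub_reachable) {
        Ok(model_name) => Encoder::Neural { model_name },
        Err(reason) => {
            // Stand-in for the structured `reranker.fallback` tracing event.
            eprintln!("reranker.fallback: {reason}");
            Encoder::Lexical { degraded: true }
        }
    }
}
```

The constructor never fails from the caller's point of view; the failure is recorded in the `degraded` flag rather than returned as an error.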
pub fn score(&self, query: &str, title: &str, content: &str) -> f32
Score a single (query, document) pair.
Returns a relevance score in 0.0..=1.0.
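For intuition about a bounded lexical score, here is a minimal term-overlap sketch. The actual signals used by the Lexical variant are not specified on this page, so `terms` and `overlap_score` are hypothetical and only show one way to keep a result inside 0.0..=1.0.

```rust
use std::collections::HashSet;

/// Lowercases text and splits it into a set of alphanumeric terms.
fn terms(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Fraction of distinct query terms found in the title or content, in 0.0..=1.0.
fn overlap_score(query: &str, title: &str, content: &str) -> f32 {
    let query_terms = terms(query);
    if query_terms.is_empty() {
        return 0.0;
    }
    let mut doc_terms = terms(title);
    doc_terms.extend(terms(content));
    let hits = query_terms.intersection(&doc_terms).count();
    // hits <= query_terms.len(), so the ratio cannot exceed 1.0.
    hits as f32 / query_terms.len() as f32
}
```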
pub fn is_degraded_lexical(&self) -> bool
v0.7.0 R3-S2: whether this cross-encoder is a degraded
lexical fallback (i.e., a neural variant was attempted at
startup or mid-flight and the runtime fell back). Returns false
for Neural and for the originally-configured Lexical (operator
opted into keyword-/semantic-tier without cross-encoder
reranking). The recall response surfaces this distinction as
meta.reranker_used = "degraded_lexical" so clients can
detect the silent downgrade in-band. This closes the gap in
the earlier G8 closure claim, which rested on
tracing-event-only signalling.
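A sketch of how the flag might map onto the response label, using a simplified stand-in enum. `Reranker` and `reranker_used` are hypothetical names, and the `"neural"` label is an assumption: this page only names `"degraded_lexical"` and `"lexical"`.

```rust
/// Simplified stand-in for the real enum (the Neural payload is omitted).
enum Reranker {
    Lexical { degraded: bool },
    Neural,
}

impl Reranker {
    /// True only for a lexical encoder created as a fallback from a failed neural load.
    fn is_degraded_lexical(&self) -> bool {
        matches!(self, Reranker::Lexical { degraded: true })
    }

    /// Label for `meta.reranker_used`. The "neural" string is an assumption.
    fn reranker_used(&self) -> &'static str {
        match self {
            Reranker::Lexical { degraded: true } => "degraded_lexical",
            Reranker::Lexical { degraded: false } => "lexical",
            Reranker::Neural => "neural",
        }
    }
}
```

A client can then branch on the label in the response itself instead of scraping server logs.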
pub fn rerank(
    &self,
    query: &str,
    candidates: Vec<(Memory, f64)>,
) -> Vec<(Memory, f64)>
Rerank a set of candidates by blending their original scores with cross-encoder scores.
Blend formula: final = 0.6 * original + 0.4 * cross_encoder
#1597 pool cap: only the strongest RERANK_POOL_MAX
candidates by incoming blended score are cross-encoded; the
remainder keep their blended scores and rank below the reranked
head (head sorted by final_score descending, tail sorted by
blended score descending — no candidate is dropped). A pool at
or under the cap is fully reranked and returned sorted by
final_score descending, as before.
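The blend and the pool cap can be sketched together. This is an illustration only: candidates are reduced to `(String, f64)` pairs, `POOL_MAX` is a hypothetical small value (the real `RERANK_POOL_MAX` is not stated here), and `cross_score` is a caller-supplied stand-in for the cross-encoder.

```rust
use std::cmp::Ordering;

/// Hypothetical cap for this sketch; the real RERANK_POOL_MAX value is not stated here.
const POOL_MAX: usize = 2;

/// Blend of the incoming score and the cross-encoder score.
fn blend(original: f64, cross_encoder: f64) -> f64 {
    0.6 * original + 0.4 * cross_encoder
}

/// Orders candidates by score, highest first.
fn by_score_desc(a: &(String, f64), b: &(String, f64)) -> Ordering {
    b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal)
}

/// Reranks the top POOL_MAX candidates; the rest keep their scores and follow the head.
fn rerank_capped(
    mut candidates: Vec<(String, f64)>,
    cross_score: impl Fn(&str) -> f64,
) -> Vec<(String, f64)> {
    // Strongest incoming scores first, so the head holds the pool to cross-encode.
    candidates.sort_by(by_score_desc);
    let split = candidates.len().min(POOL_MAX);
    let tail = candidates.split_off(split);
    let mut head: Vec<(String, f64)> = candidates
        .into_iter()
        .map(|(id, original)| {
            let final_score = blend(original, cross_score(&id));
            (id, final_score)
        })
        .collect();
    head.sort_by(by_score_desc);
    // The tail is already sorted by incoming score; nothing is dropped.
    head.extend(tail);
    head
}
```

Note that a tail candidate can carry a higher number than a head candidate and still rank below it, because the tail is appended after the reranked head rather than merged into it.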
v0.7.0 L2-8 contract: the bare rerank is the pre-L2-8
behavior — no reflection boost is applied. Daemons that want
the reflection-aware boost must call
Self::rerank_with_reflection_boost (which is what
BatchedReranker does by default with
ReflectionBoostConfig::default). Keeping the bare method
boost-free is a deliberate regression-pin discipline: the L2-8
recall test for boost = 1.0 uses
rerank_with_reflection_boost(.., &ReflectionBoostConfig::disabled())
and asserts byte-identical output to rerank(..).
pub fn rerank_with_reflection_boost(
    &self,
    query: &str,
    candidates: Vec<(Memory, f64)>,
    boost_config: &ReflectionBoostConfig,
) -> Vec<(Memory, f64)>
v0.7.0 L2-8 — rerank with a post-step reflection-aware boost.
- Same blend as Self::rerank (0.6 * original + 0.4 * ce).
- After the blend, multiply each candidate’s final_score by ReflectionBoostConfig::factor_for. Observations get a multiplier of 1.0 (unchanged); reflections get boost * (1.0 + per_depth_increment * clamp(depth, 0..=cap)).
- Sort descending after the boost so the output ordering reflects the post-boost ranking.
Operationally this means: a reflection that the cross-encoder
scored at parity with its source observations moves up; the
movement is bounded (capped per-depth multiplier, single global
boost factor) so a mediocre reflection cannot leapfrog a
well-matched observation — the boost is a thumb-on-the-scale,
not a free pass.
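The boost arithmetic can be sketched as follows. Only the formula comes from the description above; the `BoostConfig` field names, the use of `Option<u32>` to tell observations (`None`) from reflections (`Some(depth)`), and `apply_boost` are assumptions made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical field names; only the formula itself comes from the description.
struct BoostConfig {
    boost: f64,
    per_depth_increment: f64,
    depth_cap: u32,
}

/// Multiplier for a candidate: `None` is an observation, `Some(depth)` a reflection.
fn factor_for(config: &BoostConfig, reflection_depth: Option<u32>) -> f64 {
    match reflection_depth {
        None => 1.0,
        Some(depth) => {
            // Depth is clamped so deep reflections cannot grow the multiplier unboundedly.
            let clamped = depth.min(config.depth_cap);
            config.boost * (1.0 + config.per_depth_increment * f64::from(clamped))
        }
    }
}

/// Applies the multiplier to already-blended scores, then sorts descending.
fn apply_boost(
    mut scored: Vec<(String, Option<u32>, f64)>,
    config: &BoostConfig,
) -> Vec<(String, Option<u32>, f64)> {
    for (_, depth, score) in scored.iter_mut() {
        *score *= factor_for(config, *depth);
    }
    scored.sort_by(|a, b| b.2.partial_cmp(&a.2).unwrap_or(Ordering::Equal));
    scored
}
```

With `boost = 1.0` and `per_depth_increment = 0.0` every factor is exactly 1.0, which is the shape of the "disabled" configuration that leaves the bare rerank ordering untouched.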
#1597 pool cap + batched forward pass. Only the strongest
RERANK_POOL_MAX candidates by incoming blended score receive a
cross-encoder score (in one batched forward pass on the Neural
variant); the remainder keep their blended scores, internally
sorted descending, appended after the reranked head. No candidate
is ever dropped. A pool at or under the cap degenerates to the
historical full rerank.
pub fn rerank_batch(
    &self,
    queries: Vec<(String, Vec<(Memory, f64)>)>,
) -> Vec<Vec<(Memory, f64)>>
v0.7 G9 — batched rerank for concurrent recall.
Processes all (query, candidates) jobs in a single tokenize + single
forward pass on the Neural variant, rather than one pass per
(query, candidate) pair. The Neural variant holds its model as
Arc<BertModel> without a mutex (see #1084), so the batch involves no
lock acquisition.
Throughput target: ~3× for parallel recall vs. per-query
rerank() calls.
Output ordering: result[i] corresponds to queries[i]. Each
inner vector is sorted by descending blended score, identical to
rerank(). Lexical variant delegates per-query (no batching win
since lexical scoring is already CPU-trivial).
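The flatten, score once, then split-back shape of a batched rerank can be sketched like this. It is a toy under stated assumptions: documents are plain strings, `score_pairs` is a hypothetical stand-in for the single batched forward pass, and the pool cap is left out for brevity.

```rust
use std::cmp::Ordering;

/// Scores every (query, document) pair in one call; stands in for a batched forward pass.
fn score_pairs(pairs: &[(&str, &str)]) -> Vec<f64> {
    // Toy scorer: 1.0 when the document contains the query, else 0.0.
    pairs
        .iter()
        .map(|&(query, doc)| if doc.contains(query) { 1.0 } else { 0.0 })
        .collect()
}

/// Reranks several jobs with one scoring call; `result[i]` belongs to `jobs[i]`.
fn rerank_batch(jobs: Vec<(String, Vec<(String, f64)>)>) -> Vec<Vec<(String, f64)>> {
    // Flatten all jobs into one list of pairs, preserving job and candidate order.
    let pairs: Vec<(&str, &str)> = jobs
        .iter()
        .flat_map(|(query, docs)| {
            docs.iter().map(move |(doc, _)| (query.as_str(), doc.as_str()))
        })
        .collect();
    let scores = score_pairs(&pairs);

    // Walk the flat scores in the same order to rebuild one result per job.
    let mut next = scores.into_iter();
    jobs.iter()
        .map(|(_, docs)| {
            let mut ranked: Vec<(String, f64)> = docs
                .iter()
                .map(|(doc, original)| {
                    let ce = next.next().expect("one score per pair");
                    (doc.clone(), 0.6 * original + 0.4 * ce)
                })
                .collect();
            ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
            ranked
        })
        .collect()
}
```

Because the flat list is consumed in the same order it was built, each job gets back exactly its own scores, and sorting happens per job so results never mix across queries.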
pub fn rerank_batch_with_reflection_boost(
    &self,
    queries: Vec<(String, Vec<(Memory, f64)>)>,
    boost_config: &ReflectionBoostConfig,
) -> Vec<Vec<(Memory, f64)>>
v0.7.0 L2-8 — batched rerank with a post-step reflection-aware
boost applied per candidate. Same boost arithmetic as
Self::rerank_with_reflection_boost, factored so the boost
shape lives in a single helper.
Trait Implementations§
Auto Trait Implementations§
impl !RefUnwindSafe for CrossEncoder
impl !UnwindSafe for CrossEncoder
impl Freeze for CrossEncoder
impl Send for CrossEncoder
impl Sync for CrossEncoder
impl Unpin for CrossEncoder
impl UnsafeUnpin for CrossEncoder
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> ErasedDestructor for T
where
    T: 'static,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.