HELIX-IDEA-006 Phase 1 — Maximal Marginal Relevance reranking.
Contract: contracts/apr-rerank-v1.yaml (ACTIVE).
Pattern source: helix-db helix_engine/reranker/fusion/mmr.rs
(re-implemented; no code lift). Reference:
Carbonell & Goldstein (1998). “The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries.” https://www.cs.cmu.edu/~jgc/publication/MMR.pdf
MMR balances relevance against diversity by iteratively selecting the candidate that maximises:
MMR(d) = λ · sim_query(d) − (1 − λ) · max_{s ∈ Selected} sim_pair(d, s)
At λ=1, the diversity term vanishes and the output is the input sorted by relevance descending (verified by FALSIFY-RERANK-MMR-002).
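The greedy selection implied by the formula can be sketched as follows. This is an illustration only, not the crate's implementation: the name `mmr_sketch` and its signature are hypothetical, the diversity penalty is assumed to be 0 while nothing has been selected yet, and ties are assumed to go to the earlier candidate.

```rust
/// Hypothetical greedy MMR loop; NOT the crate's `mmr_select`.
/// `pool` holds (item, relevance-to-query) pairs and `sim_pair`
/// scores the similarity of two items.
fn mmr_sketch<T, F>(
    mut pool: Vec<(T, f32)>,
    sim_pair: F,
    lambda: f32,
    top_k: usize,
) -> Vec<(T, f32)>
where
    F: Fn(&T, &T) -> f32,
{
    let mut selected: Vec<(T, f32)> = Vec::new();
    while selected.len() < top_k && !pool.is_empty() {
        let mut best_idx = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (i, (item, rel)) in pool.iter().enumerate() {
            // Highest similarity to anything already selected.
            // Assumption: the penalty is 0 when nothing is selected yet.
            let penalty = if selected.is_empty() {
                0.0
            } else {
                selected
                    .iter()
                    .map(|(s, _)| sim_pair(item, s))
                    .fold(f32::NEG_INFINITY, f32::max)
            };
            // MMR(d) = λ · sim_query(d) − (1 − λ) · max sim_pair(d, s)
            let score = lambda * *rel - (1.0 - lambda) * penalty;
            // Strict `>` keeps the earliest candidate on ties.
            if score > best_score {
                best_score = score;
                best_idx = i;
            }
        }
        // Move the winner from the pool into the output, keeping pool order.
        selected.push(pool.remove(best_idx));
    }
    selected
}
```

Under these assumptions each round costs O(|pool| · |selected|) similarity calls, which is usually acceptable for the small candidate lists that reach a reranker; caching each candidate's running maximum would avoid recomputing it every round.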
§Example
use aprender_rag::mmr::mmr_select;
let candidates = vec![
("doc-a", 0.9_f32),
("doc-b", 0.8),
("doc-c", 0.7),
("doc-a-paraphrase", 0.85),
];
// Pretend doc-a and doc-a-paraphrase are highly similar.
let sim = |x: &&str, y: &&str| if x.contains("doc-a") && y.contains("doc-a") { 0.95 } else { 0.05 };
// λ=1 → pure relevance.
let by_rel = mmr_select(candidates.clone(), sim, 1.0, 3);
assert_eq!(by_rel[0].0, "doc-a");
assert_eq!(by_rel[1].0, "doc-a-paraphrase");
// λ=0.5 → diversity penalises near-duplicate of doc-a.
let diverse = mmr_select(candidates, sim, 0.5, 3);
assert_eq!(diverse[0].0, "doc-a");
assert_eq!(diverse[1].0, "doc-b"); // not the paraphrase
Functions§
- mmr_select - Select up to top_k items via Maximal Marginal Relevance.