Module mmr


HELIX-IDEA-006 Phase 1 — Maximal Marginal Relevance reranking.

Contract: contracts/apr-rerank-v1.yaml (ACTIVE). Pattern source: helix-db helix_engine/reranker/fusion/mmr.rs (re-implemented; no code lift). Reference:

Carbonell & Goldstein (1998). “The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries.” https://www.cs.cmu.edu/~jgc/publication/MMR.pdf

MMR balances relevance against diversity by iteratively selecting the candidate that maximises:

  MMR(d) = λ · sim_query(d) − (1 − λ) · max_{s ∈ Selected} sim_pair(d, s)

At λ=1 the diversity term vanishes, so the output is the top_k candidates in descending order of relevance; this is verified by FALSIFY-RERANK-MMR-002.
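The greedy loop implied by the formula can be sketched as below. This is an illustrative stand-in, not the crate's implementation: the name `mmr_sketch` is hypothetical, each candidate's score is assumed to play the role of sim_query(d), the max term is assumed to be 0 while nothing has been selected, and ties are assumed to go to the earlier candidate.

```rust
/// Hypothetical sketch of greedy MMR selection; not the crate's `mmr_select`.
/// `candidates` pairs each item with its query relevance, i.e. sim_query(d).
fn mmr_sketch<T: Clone>(
    candidates: &[(T, f32)],
    sim_pair: impl Fn(&T, &T) -> f32,
    lambda: f32,
    top_k: usize,
) -> Vec<(T, f32)> {
    let mut remaining: Vec<(T, f32)> = candidates.to_vec();
    let mut selected: Vec<(T, f32)> = Vec::new();

    while selected.len() < top_k && !remaining.is_empty() {
        let mut best_idx = 0;
        let mut best_score = f32::NEG_INFINITY;

        for (i, (item, rel)) in remaining.iter().enumerate() {
            // Largest similarity to anything already selected.
            // Assumption: treated as 0.0 while the selected set is empty.
            let max_sim = if selected.is_empty() {
                0.0
            } else {
                selected
                    .iter()
                    .map(|(s, _)| sim_pair(item, s))
                    .fold(f32::NEG_INFINITY, f32::max)
            };

            // MMR(d) = λ · sim_query(d) − (1 − λ) · max sim_pair(d, s)
            let score = lambda * *rel - (1.0 - lambda) * max_sim;

            // Strict comparison: on a tie the earlier candidate wins (assumption).
            if score > best_score {
                best_score = score;
                best_idx = i;
            }
        }

        // Move the winner from the candidate pool to the output.
        selected.push(remaining.remove(best_idx));
    }

    selected
}
```

With λ=1 the factor (1 − λ) is zero, so every round simply picks the highest remaining relevance score. Each round scans all remaining candidates against all selected items, so the cost in this sketch grows with the number of candidates times top_k squared.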

Example

use aprender_rag::mmr::mmr_select;

let candidates = vec![
    ("doc-a", 0.9_f32),
    ("doc-b", 0.8),
    ("doc-c", 0.7),
    ("doc-a-paraphrase", 0.85),
];
// Pretend doc-a and doc-a-paraphrase are highly similar.
let sim = |x: &&str, y: &&str| if x.contains("doc-a") && y.contains("doc-a") { 0.95 } else { 0.05 };

// λ=1 → pure relevance.
let by_rel = mmr_select(candidates.clone(), sim, 1.0, 3);
assert_eq!(by_rel[0].0, "doc-a");
assert_eq!(by_rel[1].0, "doc-a-paraphrase");

// λ=0.5 → diversity penalises near-duplicate of doc-a.
let diverse = mmr_select(candidates, sim, 0.5, 3);
assert_eq!(diverse[0].0, "doc-a");
assert_eq!(diverse[1].0, "doc-b"); // not the paraphrase
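To see why the second pick at λ=0.5 is doc-b, the formula can be evaluated by hand for the three candidates that remain once doc-a is selected. The numbers come straight from the example's scores and its `sim` closure; nothing else is assumed.

```latex
\begin{align*}
\mathrm{MMR}(\text{doc-a-paraphrase}) &= 0.5 \cdot 0.85 - 0.5 \cdot 0.95 = -0.05 \\
\mathrm{MMR}(\text{doc-b})            &= 0.5 \cdot 0.80 - 0.5 \cdot 0.05 = 0.375 \\
\mathrm{MMR}(\text{doc-c})            &= 0.5 \cdot 0.70 - 0.5 \cdot 0.05 = 0.325
\end{align*}
```

The paraphrase has the second-highest relevance, but its 0.95 similarity to doc-a pushes its score below both doc-b and doc-c, so doc-b is chosen.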

Functions

mmr_select
Select up to top_k items via Maximal Marginal Relevance.