Skip to main content

rrf

Function rrf 

Source
pub fn rrf<I: Clone + Eq + Hash>(
    results_a: &[(I, f32)],
    results_b: &[(I, f32)],
) -> Vec<(I, f32)>
Expand description

Reciprocal Rank Fusion of two result lists with default config (k=60).

Formula: score(d) = Σ 1/(k + rank) where rank is 0-indexed.

Why RRF? Different retrievers use incompatible score scales (BM25: 0-100, dense: 0-1). RRF solves this by ignoring scores entirely and using only rank positions. The reciprocal formula ensures:

  • Top positions dominate (rank 0 gets 1/60 = 0.017, rank 5 gets 1/65 = 0.015)
  • Multiple list agreement is rewarded (documents appearing in both lists score higher)
  • No normalization needed (works with any score distribution)

When to use: Hybrid search with incompatible score scales, zero-configuration needs. When NOT to use: When score scales are compatible, CombSUM achieves ~3-4% better NDCG.

Use rrf_with_config to customize the k parameter (lower k = more top-heavy).

§Duplicate Document IDs

If a document ID appears multiple times in the same list, all occurrences contribute to the RRF score based on their respective ranks. For example, if “d1” appears at rank 0 and rank 5 in list A, its contribution from list A is 1/(k+0) + 1/(k+5). This differs from some implementations that take only the first occurrence.

§Complexity

O(n log n) where n = |a| + |b| (dominated by final sort).

§Input Validation

This function does not validate inputs. For validation, use validate() after fusion. Edge cases handled:

  • Empty lists: Returns items from non-empty list(s)
  • k=0: Returns empty Vec (use validate() to catch this)
  • Non-finite scores: Ignored (RRF is rank-based)

§Example

use rankops::rrf;

let sparse = vec![("d1", 0.9), ("d2", 0.5)];
let dense = vec![("d2", 0.8), ("d3", 0.3)];

let fused = rrf(&sparse, &dense);
assert_eq!(fused[0].0, "d2"); // appears in both lists (consensus)

Reciprocal Rank Fusion (RRF) with default configuration (k=60).

RRF is the recommended fusion method when combining rankings with incompatible score scales. It uses only rank positions, ignoring score magnitudes entirely.

§Formula

RRF(d) = Σ 1/(k + rank_r(d)) where:

  • k = smoothing constant (default: 60)
  • rank_r(d) = position of document d in ranking r (0-indexed)

§Why RRF?

  • Score Scale Independent: Works with any scoring system (BM25: 0-100, embeddings: 0-1)
  • Robust: Handles missing documents gracefully (documents not in a list contribute 0)
  • Effective: Proven to outperform individual rankers in hybrid search
  • Fast: O(n log n) complexity where n = total unique documents

§Arguments

  • results_a - First ranked list: Vec<(document_id, score)>
  • results_b - Second ranked list: Vec<(document_id, score)>

Note: Scores are ignored; only rank positions matter.

§Returns

Fused ranking sorted by RRF score (descending). Documents appearing in both lists rank higher than those appearing in only one list.

§Example

use rankops::rrf;

// BM25 results (high scores = better)
let bm25 = vec![
    ("doc1", 12.5),
    ("doc2", 11.0),
    ("doc3", 10.0),
];

// Dense embedding results (different scale: 0-1)
let dense = vec![
    ("doc2", 0.9),
    ("doc3", 0.8),
    ("doc1", 0.7),
];

// RRF ignores scores, uses only rank positions
let fused = rrf(&bm25, &dense);
// doc2 ranks highest (appears in both lists at high positions)
// doc1 and doc3 follow (appear in both lists but at different positions)

§Performance

Time complexity: O(n log n) where n = |results_a| + |results_b| (dominated by final sort). For typical workloads (100-1000 items per list), fusion completes in <1ms.

§When to Use

  • ✅ Combining BM25 + dense embeddings (different scales)
  • ✅ Combining sparse + dense retrieval
  • ✅ Unknown or incompatible score scales
  • ✅ Need robust fusion without normalization tuning

§When NOT to Use

  • ❌ Scores are already normalized and comparable (use combsum or weighted)
  • ❌ You trust score magnitudes (use score-based fusion)
  • ❌ Need fine-grained control over retriever importance (use rrf_weighted)