
Function combsum 

pub fn combsum<I: Clone + Eq + Hash>(
    results_a: &[(I, f32)],
    results_b: &[(I, f32)],
) -> Vec<(I, f32)>

Sum of min-max normalized scores (CombSUM).

CombSUM normalizes each list to [0, 1] using min-max normalization, then sums the normalized scores. This preserves relative score magnitudes within each list while making differently scaled lists comparable.

§Formula

For each list: normalized = (score - min) / (max - min)

Final score: score(d) = Σ normalized_scores(d)
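The formula above can be sketched end to end. The following is a hypothetical reference implementation, not the crate's actual code: the nested `normalize` helper, the treatment of a degenerate range (all scores equal), and the descending tie order are assumptions.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical sketch of CombSUM: min-max normalize each list,
// sum per document, sort descending. Not the crate's real code.
fn combsum_sketch<I: Clone + Eq + Hash>(
    results_a: &[(I, f32)],
    results_b: &[(I, f32)],
) -> Vec<(I, f32)> {
    // Min-max normalize one list into [0, 1].
    fn normalize<T: Clone>(list: &[(T, f32)]) -> Vec<(T, f32)> {
        let min = list.iter().map(|(_, s)| *s).fold(f32::INFINITY, f32::min);
        let max = list.iter().map(|(_, s)| *s).fold(f32::NEG_INFINITY, f32::max);
        let range = max - min;
        list.iter()
            .map(|(id, s)| {
                // Degenerate range (all scores equal): treat every score
                // as 1.0 -- an assumed convention, not from the docs.
                let n = if range > 0.0 { (s - min) / range } else { 1.0 };
                (id.clone(), n)
            })
            .collect()
    }

    // Accumulate the normalized scores per document id.
    let mut scores: HashMap<I, f32> = HashMap::new();
    for (id, s) in normalize(results_a).into_iter().chain(normalize(results_b)) {
        *scores.entry(id).or_insert(0.0) += s;
    }

    // Sort by combined score, highest first.
    let mut fused: Vec<(I, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let a = vec![("doc1", 0.9), ("doc2", 0.85), ("doc3", 0.7)];
    let b = vec![("doc2", 0.95), ("doc1", 0.85), ("doc3", 0.75)];
    println!("{:?}", combsum_sketch(&a, &b));
}
```

Note that because each list is normalized independently, the document at the bottom of a list always contributes 0.0 from that list, regardless of its raw score.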

§Arguments

  • results_a - First ranked list as a slice of (document_id, score) pairs
  • results_b - Second ranked list as a slice of (document_id, score) pairs

§Returns

Fused ranking sorted by combined score (descending). Documents with higher normalized scores across lists rank higher.

§Example

use rankops::combsum;

// Both lists use cosine similarity (0-1 scale)
let sparse = vec![
    ("doc1", 0.9),
    ("doc2", 0.85),
    ("doc3", 0.7),
];

let dense = vec![
    ("doc2", 0.95),
    ("doc1", 0.85),
    ("doc3", 0.75),
];

let fused = combsum(&sparse, &dense);
// doc2 ranks highest (0.75 + 1.0 = 1.75 after normalization)

§Performance

Time complexity: O(n log n) where n = total items across all lists. For typical workloads (100-1000 items per list), fusion completes in <1ms.

§When to Use

  • ✅ Scores are on similar scales (e.g., all cosine similarities 0-1)
  • ✅ You trust score magnitudes (scores represent true relevance)
  • ✅ Need better accuracy than RRF (CombSUM typically 3-4% higher NDCG)

§When NOT to Use

  • ❌ Incompatible score scales (BM25: 0-100 vs embeddings: 0-1) - use RRF
  • ❌ Score distributions differ significantly - use standardized or dbsf
  • ❌ Unknown score scales - use RRF

§Trade-offs

  • Accuracy: Typically 3-4% higher NDCG than RRF (OpenSearch benchmarks)
  • Robustness: Less robust to outliers than RRF (min-max normalization is sensitive to extreme values)
  • Speed: Similar to RRF (~1-2% faster due to simpler computation)
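The outlier sensitivity noted above is easy to see with min-max normalization in isolation. The `minmax` helper below is hypothetical, written only for this illustration:

```rust
// Min-max normalize a slice of scores into [0, 1].
// Hypothetical helper, not part of the crate's API.
fn minmax(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    scores.iter().map(|s| (s - min) / (max - min)).collect()
}

fn main() {
    // Without an outlier, scores spread evenly across [0, 1]:
    println!("{:?}", minmax(&[1.0, 2.0, 3.0, 4.0]));
    // One outlier compresses all other normalized scores toward 0,
    // so their differences barely register in the fused sum:
    println!("{:?}", minmax(&[1.0, 2.0, 3.0, 100.0]));
}
```

Rank-based methods like RRF avoid this, since ranks are unchanged by how extreme the top score is.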