Function dbsf 

pub fn dbsf<I: Clone + Eq + Hash>(
    results_a: &[(I, f32)],
    results_b: &[(I, f32)],
) -> Vec<(I, f32)>

Distribution-Based Score Fusion (DBSF).

DBSF standardizes each list's scores to z-scores, clips them to within 3 standard deviations of the mean (mean ± 3σ), and then sums the normalized scores per item. It is more robust than min-max normalization when score distributions differ significantly or contain outliers.

§Algorithm

For each list:

  1. Compute the mean (μ) and standard deviation (σ) of its scores
  2. Normalize each score: z = (score - μ) / σ, clipped to [-3, 3]

Then, for each item, sum its normalized z-scores across lists.

The ±3σ clipping prevents extreme outliers from dominating the fusion.
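
The steps above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `mean_std` and `dbsf_sketch` are hypothetical names, and the sketch assumes a population standard deviation, a contribution of 0 for items absent from a list, and z = 0 for every item of a zero-variance list. None of these details are specified on this page.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: mean and population standard deviation of one list.
fn mean_std(scores: &[f32]) -> (f32, f32) {
    if scores.is_empty() {
        return (0.0, 0.0);
    }
    let n = scores.len() as f32;
    let mean = scores.iter().sum::<f32>() / n;
    let var = scores.iter().map(|s| (s - mean).powi(2)).sum::<f32>() / n;
    (mean, var.sqrt())
}

/// Hypothetical sketch of the three steps; not the crate's `dbsf`.
fn dbsf_sketch<I: Clone + Eq + Hash>(lists: &[&[(I, f32)]]) -> Vec<(I, f32)> {
    let mut fused: HashMap<I, f32> = HashMap::new();
    for list in lists {
        // Step 1: per-list mean and standard deviation.
        let scores: Vec<f32> = list.iter().map(|(_, s)| *s).collect();
        let (mean, std) = mean_std(&scores);
        for (id, score) in list.iter() {
            // Step 2: z-score clipped to [-3, 3].
            // Assumption: a zero-variance list contributes 0 for every item.
            let z = if std > 0.0 {
                ((score - mean) / std).clamp(-3.0, 3.0)
            } else {
                0.0
            };
            // Step 3: sum z-scores per item across lists.
            *fused.entry(id.clone()).or_insert(0.0) += z;
        }
    }
    // Sort by combined z-score, descending.
    let mut out: Vec<(I, f32)> = fused.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

Under these assumptions, calling `dbsf_sketch(&[&bm25[..], &dense[..]])` on the two lists from the Example section returns four items with d2 first, at a combined z-score of about 1.34.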

§Arguments

  • results_a - First ranked list with scores
  • results_b - Second ranked list with scores

§Returns

Fused ranking sorted by combined z-score (descending). Documents with consistently high z-scores across lists rank highest.

§Example

use rankops::dbsf;

// BM25 scores (high variance, different scale)
let bm25 = vec![("d1", 15.0), ("d2", 12.0), ("d3", 8.0)];

// Dense embedding scores (low variance, different scale)
let dense = vec![("d2", 0.9), ("d3", 0.7), ("d4", 0.5)];

let fused = dbsf(&bm25, &dense);
// Z-scores put both lists on a comparable scale before summing
// d2 scores above the mean in both lists, so it ranks first;
// d3 is also in both lists, but its below-average BM25 score pulls it down

§Performance

Time complexity: O(n log n), where n is the total number of items across both lists. Computing the mean and standard deviation is O(n) per list, so the final sort dominates. For typical workloads (100-1000 items per list), fusion completes in <1ms.

§When to Use

  • ✅ Score distributions differ significantly (BM25: 0-100, embeddings: 0-1)
  • ✅ Outliers are present (z-score clipping handles them)
  • ✅ Need robust normalization (more robust than min-max)
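
To make the effect of clipping concrete, the sketch below computes z-scores for a list of 100 identical scores plus one extreme score, with and without a clip. `z_scores` and `outlier_list` are hypothetical helpers, and the population standard deviation and the zero-variance handling are assumptions rather than documented behavior.

```rust
/// Hypothetical helper: z-scores of a list, optionally clipped to [-limit, limit].
fn z_scores(scores: &[f32], limit: Option<f32>) -> Vec<f32> {
    let n = scores.len() as f32;
    let mean = scores.iter().sum::<f32>() / n;
    // Assumption: population standard deviation (divide by n).
    let std = (scores.iter().map(|s| (s - mean).powi(2)).sum::<f32>() / n).sqrt();
    scores
        .iter()
        .map(|s| {
            // Assumption: a zero-variance list maps every score to 0.
            let z = if std > 0.0 { (s - mean) / std } else { 0.0 };
            match limit {
                Some(l) => z.clamp(-l, l),
                None => z,
            }
        })
        .collect()
}

/// Hypothetical data: 100 scores of 1.0 followed by one outlier of 1000.0.
fn outlier_list() -> Vec<f32> {
    let mut scores = vec![1.0_f32; 100];
    scores.push(1000.0);
    scores
}
```

With this data, `z_scores(&outlier_list(), None)` gives the outlier a z-score of about 10, while `z_scores(&outlier_list(), Some(3.0))` caps it at 3.0, so a single extreme score cannot add more than 3 to an item's fused total.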

§When NOT to Use

  • ❌ Score scales are similar (use combsum for simplicity)
  • ❌ Need configurable clipping (use standardized with custom range)
  • ❌ Unknown score scales (use RRF to avoid normalization)
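
For the last case, rank-based fusion sidesteps score scales entirely. Below is a minimal sketch of Reciprocal Rank Fusion, assuming each input list is already sorted best-first and using the conventional constant k = 60. `rrf_sketch` is a hypothetical name for illustration and is not the crate's RRF API.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical sketch of Reciprocal Rank Fusion: each list contributes
/// 1 / (k + rank) per item, with 1-based ranks taken from list order.
/// The scores themselves are ignored.
fn rrf_sketch<I: Clone + Eq + Hash>(lists: &[&[(I, f32)]], k: f32) -> Vec<(I, f32)> {
    let mut fused: HashMap<I, f32> = HashMap::new();
    for list in lists {
        for (idx, (id, _score)) in list.iter().enumerate() {
            // Assumption: the list is already sorted by descending score.
            let rank = (idx + 1) as f32;
            *fused.entry(id.clone()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    // Sort by fused reciprocal-rank score, descending.
    let mut out: Vec<(I, f32)> = fused.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```

Because only positions matter here, the result is unaffected by how each retriever scales its scores, at the cost of discarding the score magnitudes that DBSF uses.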

§Trade-offs vs CombSUM

  • Robustness: More robust to outliers (z-score vs min-max)
  • Complexity: Slightly more complex (requires mean/std computation)
  • Clipping: Fixed [-3, 3] range (use standardized for custom range)

§Differences from Standardized

  • DBSF uses fixed [-3, 3] clipping
  • Standardized allows configurable clipping range
  • Both use the same z-score approach