pub fn dbsf<I: Clone + Eq + Hash>(
results_a: &[(I, f32)],
results_b: &[(I, f32)],
) -> Vec<(I, f32)>
Distribution-Based Score Fusion (DBSF).
DBSF standardizes each list's scores with z-score normalization, clips them to mean ± 3σ (z in [-3, 3]), and then sums the normalized scores. It is more robust than min-max normalization when score distributions differ significantly or contain outliers.
§Algorithm
For each list:
- Compute mean (μ) and standard deviation (σ)
- Normalize: z = (score - μ) / σ, clipped to [-3, 3]

Then sum each document's normalized z-scores across lists.
The ±3σ clipping prevents extreme outliers from dominating the fusion.
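The per-list steps above can be sketched as follows. This is a minimal illustration, not the rankops implementation: the helper name `zscore_clip` is hypothetical, and the sketch assumes the population standard deviation (dividing by n) and assigns z = 0 to every item of a list with zero variance. The library may handle either case differently.

```rust
/// Hypothetical helper: standardize one ranked list and clip to [-3, 3].
/// Assumes population standard deviation (divide by n).
fn zscore_clip<I: Clone>(list: &[(I, f32)]) -> Vec<(I, f32)> {
    if list.is_empty() {
        return Vec::new();
    }
    let n = list.len() as f32;
    // Mean (μ) of the list's scores.
    let mean = list.iter().map(|(_, s)| s).sum::<f32>() / n;
    // Population variance, then σ.
    let var = list.iter().map(|(_, s)| (s - mean).powi(2)).sum::<f32>() / n;
    let std = var.sqrt();
    list.iter()
        .map(|(id, s)| {
            // Assumption: a constant list carries no ranking signal, so z = 0.
            let z = if std > 0.0 { (s - mean) / std } else { 0.0 };
            // Clip to [-3, 3] so a single outlier cannot dominate.
            (id.clone(), z.clamp(-3.0, 3.0))
        })
        .collect()
}

fn main() {
    let bm25 = vec![("d1", 15.0_f32), ("d2", 12.0), ("d3", 8.0)];
    for (id, z) in zscore_clip(&bm25) {
        println!("{id}: {z:.3}");
    }

    // Ten identical scores plus one outlier: the outlier's raw z is
    // sqrt(10) ≈ 3.16, which the clip caps at 3.0.
    let mut skewed: Vec<(usize, f32)> = (0..10).map(|i| (i, 0.0)).collect();
    skewed.push((10, 100.0));
    let z = zscore_clip(&skewed);
    println!("outlier z = {}", z[10].1);
}
```

With population σ, the BM25 list from the example maps to roughly 1.162, 0.116 and -1.279, and the clip only changes a value once its raw z-score exceeds 3 in magnitude.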
§Arguments
- `results_a` - First ranked list with scores
- `results_b` - Second ranked list with scores
§Returns
Fused ranking sorted by combined z-score (descending). Documents with consistently high z-scores across lists rank highest.
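The fusion step, summing each document's normalized scores and sorting in descending order, might look like the minimal sketch below. `fuse_zscores` is a hypothetical name; it assumes the inputs are already z-normalized and clipped, and that a document missing from a list contributes 0.0, which may not match how rankops treats missing documents. The inputs are the example lists' z-scores under a population standard deviation, rounded to three decimals.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: sum already-normalized scores per document id and
/// sort descending. A document absent from a list simply gets no addend
/// from it (i.e. it contributes 0.0 there).
fn fuse_zscores<I: Clone + Eq + Hash>(lists: &[Vec<(I, f32)>]) -> Vec<(I, f32)> {
    let mut totals: HashMap<I, f32> = HashMap::new();
    for list in lists {
        for (id, z) in list {
            *totals.entry(id.clone()).or_insert(0.0) += *z;
        }
    }
    let mut fused: Vec<(I, f32)> = totals.into_iter().collect();
    // Descending by fused score; total_cmp is a total order even with NaN.
    // Exact ties keep HashMap iteration order, which is arbitrary.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}

fn main() {
    // Example lists after z-normalization (population σ), rounded.
    let bm25_z = vec![("d1", 1.162_f32), ("d2", 0.116), ("d3", -1.279)];
    let dense_z = vec![("d2", 1.225_f32), ("d3", 0.0), ("d4", -1.225)];
    let fused = fuse_zscores(&[bm25_z, dense_z]);
    for (id, score) in &fused {
        println!("{id}: {score:.3}");
    }
}
```

Under these assumptions the fused order is d2, d1, d4, d3: d2 is above average in both lists, while d3's low BM25 score outweighs its exactly average dense score.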
§Example
```rust
use rankops::dbsf;

// BM25 scores (high variance, different scale)
let bm25 = vec![("d1", 15.0), ("d2", 12.0), ("d3", 8.0)];
// Dense embedding scores (low variance, different scale)
let dense = vec![("d2", 0.9), ("d3", 0.7), ("d4", 0.5)];
let fused = dbsf(&bm25, &dense);
// Z-scores put both lists on a comparable scale.
// d2 is above its list's mean in both lists, so it should rank first.
```
§Performance
Time complexity: O(n log n), where n is the total number of items across all lists. Computing each list's mean and standard deviation is linear in that list's length, so sorting the fused result dominates. For typical workloads (100-1000 items per list), fusion completes in under 1 ms.
§When to Use
- ✅ Score distributions differ significantly (BM25: 0-100, embeddings: 0-1)
- ✅ Outliers are present (z-score clipping handles them)
- ✅ Need robust normalization (more robust than min-max)
§When NOT to Use
- ❌ Score scales are similar (use `combsum` for simplicity)
- ❌ Need configurable clipping (use `standardized` with a custom range)
- ❌ Unknown score scales (use RRF to avoid normalization)
§Trade-offs vs CombSUM
- Robustness: More robust to outliers (z-score vs min-max)
- Complexity: Slightly more complex (requires mean/std computation)
- Clipping: Fixed [-3, 3] range (use `standardized` for a custom range)
§Differences from Standardized
- DBSF uses fixed [-3, 3] clipping
- Standardized allows configurable clipping range
- Both use the same z-score approach
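The relationship above can be illustrated with a short sketch in which DBSF-style normalization is simply a z-score with the clipping range fixed at [-3, 3], while a configurable variant takes the bounds as parameters. Both function names are hypothetical and do not reflect the signature of rankops's `standardized`; population standard deviation is assumed.

```rust
/// Hypothetical helper: z-score a list of scores and clip to [lo, hi].
/// Assumes population σ; a zero-variance list maps to all zeros.
/// Note: f32::clamp panics if lo > hi, so callers must pass lo <= hi.
fn standardize_with_clip(scores: &[f32], lo: f32, hi: f32) -> Vec<f32> {
    if scores.is_empty() {
        return Vec::new();
    }
    let n = scores.len() as f32;
    let mean = scores.iter().sum::<f32>() / n;
    let std = (scores.iter().map(|s| (s - mean).powi(2)).sum::<f32>() / n).sqrt();
    scores
        .iter()
        .map(|s| if std > 0.0 { ((s - mean) / std).clamp(lo, hi) } else { 0.0 })
        .collect()
}

/// DBSF-style normalization: the same z-score with the range fixed at [-3, 3].
fn dbsf_normalize(scores: &[f32]) -> Vec<f32> {
    standardize_with_clip(scores, -3.0, 3.0)
}

fn main() {
    // Ten identical scores plus one outlier (raw z ≈ 3.162).
    let mut scores = vec![0.0_f32; 10];
    scores.push(100.0);
    let fixed = dbsf_normalize(&scores);
    let tight = standardize_with_clip(&scores, -2.0, 2.0);
    let loose = standardize_with_clip(&scores, -10.0, 10.0);
    println!("outlier: fixed {} tight {} loose {:.3}", fixed[10], tight[10], loose[10]);
}
```

Only the outlier's value differs across the three ranges; scores already inside every range are identical, which is why the two functions share the same z-score approach.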