# rankops

Operations on ranked lists: fuse multiple retrievers, then rerank. Pairs with `rankfns` (scoring kernels).

rankops covers the post-retrieval pipeline:

- **Fusion** -- combine ranked lists from heterogeneous retrievers (BM25, dense, sparse)
- **Reranking** -- MaxSim/ColBERT late interaction, MMR/DPP diversity, Matryoshka two-stage
- **Evaluation** -- NDCG, MAP, MRR, Precision@k, Recall@k, hit rate, fusion parameter optimization
- **Diagnostics** -- complementarity analysis, score distribution stats, fusion recommendations
## Quickstart

```toml
[dependencies]
rankops = "0.1.4"
```
## Fusion

Fuse two ranked lists with Reciprocal Rank Fusion (score-agnostic, works across incompatible scales):

```rust
use rankops::rrf;

// Illustrative (id, score) lists; the import path and `rrf` arguments are
// reconstructed here -- see the crate docs for the exact signature.
let bm25 = vec![("doc_a", 12.0), ("doc_b", 9.5)];
let dense = vec![("doc_b", 0.92), ("doc_c", 0.85)];

let fused = rrf(&bm25, &dense);

// doc_b ranks highest: appears in both lists
assert_eq!(fused[0].0, "doc_b");
```
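The RRF arithmetic itself is simple enough to sketch by hand. This standalone snippet (plain Rust, not the rankops API) shows the `1/(k + rank)` accumulation with the conventional constant `k = 60`:

```rust
use std::collections::HashMap;

// Standalone RRF sketch: each list contributes 1 / (k + rank) per document,
// with ranks starting at 1. Documents in several lists accumulate more score.
fn rrf_scores(lists: &[Vec<&str>], k: f64) -> HashMap<String, f64> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    scores
}

fn main() {
    let bm25 = vec!["doc_a", "doc_b", "doc_c"];
    let dense = vec!["doc_b", "doc_d"];
    let scores = rrf_scores(&[bm25, dense], 60.0);
    // doc_b appears in both lists, so it accumulates two contributions
    // (1/62 + 1/61) and outranks doc_a, which only leads a single list (1/61).
    assert!(scores["doc_b"] > scores["doc_a"]);
}
```

Because only ranks enter the formula, raw BM25 scores and cosine similarities never need to share a scale.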
Score-based fusion when scales are comparable:

```rust
use rankops::combmnz;

// Same (id, score) list shape as above; signature reconstructed.
let fused = combmnz(&bm25, &dense);
// CombMNZ: sum of normalized scores * overlap count
```
Select the algorithm at runtime via `FusionMethod`:

```rust
use rankops::FusionMethod;

// Variant fields reconstructed; k = 60 is the conventional RRF constant.
let method = FusionMethod::Rrf { k: 60.0 };
let result = method.fuse(&bm25, &dense);
```
## Diversity reranking

Requires the `rerank` feature (on by default):

```rust
use rankops::rerank::diversity::{mmr, MmrConfig}; // path and type name reconstructed

// Illustrative candidates (id, relevance) and pairwise similarity matrix.
let candidates = vec![("d1", 0.9), ("d2", 0.8), ("d3", 0.7)];
let similarity = vec![
    vec![1.0, 0.9, 0.1],
    vec![0.9, 1.0, 0.2],
    vec![0.1, 0.2, 1.0],
];

let config = MmrConfig::default().with_lambda(0.5).with_k(2); // values illustrative
let selected = mmr(&candidates, &similarity, &config);
// Picks d1 (highest relevance), then d3 (diverse from d1)
```
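MMR's greedy trade-off is worth seeing in full: each step picks the candidate maximizing `lambda * relevance - (1 - lambda) * max-similarity-to-already-selected`. A standalone sketch (plain Rust, not the rankops API):

```rust
// Standalone MMR sketch: greedily select the index maximizing
// lambda * relevance - (1 - lambda) * (max similarity to the selected set).
fn mmr_select(rel: &[f64], sim: &[Vec<f64>], lambda: f64, k: usize) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < k.min(rel.len()) {
        let mut best = None;
        let mut best_score = f64::NEG_INFINITY;
        for i in 0..rel.len() {
            if selected.contains(&i) {
                continue;
            }
            // Redundancy penalty: similarity to the closest already-picked item.
            let redundancy = selected.iter().map(|&j| sim[i][j]).fold(0.0_f64, f64::max);
            let score = lambda * rel[i] - (1.0 - lambda) * redundancy;
            if score > best_score {
                best_score = score;
                best = Some(i);
            }
        }
        selected.push(best.unwrap());
    }
    selected
}

fn main() {
    // d0 most relevant; d1 is a near-duplicate of d0; d2 is less relevant but diverse.
    let rel = [0.9, 0.8, 0.7];
    let sim = vec![
        vec![1.0, 0.9, 0.1],
        vec![0.9, 1.0, 0.2],
        vec![0.1, 0.2, 1.0],
    ];
    let picked = mmr_select(&rel, &sim, 0.5, 2);
    assert_eq!(picked, vec![0, 2]); // d0 first, then the diverse d2 over the redundant d1
}
```

With `lambda = 1.0` this degenerates to plain relevance ordering; lowering lambda trades relevance for coverage.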
## Fusion algorithms

| Function | Uses scores | Description |
|---|---|---|
| `rrf` | No | Reciprocal Rank Fusion -- rank-based, works across incompatible scales |
| `isr` | No | Inverse Square Root fusion -- gentler rank decay than RRF |
| `borda` | No | Borda count -- (N - rank) voting points |
| `condorcet` | No | Pairwise Condorcet voting -- outlier-robust |
| `copeland` | No | Copeland voting -- net pairwise wins, more discriminative than Condorcet |
| `median_rank` | No | Median rank across lists -- outlier-robust aggregation |
| `combsum` | Yes | Sum of min-max normalized scores |
| `combmnz` | Yes | CombSUM * overlap count -- rewards multi-list presence |
| `combmax` | Yes | Max score across lists |
| `combmin` | Yes | Min score -- conservative, requires all retrievers to agree |
| `combmed` | Yes | Median score -- robust to outliers |
| `combanz` | Yes | Average of non-zero scores |
| `weighted` | Yes | Weighted combination with per-list weights |
| `dbsf` | Yes | Distribution-Based Score Fusion (z-score normalization) |
| `standardized` | Yes | ERANK-style z-score fusion with clipping |
All two-list functions have `*_multi` variants for 3+ lists. Explainability variants (`rrf_explain`, `combsum_explain`, etc.) return full provenance.
## Normalization

Score normalization for cross-retriever fusion via the `Normalization` enum:

| Variant | Range | Notes |
|---|---|---|
| `MinMax` | [0, 1] | Default. Sensitive to outliers |
| `ZScore` | ~[-3, 3] | Robust to different distributions |
| `Quantile` | [0, 1] | Percentile ranks. Most robust to non-Gaussian scores |
| `Sigmoid` | (0, 1) | Logistic squash. Handles unbounded scores (cross-encoder logits) |
| `Sum` | [0, 1] | Relative magnitudes. For probability-like scores |
| `Rank` | [0, 1] | Ignores magnitudes entirely |
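The first two transforms are easy to sketch from their definitions. This standalone snippet (plain Rust, not the crate's `Normalization` API) shows why normalization is needed before summing scores from, say, unbounded BM25 and bounded cosine retrievers:

```rust
// Standalone sketches of min-max and z-score normalization.
fn min_max(scores: &[f64]) -> Vec<f64> {
    let lo = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let hi = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    // Maps the best score to 1.0 and the worst to 0.0 (assumes hi > lo).
    scores.iter().map(|s| (s - lo) / (hi - lo)).collect()
}

fn z_score(scores: &[f64]) -> Vec<f64> {
    let n = scores.len() as f64;
    let mean = scores.iter().sum::<f64>() / n;
    let var = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    // Centers at 0 with unit standard deviation.
    scores.iter().map(|s| (s - mean) / var.sqrt()).collect()
}

fn main() {
    let bm25 = [14.2, 9.1, 3.7]; // unbounded BM25 scores
    let cosine = [0.91, 0.88, 0.52]; // cosine similarities

    // After min-max, both lists live on [0, 1] and can be summed (CombSUM).
    assert_eq!(min_max(&bm25)[0], 1.0);
    assert_eq!(min_max(&cosine)[2], 0.0);

    // After z-score, both lists are mean-centered regardless of original scale.
    assert!(z_score(&bm25).iter().sum::<f64>().abs() < 1e-9);
}
```

Quantile, sigmoid, sum, and rank normalization follow the same pattern with different mapping functions.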
## Evaluation

| Function | Description |
|---|---|
| `ndcg_at_k` | Normalized Discounted Cumulative Gain (BEIR default) |
| `map` / `map_at_k` | Mean Average Precision (MTEB Reranking default) |
| `mrr` | Mean Reciprocal Rank |
| `precision_at_k` | Fraction of top-k that are relevant |
| `recall_at_k` | Fraction of relevant docs in top-k |
| `hit_rate` | Binary: any relevant doc in top-k? |
| `evaluate_metric` | Dispatch by `OptimizeMetric` enum |
| `optimize_fusion` | Grid search over fusion parameters |
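For reference, two of these metrics sketched from their textbook definitions (standalone Rust, not the crate's evaluation API):

```rust
// MRR: average over queries of 1 / (rank of the first relevant document),
// with 0 contributed by queries that retrieve nothing relevant.
fn mrr(rankings: &[Vec<bool>]) -> f64 {
    let sum: f64 = rankings
        .iter()
        .map(|r| {
            r.iter()
                .position(|&rel| rel)
                .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
        })
        .sum();
    sum / rankings.len() as f64
}

// Precision@k: fraction of the top-k results that are relevant.
fn precision_at_k(ranking: &[bool], k: usize) -> f64 {
    let hits = ranking.iter().take(k).filter(|&&rel| rel).count();
    hits as f64 / k as f64
}

fn main() {
    // Two queries: first relevant hit at rank 1 and at rank 2 respectively.
    let runs = vec![vec![true, false, false], vec![false, true, false]];
    assert!((mrr(&runs) - 0.75).abs() < 1e-12); // (1/1 + 1/2) / 2
    assert!((precision_at_k(&runs[0], 2) - 0.5).abs() < 1e-12); // 1 hit in top-2
}
```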
## Diagnostics

The diagnostics module helps decide whether fusion is beneficial:

| Function | Description |
|---|---|
| `score_stats` | Distribution analysis (mean, std, median, percentiles) |
| `overlap_ratio` | Jaccard overlap between document sets |
| `complementarity` | Fraction of relevant docs unique to one retriever |
| `rank_correlation` | Kendall's tau-b on shared documents |
| `diagnose` | Full report with fusion recommendation |
Based on Louis et al., "Know When to Fuse" (2024): high complementarity (>0.5) predicts fusion benefit, as does low rank correlation between the retrievers.
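The two set-based diagnostics are straightforward to sketch (standalone Rust, not the crate's diagnostics API):

```rust
use std::collections::HashSet;

// Jaccard overlap between the retrieved document sets: |A ∩ B| / |A ∪ B|.
fn overlap_ratio(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let inter = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    inter / union
}

// Complementarity: fraction of relevant docs found by exactly one retriever.
fn complementarity(a: &HashSet<&str>, b: &HashSet<&str>, relevant: &HashSet<&str>) -> f64 {
    let unique = relevant
        .iter()
        .filter(|d| a.contains(*d) != b.contains(*d))
        .count() as f64;
    unique / relevant.len() as f64
}

fn main() {
    let bm25: HashSet<_> = ["d1", "d2", "d3"].into_iter().collect();
    let dense: HashSet<_> = ["d3", "d4", "d5"].into_iter().collect();
    let relevant: HashSet<_> = ["d1", "d4"].into_iter().collect();

    assert!((overlap_ratio(&bm25, &dense) - 0.2).abs() < 1e-12); // 1 shared of 5
    // Each relevant doc comes from exactly one retriever: a strong fusion signal.
    assert!((complementarity(&bm25, &dense, &relevant) - 1.0).abs() < 1e-12);
}
```

Low overlap plus high complementarity is exactly the regime where fusing the two lists recovers relevant documents neither retriever surfaces alone.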
## Adapters

The `adapt` module converts retriever outputs to rankops format:

| Function | Input | Conversion |
|---|---|---|
| `from_distances` | L2/Euclidean (lower = closer) | `1/(1+d)` to (0, 1] |
| `from_similarities` | Cosine similarity (higher = better) | Sort descending |
| `from_logits` | Cross-encoder logits (unbounded) | Sigmoid to (0, 1) |
| `from_inner_product` | Dot product (higher = better) | Sort descending |
All have `_mapped` variants for ID type conversion (e.g., `u32` doc index to `&str` doc name).
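The two score transforms in the table both map "better" to "higher" on a bounded range; sketched standalone (not the crate's adapter API):

```rust
// Distance adapter: 1 / (1 + d) sends an exact match (d = 0) to 1.0 and
// large L2 distances toward 0, inverting "lower is closer" to "higher is better".
fn distance_to_score(d: f64) -> f64 {
    1.0 / (1.0 + d)
}

// Logit adapter: the sigmoid squashes unbounded cross-encoder logits into (0, 1).
fn logit_to_score(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

fn main() {
    assert_eq!(distance_to_score(0.0), 1.0); // exact match
    assert!(distance_to_score(0.5) > distance_to_score(2.0)); // closer = higher
    assert!((logit_to_score(0.0) - 0.5).abs() < 1e-12); // zero logit = 0.5
    assert!(logit_to_score(4.0) > 0.9 && logit_to_score(-4.0) < 0.1);
}
```

Cosine and inner-product outputs already order "higher = better", so those adapters only need to sort.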
## Pipeline

The `pipeline` module provides composable post-retrieval operations:

```rust
use rankops::pipeline::Pipeline;
use rankops::{FusionMethod, Normalization}; // import paths reconstructed

// Illustrative builder chain; argument values are examples only.
let result = Pipeline::new()
    .add_run("bm25", bm25)
    .add_run("dense", dense)
    .normalize(Normalization::MinMax)
    .fuse(FusionMethod::Rrf { k: 60.0 })
    .top_k(10)
    .execute();
```
Also: `compare()` for method comparison, `fuse_multi_query()` for the N-queries x M-retrievers RAG pattern.
## Reranking (feature: `rerank`)

| Module | Description |
|---|---|
| `rerank::colbert` | MaxSim late interaction scoring (ColBERT, ColPali, Jina-ColBERT) |
| `rerank::diversity` | MMR and DPP diversity selection |
| `rerank::matryoshka` | Two-stage reranking with nested (Matryoshka) embeddings |
| `rerank::embedding` | Normalized vectors, masked token MaxSim |
| `rerank::quantization` | int8 quantization/dequantization for token embeddings |
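The MaxSim operator at the core of late interaction is compact enough to sketch: for each query token embedding, take the maximum dot product over document token embeddings, then sum. Standalone Rust (not the `rerank::colbert` API, which delegates the inner products to SIMD primitives):

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// ColBERT-style MaxSim: sum over query tokens of the best-matching doc token.
fn maxsim(query: &[Vec<f64>], doc: &[Vec<f64>]) -> f64 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f64::NEG_INFINITY, f64::max)
        })
        .sum()
}

fn main() {
    // Two query tokens and three doc tokens as toy 2-d embeddings.
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.9, 0.1], vec![0.2, 0.8], vec![0.5, 0.5]];
    let score = maxsim(&query, &doc);
    // Query token 1 best-matches doc token 1 (0.9); token 2 matches token 2 (0.8).
    assert!((score - 1.7).abs() < 1e-12);
}
```

Each query token matches independently, which is what lets late interaction capture partial term-level matches that a single pooled vector would average away.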
## Features

| Feature | Default | Description |
|---|---|---|
| `rerank` | Yes | MaxSim, diversity, Matryoshka reranking (depends on `innr` for SIMD) |
| `hierarchical` | No | Hierarchical ColBERT clustering (depends on `kodama`) |
| `serde` | No | Serialization for configs and types |
## Examples
## See also

- `rankfns` -- scoring kernels (BM25, TF-IDF, cosine) that pair with rankops
- `innr` -- SIMD dot product and MaxSim primitives used by the `rerank` feature
- `vicinity` -- ANN vector search (HNSW, IVF-PQ) that feeds ranked candidates to rankops
- `rankit` -- learning-to-rank training (LTR losses, differentiable sorting, evaluation)
## License

MIT OR Apache-2.0