Evaluation metrics and infrastructure for approximate nearest neighbor (ANN) benchmarks.
Standard metrics used in ANN literature:
| Metric | Formula | Interpretation |
|---|---|---|
| Recall@K | \|approx ∩ true\| / K | Fraction of true neighbors found |
| Precision@K | \|approx ∩ true\| / \|approx\| | Same as recall when \|approx\| = K |
| MRR | mean over queries of 1 / rank_of_first_true | Average reciprocal rank of the first relevant result |
| QPS | queries / seconds | Throughput |
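
The formulas in the table above can be sketched in Rust as follows. This is a minimal illustration, not this crate's implementation: the function names (`recall_at_k_sketch`, `precision_at_k_sketch`, `reciprocal_rank_sketch`, `qps_sketch`) and their signatures are hypothetical, and the sketch assumes neighbors are identified by `usize` ids with the ground-truth list sorted nearest first.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Recall@K: |approx ∩ true| / K, using the first `k` entries of each list.
/// (Hypothetical helper; not the crate's `recall_at_k`.)
fn recall_at_k_sketch(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let approx_top: HashSet<usize> = approx.iter().take(k).copied().collect();
    let true_top: HashSet<usize> = truth.iter().take(k).copied().collect();
    approx_top.intersection(&true_top).count() as f64 / k as f64
}

/// Precision@K: |approx ∩ true| / |approx|. This differs from recall only
/// when the algorithm returns fewer than `k` distinct results.
fn precision_at_k_sketch(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    let approx_top: HashSet<usize> = approx.iter().take(k).copied().collect();
    if approx_top.is_empty() {
        return 0.0;
    }
    let true_top: HashSet<usize> = truth.iter().take(k).copied().collect();
    approx_top.intersection(&true_top).count() as f64 / approx_top.len() as f64
}

/// Reciprocal rank for one query: 1 / (1-based rank of the first returned id
/// that is a true neighbor), or 0.0 if none is. Averaging over queries gives MRR.
fn reciprocal_rank_sketch(approx: &[usize], truth: &[usize]) -> f64 {
    let truth_set: HashSet<usize> = truth.iter().copied().collect();
    approx
        .iter()
        .position(|id| truth_set.contains(id))
        .map_or(0.0, |pos| 1.0 / (pos + 1) as f64)
}

/// QPS: number of queries divided by elapsed wall-clock seconds.
fn qps_sketch(num_queries: usize, elapsed: Duration) -> f64 {
    num_queries as f64 / elapsed.as_secs_f64()
}
```

With `approx = [1, 2]`, `truth = [2, 4, 6, 8]`, and `k = 4`, one of the four true neighbors is found, so recall is 0.25, while precision is 0.5 because only two results were returned.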
§Standard Benchmark Datasets
| Dataset | Size | Dim | Distance | Source |
|---|---|---|---|---|
| SIFT-1M | 1M | 128 | L2 | INRIA Texmex |
| GIST-1M | 1M | 960 | L2 | INRIA Texmex |
| GloVe-100 | 1.2M | 100 | Angular | Stanford NLP |
| Fashion-MNIST | 60K | 784 | L2 | Zalando |
| Deep1B | 1B | 96 | Angular | Yandex |
Reference: https://ann-benchmarks.com/
Re-exports§
- pub use crate::distance::angular_distance;
- pub use crate::distance::cosine_distance;
- pub use crate::distance::inner_product_distance;
- pub use crate::distance::l2_distance;
- pub use crate::distance::normalize;
- pub use crate::distance::DistanceMetric;
Structs§
- EvalDataset - A dataset with ground truth for evaluation.
- EvalResults - Evaluation results for a single run.
Functions§
- compute_ground_truth - Compute ground truth for a dataset.
- evaluate - Evaluate an algorithm on a dataset.
- generate_clustered_dataset - Generate a clustered dataset (more realistic).
- generate_normalized_clustered_dataset - Generate a normalized dataset (for cosine/angular metrics).
- generate_uniform_dataset - Generate a synthetic dataset with uniform random vectors.
- mrr - Compute mean reciprocal rank for a single query.
- recall_at_k - Compute recall@k for a single query.
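
Ground truth for an ANN benchmark is usually obtained by exact brute-force search over the base vectors. The sketch below illustrates that idea for L2 distance. It is an assumption-laden example, not this crate's `compute_ground_truth`: the names `squared_l2` and `brute_force_ground_truth`, the `Vec<Vec<f32>>` layout, and the tie-breaking by index are all hypothetical.

```rust
/// Squared Euclidean distance. Ranking by it gives the same order as
/// ranking by L2, so the square root can be skipped.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// For each query, return the indices of its `k` nearest base vectors,
/// nearest first. Cost is O(queries * base * dim), which is acceptable for
/// small synthetic datasets but slow for million-scale ones.
fn brute_force_ground_truth(
    base: &[Vec<f32>],
    queries: &[Vec<f32>],
    k: usize,
) -> Vec<Vec<usize>> {
    queries
        .iter()
        .map(|q| {
            // Score every base vector against this query.
            let mut scored: Vec<(f32, usize)> = base
                .iter()
                .enumerate()
                .map(|(i, v)| (squared_l2(q, v), i))
                .collect();
            // Sort by distance, breaking ties by index for determinism.
            scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
            scored.into_iter().take(k).map(|(_, i)| i).collect()
        })
        .collect()
}
```

For cosine or angular benchmarks, the same structure applies with the distance function swapped, typically after normalizing the vectors.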