Module evaluation


Evaluation metrics and infrastructure for ANN benchmarks.

§Metrics

Standard metrics used in the ANN literature:

| Metric      | Formula                  | Interpretation                       |
|-------------|--------------------------|--------------------------------------|
| Recall@K    | \|approx ∩ true\| / K    | Fraction of true neighbors found     |
| Precision@K | \|approx ∩ true\| / \|approx\| | Same as recall when \|approx\| = K |
| MRR         | 1 / rank_of_first_true   | Reciprocal rank of the first relevant result |
| QPS         | queries / seconds        | Throughput                           |
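The formulas above can be sketched as self-contained Rust. These helpers are illustrative only; the module's own `recall_at_k` and `mrr` functions are not shown here, so the signatures below are assumptions:

```rust
use std::collections::HashSet;

/// Recall@K: fraction of the true top-K neighbors recovered in the
/// approximate top-K results (illustrative sketch, assumed signature).
fn recall_at_k(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    let truth_set: HashSet<&usize> = truth.iter().take(k).collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth_set.contains(id))
        .count();
    hits as f64 / k as f64
}

/// Reciprocal rank: 1 / rank of the first relevant result, or 0.0 if
/// no relevant result appears (illustrative sketch, assumed signature).
fn reciprocal_rank(approx: &[usize], truth: &[usize]) -> f64 {
    let truth_set: HashSet<&usize> = truth.iter().collect();
    approx
        .iter()
        .position(|id| truth_set.contains(&id))
        .map_or(0.0, |pos| 1.0 / (pos as f64 + 1.0))
}

fn main() {
    let truth = [3, 1, 4, 7, 5];
    let approx = [9, 1, 4, 2, 8];
    // Two of five true neighbors found; first hit at rank 2.
    println!("recall@5 = {}", recall_at_k(&approx, &truth, 5)); // 0.4
    println!("RR = {}", reciprocal_rank(&approx, &truth)); // 0.5
}
```

Note that `mrr` in this module is documented as operating on a single query, so it reduces to the reciprocal rank shown here; the mean over all queries is taken at the evaluation level.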

§Standard Benchmark Datasets

| Dataset       | Size | Dim | Distance | Source       |
|---------------|------|-----|----------|--------------|
| SIFT-1M       | 1M   | 128 | L2       | INRIA Texmex |
| GIST-1M       | 1M   | 960 | L2       | INRIA Texmex |
| GloVe-100     | 1.2M | 100 | Angular  | Stanford NLP |
| Fashion-MNIST | 60K  | 784 | L2       | Zalando      |
| Deep1B        | 1B   | 96  | Angular  | Yandex       |

Reference: https://ann-benchmarks.com/

Re-exports§

pub use crate::distance::angular_distance;
pub use crate::distance::cosine_distance;
pub use crate::distance::inner_product_distance;
pub use crate::distance::l2_distance;
pub use crate::distance::normalize;
pub use crate::distance::DistanceMetric;

Structs§

EvalDataset
A dataset with ground truth for evaluation.
EvalResults
Evaluation results for a single run.

Functions§

compute_ground_truth
Compute ground truth for a dataset.
evaluate
Evaluate an algorithm on a dataset.
generate_clustered_dataset
Generate a clustered dataset (more realistic).
generate_normalized_clustered_dataset
Generate a normalized dataset (for cosine/angular metrics).
generate_uniform_dataset
Generate a synthetic dataset with uniform random vectors.
mrr
Compute mean reciprocal rank for a single query.
recall_at_k
Compute recall@k for a single query.
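Ground truth for recall evaluation is typically produced by exact brute-force k-NN over the base set. A minimal sketch of that idea, assuming L2 distance; the names and signature here are illustrative, not the crate's actual `compute_ground_truth` API:

```rust
/// Squared L2 distance between two vectors of equal length.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k nearest neighbors of `query` in `base`, nearest first.
/// Illustrative brute-force sketch: O(n * d) per query, so this is
/// only practical for computing ground truth offline.
fn brute_force_knn(base: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = base
        .iter()
        .enumerate()
        .map(|(i, v)| (l2_sq(query, v), i))
        .collect();
    scored.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}

fn main() {
    let base = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 2.0],
    ];
    // Nearest to (0.9, 0.1) is index 1, then index 0.
    let nn = brute_force_knn(&base, &[0.9, 0.1], 2);
    println!("{:?}", nn); // [1, 0]
}
```

An evaluation run then compares each algorithm's approximate results against these exact neighbor lists via recall@k and MRR.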