A lightweight research library for managing Approximate Nearest Neighbor search datasets.
It offers the following features:
- Storage of dense, sparse, and dense-sparse vector sets;
- Storage of query sets with ground-truth (i.e., exact nearest neighbors) according to different metrics;
- Basic functionality such as computing recall given a retrieved set; and,
- Serialization into and deserialization from HDF5 file format.
§Example usage
It is straightforward to read an ANN dataset. The code snippet below gives a concise example.
use ann_dataset::{AnnDataset, Hdf5File, InMemoryAnnDataset, Metric,
                  PointSet, QuerySet, GroundTruth};
// Load the dataset.
let dataset = InMemoryAnnDataset::<f32>::read("")
    .expect("Failed to read the dataset.");
// Get a reference to the data points.
let data_points: &PointSet<_> = dataset.get_data_points();
// Get the test query set.
let test: &QuerySet<_> = dataset.get_test_query_set()
    .expect("Failed to load test query set.");
let test_queries: &PointSet<_> = test.get_points();
let gt: &GroundTruth = test.get_ground_truth(&Metric::InnerProduct)
    .expect("Failed to load ground truth for InnerProduct search.");
// Compute recall. The argument is a &[Vec<usize>] whose `i`-th entry
// is the list of ids of the points retrieved for the `i`-th query.
let recall = gt.mean_recall(&[]);
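To make the recall computation above concrete, here is a minimal standalone sketch of one common definition: per-query recall is the fraction of exact neighbors that appear in the retrieved list, and mean recall is the average over queries. The functions `recall` and `mean_recall` below are hypothetical helpers written for illustration. They are not the crate's implementation, which may define recall differently (for example, by truncating to the top `k` ids).

```rust
use std::collections::HashSet;

/// Recall for one query: the fraction of exact neighbor ids that also
/// appear in the retrieved list. Assumes the ids in `exact` are distinct.
/// (Hypothetical helper, not part of the crate.)
fn recall(exact: &[usize], retrieved: &[usize]) -> f64 {
    if exact.is_empty() {
        return 0.0;
    }
    let retrieved: HashSet<usize> = retrieved.iter().copied().collect();
    let hits = exact.iter().filter(|id| retrieved.contains(*id)).count();
    hits as f64 / exact.len() as f64
}

/// Mean recall over all queries. The `i`-th entry of `retrieved` is
/// compared against the `i`-th entry of `exact`.
/// (Hypothetical helper, not part of the crate.)
fn mean_recall(exact: &[Vec<usize>], retrieved: &[Vec<usize>]) -> f64 {
    assert_eq!(exact.len(), retrieved.len(), "one retrieved list per query");
    if exact.is_empty() {
        return 0.0;
    }
    let total: f64 = exact
        .iter()
        .zip(retrieved)
        .map(|(e, r)| recall(e, r))
        .sum();
    total / exact.len() as f64
}
```

For instance, with exact neighbors `[1, 2, 3, 4]` and `[5, 6]` for two queries, and retrieved lists `[1, 2, 9, 10]` and `[5, 6]`, the per-query recalls are 0.5 and 1.0, so the mean recall is 0.75.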
Structs§
- GroundTruth - Defines the exact nearest neighbors.
- InMemoryAnnDataset - An ANN dataset.
- PointSet - A set of points (dense, sparse, or both) represented as a matrix, where each row corresponds to a single vector.
- QuerySet - A set of query points (dense, sparse, or both) and their exact nearest neighbors for various metrics.
Enums§
- Metric - Collection of metrics and distance functions that characterize an ANN search.
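
As an illustration of what a metric determines, the sketch below computes exact nearest neighbors for a single query under inner product by brute force, which is the kind of ground truth a query set stores. It assumes dense vectors held as plain `Vec<f32>` rows. The functions `inner_product` and `exact_top_k` are hypothetical and do not belong to the crate's API.

```rust
/// Inner product of two dense vectors of equal length.
/// (Hypothetical helper, not part of the crate.)
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Ids of the `k` data points with the largest inner product with
/// `query`, best first. Ties keep the lower id first because the sort
/// is stable. (Hypothetical helper, not part of the crate.)
fn exact_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(id, point)| (id, inner_product(point, query)))
        .collect();
    // Sort by descending score; `total_cmp` gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```

Under a distance such as Euclidean, the same scan would instead keep the `k` smallest values, so the ground truth for one query generally differs from metric to metric.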