Shareable Infrastructure for Benchmarking Vector Indexing
This crate provides abstractions and implementations for benchmarking DiskANN vector
indexing operations. The goal is infrastructure that can be shared across a range of
[diskann::provider::DataProvider]s, with stable APIs that enable:
- A tight benchmarking loop for developers performing performance optimization.
- Creating standalone binaries for CI benchmarking jobs.
- Shared infrastructure to facilitate developing new providers.
Algorithms
- [build]: Tools for running parallelized index builds.
  - [build::graph]: Built-in utilities for working with [diskann::graph::DiskANNIndex].
- [search]: Tools for running parallelized search operations.
  - [search::graph]: Built-in utilities for working with [diskann::graph::DiskANNIndex].
- [streaming]: Tools for running streaming workloads consisting of inserts, deletes, replaces, searches, etc.
  - [streaming::runbooks]: Built-in [streaming::Executor]s for dynamic operations.
    - [streaming::runbooks::bigann]: BigANN-style runbook support.
  - [streaming::graph]: Built-in utilities for working with [diskann::graph::DiskANNIndex].
Tools
- [recall]: KNN-Recall and other accuracy measures.
- [tokio]: Quickly create new [tokio::runtime::Runtime]s.
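To make the recall measure concrete, here is a minimal stdlib-only sketch of KNN-recall: the fraction of the ground-truth nearest neighbors that appear in the returned results. The `knn_recall` function and its signature are illustrative, not the crate's actual [recall] API.

```rust
use std::collections::HashSet;

/// Fraction of the ground-truth nearest neighbors recovered by `results`.
/// Both slices hold vector ids; `ground_truth` has length k.
fn knn_recall(ground_truth: &[u32], results: &[u32]) -> f64 {
    let truth: HashSet<u32> = ground_truth.iter().copied().collect();
    let hits = results.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / ground_truth.len() as f64
}

fn main() {
    // Ground truth is {1, 2, 3, 4}; the search found 3 of the 4.
    let r = knn_recall(&[1, 2, 3, 4], &[2, 3, 9, 4]);
    println!("{r}"); // 0.75
}
```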
Error Handling
Index benchmark operations typically live high in a program's call stack and need to
support a wide variety of index implementations and thus error types. To that end,
[anyhow::Error] is typically used at API boundaries. While this does hide the ways
in which a method can fail, [anyhow::Error] balances generality and fidelity.
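The boundary pattern described above can be sketched with the standard library alone: each provider's concrete error type is erased into one general error type at the benchmark entry point, which is the role [anyhow::Error] plays here (with added context and backtrace support). The provider and error names below are hypothetical, not this crate's API.

```rust
use std::error::Error;
use std::fmt;

// Two hypothetical providers with unrelated error types.
#[derive(Debug)]
struct DiskProviderError(String);
impl fmt::Display for DiskProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "disk provider: {}", self.0)
    }
}
impl Error for DiskProviderError {}

#[derive(Debug)]
struct MemProviderError(String);
impl fmt::Display for MemProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "memory provider: {}", self.0)
    }
}
impl Error for MemProviderError {}

// At the benchmark boundary both error types are erased into one general
// type, so callers high in the stack need not enumerate every provider.
fn bench_build(use_disk: bool) -> Result<(), Box<dyn Error>> {
    if use_disk {
        Err(Box::new(DiskProviderError("disk full".into())))
    } else {
        Err(Box::new(MemProviderError("out of memory".into())))
    }
}

fn main() {
    let e = bench_build(true).unwrap_err();
    println!("{e}"); // disk provider: disk full
}
```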