hnsw 0.9.1

Fast approximate nearest neighbors
Documentation

hnsw

Discord Crates.io MIT/Apache docs.rs LoC

Hierarchical Navigable Small World Graph for fast ANN search

Enable the serde feature to serialize and deserialize HNSW.

Tips

A good default for M and M0 parameters is 12 and 24 respectively. According to the paper, M0 should always be double M, but you can change both of them freely.

Example

Binary feature search using hamming distance

use hnsw::{Hnsw, Searcher};
use rand_pcg::Pcg64;
use space::{MetricPoint, Neighbor};

struct Hamming(u8);

impl MetricPoint for Hamming {
    type Metric = u8;

    fn distance(&self, other: &Self) -> u8 {
        (self.0 ^ other.0).count_ones() as u8
    }
}

fn test_hnsw_discrete() -> (Hnsw<Hamming, Pcg64, 12, 24>, Searcher<u8>) {
    let mut searcher = Searcher::default();
    let mut hnsw = Hnsw::new();

    let features = [
        0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110, 0b1100, 0b1001,
    ];

    for &feature in &features {
        hnsw.insert(Hamming(feature), &mut searcher);
    }

    (hnsw, searcher)
}

#[test]
fn insertion_discrete() {
    test_hnsw_discrete();
}

#[test]
fn nearest_neighbor_discrete() {
    let (hnsw, mut searcher) = test_hnsw_discrete();
    let mut neighbors = [Neighbor {
        index: !0,
        distance: !0,
    }; 8];

    hnsw.nearest(&Hamming(0b0001), 24, &mut searcher, &mut neighbors);
    // Distance 1
    neighbors[1..3].sort_unstable();
    // Distance 2
    neighbors[3..6].sort_unstable();
    // Distance 3
    neighbors[6..8].sort_unstable();
    assert_eq!(
        neighbors,
        [
            Neighbor {
                index: 0,
                distance: 0
            },
            Neighbor {
                index: 4,
                distance: 1
            },
            Neighbor {
                index: 7,
                distance: 1
            },
            Neighbor {
                index: 1,
                distance: 2
            },
            Neighbor {
                index: 2,
                distance: 2
            },
            Neighbor {
                index: 3,
                distance: 2
            },
            Neighbor {
                index: 5,
                distance: 3
            },
            Neighbor {
                index: 6,
                distance: 3
            }
        ]
    );
}

Please refer to the space documentation for the trait and types regarding distance. It also contains special Bits128 - Bits4096 tuple structs that wrap an array of bytes and enable SIMD capability. Benchmarks provided use these SIMD impls.

Floating-point search using euclidean distance

An implementation is currently not provided for euclidean distance after a recent refactor. Hamming distance was more relevant at the time, and so that was prioritized. To implement euclidean distance, do something roughly like the following:

use space::MetricPoint;
struct Euclidean<'a>(&'a [f32]);

impl MetricPoint for Euclidean<'_> {
    type Metric = u32;
    fn distance(&self, rhs: &Self) -> u64 {
        self.0
            .iter()
            .zip(rhs.0.iter())
            .map(|(&a, &b)| (a - b).powi(2))
            .sum::<f32>()
            .sqrt().to_bits()
    }
}

Note that the above implementation may have some numerical error on high dimensionality. In that case use a Kahan sum instead. It also may not utilize SIMD, but using an array may help with that.

Benchmarks

Here is a recall graph that you can compare to its alternatives:

Recall Graph

For more benchmarks and how to benchmark, see benchmarks.md.

Implementation

This is based on the paper "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs" by Yu. A. Malkov and D. A. Yashunin. This paper builds on the original paper for NSW. There are multiple papers written by the authors on NSW, which preceeded HNSW.

For more details about parameters and details of the implementation, see implementation.md.

Credit

This is in no way a direct copy or reimplementation of the original implementation. This was made purely based on the paper without reference to the original headers. The paper is very well written and easy to understand, with some minor exceptions. Thank you to the authors for your valuble contribution.

Questions? Contributions? Excited?

Please visit the Rust CV Discord.