Crate hannoy


Hannoy is a key-value backed HNSW implementation based on arroy.

Many popular HNSW libraries are built in memory, meaning you need enough RAM to store all the vectors you’re indexing. Instead, hannoy uses LMDB — a memory-mapped KV store — as a storage backend.

This is better suited to machines running multiple programs, or to cases where the dataset you’re indexing won’t fit in memory. LMDB also supports non-blocking concurrent reads by design, meaning it’s safe to query the index in multi-threaded environments.

§Examples

Open an LMDB database, store some vectors in it, and query the item nearest to a query vector. This is the simplest way to use hannoy. Do not forget to call HannoyBuilder::build<M0,M> and heed::RwTxn::commit when you are done inserting your items.

use hannoy::{distances::Cosine, Database, Reader, Result, Writer};
use heed::EnvOpenOptions;
use rand::{rngs::StdRng, SeedableRng};

fn main() -> Result<()> {
    const DIM: usize = 3;
    let vecs: Vec<[f32; DIM]> = vec![[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];

    let env = unsafe {
        EnvOpenOptions::new()
            .map_size(1024 * 1024 * 1024 * 1) // 1GiB
            .open("./")
    }
    .unwrap();

    let mut wtxn = env.write_txn().unwrap();
    let db: Database<Cosine> = env.create_database(&mut wtxn, None)?;
    let writer: Writer<Cosine> = Writer::new(db, 0, DIM);

    // insert into lmdb
    writer.add_item(&mut wtxn, 0, &vecs[0])?;
    writer.add_item(&mut wtxn, 1, &vecs[1])?;
    writer.add_item(&mut wtxn, 2, &vecs[2])?;

    // ...and build hnsw
    let mut rng = StdRng::seed_from_u64(42);

    let mut builder = writer.builder(&mut rng);
    builder.ef_construction(100).build::<16,32>(&mut wtxn)?;
    wtxn.commit()?;

    // search hnsw using a new lmdb read transaction
    let rtxn = env.read_txn()?;
    let reader = Reader::<Cosine>::open(&rtxn, 0, db)?;

    let query = vec![0.0, 1.0, 0.0];
    let nns = reader.nns(1).ef_search(10).by_vector(&rtxn, &query)?;

    // dbg! takes expressions, not a format string
    dbg!(&nns);
    Ok(())
}
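As a sanity check on what the example above should return, the sketch below runs an exhaustive nearest-neighbour search over the same three vectors using only the standard library. The helpers `cosine_distance` and `brute_force_nns` are hypothetical names introduced here, not part of hannoy, and the distance is the textbook `1 - cos(angle)`, which may differ in scaling from what hannoy's `Cosine` reports.

```rust
/// Cosine distance as 1 - cos(angle).
/// Assumption of this sketch: a zero vector is treated as maximally distant (1.0).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Exhaustive search: scores every item and keeps the k closest as (id, distance) pairs.
fn brute_force_nns(items: &[(u32, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = items
        .iter()
        .map(|(id, v)| (*id, cosine_distance(v, query)))
        .collect();
    // Sort by ascending distance; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Same items and query as the hannoy example.
    let items: Vec<(u32, Vec<f32>)> = vec![
        (0, vec![1.0, 0.0, 0.0]),
        (1, vec![0.0, 1.0, 0.0]),
        (2, vec![0.0, 0.0, 1.0]),
    ];
    let query = [0.0, 1.0, 0.0];

    let nns = brute_force_nns(&items, &query, 1);
    println!("{nns:?}");
}
```

Item 1 is identical to the query, so it comes back at distance zero. An approximate index such as HNSW should agree with this on so small a dataset, which makes a brute-force pass a convenient baseline when measuring recall.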

Modules§

distances
The set of distances implementing the Distance trait and supported by hannoy.
internals
The set of types used by the Distance trait.

Structs§

HannoyBuilder
The options available when configuring the hannoy database.
QueryBuilder
Options used to make a query against a hannoy Reader.
Reader
A reader over the hannoy HNSW graph.
RoaringBitmapCodec
A heed codec for roaring::RoaringBitmap.
Searched
Container storing nearest-neighbour search results.
Writer
A writer to store new items, remove existing ones, and build the search index to query the nearest neighbors to items or vectors.

Enums§

Error
The set of errors that hannoy can encounter.

Traits§

Distance
A trait used by hannoy to compute the distances, compute the split planes, and normalize user vectors.

Type Aliases§

Database
The database required by hannoy for reading or writing operations.
ItemId
An identifier for the items stored in the database.
LayerId
An identifier for the links of the HNSW. We can guarantee mathematically that there will always be fewer than 256 layers.
Result
A custom Result type that returns a hannoy error by default.
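
The LayerId entry above states that the layer count stays below 256. One way to see why a bound of that kind is plausible is sketched below. It assumes the layer draw from the original HNSW paper, `level = floor(-ln(u) / ln(m))` with `u` uniform in (0, 1], and a random generator that yields multiples of 2^-53. Neither assumption is confirmed by this page as hannoy's actual behaviour, and `layer_for` is a hypothetical helper.

```rust
/// Hypothetical helper, not part of hannoy: the layer draw from the HNSW paper,
/// level = floor(-ln(u) / ln(m)) for u in (0, 1].
fn layer_for(u: f64, m: usize) -> usize {
    (-u.ln() / (m as f64).ln()).floor() as usize
}

fn main() {
    // Assumption: the RNG yields multiples of 2^-53, so this is the
    // smallest nonzero sample and gives the highest reachable layer.
    let smallest = 2f64.powi(-53);

    for m in [2usize, 4, 16, 64] {
        let top = layer_for(smallest, m);
        println!("m = {m}: highest reachable layer = {top}");
        // Under these assumptions, the layer index fits in a u8.
        assert!(top < 256);
    }
}
```

Under these assumptions, the highest reachable layer is about 53 / log2(m), so even m = 2 stays far below 256 and a single byte is enough for the identifier.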