Hannoy is a key-value backed HNSW implementation based on arroy.
Many popular HNSW libraries are built in memory, meaning you need enough RAM to store all the vectors you’re indexing. Instead, hannoy uses LMDB, a memory-mapped KV store, as a storage backend.
This is better suited to machines running multiple programs, or to cases where the dataset you’re indexing won’t fit in memory. LMDB also supports non-blocking concurrent reads by design, meaning it’s safe to query the index in multi-threaded environments.
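As a rough illustration of the memory point above, the sketch below estimates the RAM needed just to hold raw `f32` vectors in memory. The helper name and the dataset size are hypothetical and not part of hannoy; graph links and allocator overhead are ignored, so a real in-memory index would likely need more.

```rust
/// Bytes needed to hold `count` raw `f32` vectors of `dim` dimensions.
/// Hypothetical helper for illustration only; not part of hannoy.
fn raw_vector_bytes(count: u64, dim: u64) -> u64 {
    count * dim * std::mem::size_of::<f32>() as u64
}

fn main() {
    // Hypothetical dataset: 10 million vectors of 768 dimensions.
    let bytes = raw_vector_bytes(10_000_000, 768);
    let gib = bytes as f64 / (1024.0 * 1024.0 * 1024.0);
    // Vectors alone, before any graph structure is counted.
    println!("{bytes} bytes (~{gib:.1} GiB) for the vectors alone");
}
```

With a memory-mapped store, the operating system can page this data in and out on demand instead of requiring all of it to be resident at once.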
§Examples
Open an LMDB database, store some vectors in it, and query the nearest item to a query vector. This is the simplest way to use hannoy. Just do not forget to call HannoyBuilder::build<M0, M> and heed::RwTxn::commit when you are done inserting your items.
use hannoy::{distances::Cosine, Database, Reader, Result, Writer};
use heed::EnvOpenOptions;
use rand::{rngs::StdRng, SeedableRng};

fn main() -> Result<()> {
    const DIM: usize = 3;
    let vecs: Vec<[f32; DIM]> = vec![[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];

    // opening an LMDB environment is unsafe because the file is memory-mapped
    let env = unsafe {
        EnvOpenOptions::new()
            .map_size(1024 * 1024 * 1024) // 1GiB
            .open("./")
    }
    .unwrap();

    let mut wtxn = env.write_txn().unwrap();
    let db: Database<Cosine> = env.create_database(&mut wtxn, None)?;
    let writer: Writer<Cosine> = Writer::new(db, 0, DIM);

    // insert into lmdb
    writer.add_item(&mut wtxn, 0, &vecs[0])?;
    writer.add_item(&mut wtxn, 1, &vecs[1])?;
    writer.add_item(&mut wtxn, 2, &vecs[2])?;

    // ...and build hnsw
    let mut rng = StdRng::seed_from_u64(42);
    let mut builder = writer.builder(&mut rng);
    builder.ef_construction(100).build::<16, 32>(&mut wtxn)?;
    wtxn.commit()?;

    // search hnsw using a new lmdb read transaction
    let rtxn = env.read_txn()?;
    let reader = Reader::<Cosine>::open(&rtxn, 0, db)?;
    let query = vec![0.0, 1.0, 0.0];
    let nns = reader.nns(1).ef_search(10).by_vector(&rtxn, &query)?;

    // dbg! takes expressions rather than a format string
    dbg!(&nns);
    Ok(())
}
Modules§
- distances
- The set of distances implementing the Distance trait and supported by hannoy.
- internals
- The set of types used by the Distance trait.
Structs§
- HannoyBuilder
- The options available when configuring the hannoy database.
- QueryBuilder
- Options used to make a query against a hannoy Reader.
- Reader
- A reader over the hannoy HNSW graph.
- RoaringBitmapCodec
- A heed codec for roaring::RoaringBitmap.
- Searched
- Container storing the nearest neighbour search results.
- Writer
- A writer to store new items, remove existing ones, and build the search index to query the nearest neighbors to items or vectors.
Enums§
- Error
- The set of errors that hannoy can encounter.
Traits§
- Distance
- A trait used by hannoy to compute the distances, compute the split planes, and normalize user vectors.
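To make the role of a distance concrete, here is a minimal standalone sketch of cosine distance with a brute-force nearest-neighbour scan over the three vectors used in the example above. It does not use hannoy or its Distance trait; the function names are hypothetical, and hannoy's own Cosine implementation may normalise or scale values differently.

```rust
/// Euclidean norm of a vector.
fn norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Cosine distance defined here as `1 - cos(angle)`.
/// Returns 1.0 when either vector has zero length (a choice made for this sketch).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let denom = norm(a) * norm(b);
    if denom == 0.0 {
        1.0
    } else {
        1.0 - dot / denom
    }
}

/// Brute-force scan: index of the item closest to `query`, or None if `items` is empty.
fn nearest(items: &[[f32; 3]], query: &[f32]) -> Option<usize> {
    items
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_distance(v, query)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    let items: [[f32; 3]; 3] = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]];
    let query: [f32; 3] = [0.0, 1.0, 0.0];
    // The query is identical to item 1, so the scan prints Some(1).
    println!("{:?}", nearest(&items, &query));
}
```

A linear scan like this is only practical for small datasets, but it can serve as a reference when sanity-checking approximate results from an HNSW index.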
Type Aliases§
- Database
- The database required by hannoy for reading or writing operations.
- ItemId
- An identifier for the items stored in the database.
- LayerId
- An identifier for the links of the hnsw. We can guarantee mathematically there will always be fewer than 256 layers.
- Result
- A custom Result type that returns a hannoy error by default.
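One plausible way to see why LayerId can stay small: in the textbook HNSW construction, an item's top layer is drawn as floor(-ln(u) / ln(M)) for a uniform u in (0, 1], so the highest reachable layer is bounded by the smallest value u can take. The sketch below assumes that rule and assumes u is built from 53 random bits; neither assumption is taken from hannoy's source, and the function name is hypothetical.

```rust
/// Top layer for a uniform draw `u` in (0, 1], using the textbook HNSW rule
/// floor(-ln(u) / ln(m)). Assumed scheme; hannoy's own code may differ.
fn layer_for_draw(u: f64, m: f64) -> u32 {
    (-u.ln() / m.ln()).floor() as u32
}

fn main() {
    // Assumption: the smallest nonzero uniform f64 built from 53 random bits.
    let smallest_draw = 2f64.powi(-53);
    for m in [2.0, 16.0, 32.0] {
        let top = layer_for_draw(smallest_draw, m);
        // Every value stays far below 256 under these assumptions.
        println!("M = {m}: highest reachable layer = {top}");
    }
}
```

Under these assumptions the bound depends only on the random draw and on M, not on how many items are inserted.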