Crate arroy

Arroy (Approximate Rearest Reighbors Oh Yeah) is a Rust library with the interface of the Annoy Python library to search for vectors in space that are close to a given query vector. It is based on LMDB, a memory-mapped key-value store, so many processes may share the same data and atomically modify the vectors.

§Examples

Open an LMDB database, store some vectors in it, and query the 20 nearest neighbors of the first item. This is the simplest way to use arroy. Remember to call Writer::build and heed::RwTxn::commit once you are done inserting your items.

use std::num::NonZeroUsize;

use arroy::distances::Euclidean;
use arroy::{Database as ArroyDatabase, Writer, Reader};
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// The 2 GiB (2048 MiB) size limit up to which we allow LMDB to grow.
const TWENTY_HUNDRED_MIB: usize = 2 * 1024 * 1024 * 1024;

let dir = tempfile::tempdir()?;
let env = heed::EnvOpenOptions::new().map_size(TWENTY_HUNDRED_MIB).open(dir.path())?;

// we will open the default LMDB unnamed database
let mut wtxn = env.write_txn()?;
let db: ArroyDatabase<Euclidean> = env.create_database(&mut wtxn, None)?;

// Now we can give it to our arroy writer
let index = 0;
let dimensions = 5;
let writer = Writer::<Euclidean>::new(db, index, dimensions);

// let's write some vectors
writer.add_item(&mut wtxn, 0,    &[0.8,  0.49, 0.27, 0.76, 0.94])?;
writer.add_item(&mut wtxn, 1,    &[0.66, 0.86, 0.42, 0.4,  0.31])?;
writer.add_item(&mut wtxn, 2,    &[0.5,  0.95, 0.7,  0.51, 0.03])?;
writer.add_item(&mut wtxn, 100,  &[0.52, 0.33, 0.65, 0.23, 0.44])?;
writer.add_item(&mut wtxn, 1000, &[0.18, 0.43, 0.48, 0.81, 0.29])?;

// You can specify the number of trees to use or specify None.
let mut rng = StdRng::seed_from_u64(42);
writer.build(&mut wtxn, &mut rng, None)?;

// By committing, other readers can query the database in parallel.
wtxn.commit()?;

let rtxn = env.read_txn()?;
let reader = Reader::<Euclidean>::open(&rtxn, index, db)?;
let n_results = 20;

// You can improve the quality of the results by making arroy search more nodes.
// The multiplier is arbitrary: the higher it is, the better the results and the slower the query.
let is_precise = true;
let search_k = if is_precise {
    NonZeroUsize::new(n_results * reader.n_trees() * 15)
} else {
    None
};

// Request the nearest neighbors of an item that is already stored in the database.
let item_id = 0;
let arroy_results = reader.nns_by_item(&rtxn, item_id, n_results, search_k, None)?.unwrap();
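The search_k expression in the example can be isolated into a small function to make the trade-off easier to see. This is a minimal, standard-library-only sketch: the helper name `search_k` and the plain `usize` parameter for the tree count are assumptions made for illustration and are not part of arroy's API. The multiplier 15 is the arbitrary value from the example.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not part of arroy) mirroring the `search_k`
/// expression used in the example.
fn search_k(n_results: usize, n_trees: usize, is_precise: bool) -> Option<NonZeroUsize> {
    // Arbitrary multiplier taken from the example: higher values mean
    // more nodes visited, so better results and slower queries.
    const MULTIPLIER: usize = 15;
    if is_precise {
        // `NonZeroUsize::new` returns `None` when the product is zero,
        // so a zero `n_results` or `n_trees` yields no explicit budget.
        NonZeroUsize::new(n_results * n_trees * MULTIPLIER)
    } else {
        // No explicit budget, as in the example's non-precise branch.
        None
    }
}

fn main() {
    // The tree count of 3 is made up for this sketch.
    println!("precise: {:?}", search_k(20, 3, true));
    println!("default: {:?}", search_k(20, 3, false));
}
```

Because the budget scales with both the number of requested results and the number of trees, doubling either one doubles the number of nodes the query is asked to visit.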

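When experimenting with search_k, it can help to compare the approximate results against an exact baseline. The sketch below is a brute-force linear scan over the five vectors from the example, ranked by Euclidean distance to item 0. It is not how arroy searches, and the names `euclidean`, `example_items`, and `brute_force_nns`, as well as the use of `u32` for item identifiers, are assumptions made for illustration.

```rust
/// Euclidean distance between two vectors of equal length.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// The five (id, vector) pairs inserted in the example.
fn example_items() -> Vec<(u32, [f32; 5])> {
    vec![
        (0, [0.8, 0.49, 0.27, 0.76, 0.94]),
        (1, [0.66, 0.86, 0.42, 0.4, 0.31]),
        (2, [0.5, 0.95, 0.7, 0.51, 0.03]),
        (100, [0.52, 0.33, 0.65, 0.23, 0.44]),
        (1000, [0.18, 0.43, 0.48, 0.81, 0.29]),
    ]
}

/// Exact nearest neighbors by linear scan: score every item, sort by
/// ascending distance, and keep at most `n` results.
fn brute_force_nns(items: &[(u32, [f32; 5])], query: &[f32], n: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = items
        .iter()
        .map(|(id, v)| (*id, euclidean(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(n);
    scored
}

fn main() {
    let items = example_items();
    // Query with the vector of item 0; the scan includes item 0 itself.
    for (id, dist) in brute_force_nns(&items, &items[0].1, 20) {
        println!("item {id}: distance {dist:.4}");
    }
}
```

Only five items are stored, so asking for 20 results returns all five. A linear scan costs time proportional to the number of items per query, which is the cost an approximate tree-based index is meant to avoid on large datasets.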
Modules§

Structs§

  • A reader over the arroy trees and user items.
  • The different stats of an arroy database.
  • The different stats of a tree in an arroy database.
  • A writer to store new items, remove existing ones, and build the search tree to query the nearest neighbors to items or vectors.

Enums§

  • The different set of errors that arroy can encounter.

Traits§

  • A trait used by arroy to compute the distances, compute the split planes, and normalize user vectors.

Type Aliases§

  • The database required by arroy for reading or writing operations.
  • An identifier for the items stored in the database.
  • A custom Result type that is returning an arroy error by default.