Arroy (Approximate Rearest Reighbors Oh Yeah) is a Rust library with the interface of the Annoy Python library to search for vectors in space that are close to a given query vector. It is based on LMDB, a memory-mapped key-value store, so many processes may share the same data and atomically modify the vectors.
§Examples
Open an LMDB database, store some vectors in it, and query the top 20 nearest items from the first vector. This is the most trivial way to use arroy and it’s fairly easy. Just do not forget to call ArroyBuilder::build and heed::RwTxn::commit when you are done inserting your items.
use std::num::NonZeroUsize;
use arroy::distances::Euclidean;
use arroy::{Database as ArroyDatabase, Writer, Reader};
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};
/// That's the 2 GiB (2048 MiB) size limit we allow LMDB to grow to.
const TWENTY_HUNDRED_MIB: usize = 2 * 1024 * 1024 * 1024;
let dir = tempfile::tempdir()?;
let env = unsafe { heed::EnvOpenOptions::new().map_size(TWENTY_HUNDRED_MIB).open(dir.path()) }?;
// we will open the default LMDB unnamed database
let mut wtxn = env.write_txn()?;
let db: ArroyDatabase<Euclidean> = env.create_database(&mut wtxn, None)?;
// Now we can give it to our arroy writer
let index = 0;
let dimensions = 5;
let writer = Writer::<Euclidean>::new(db, index, dimensions);
// let's write some vectors
writer.add_item(&mut wtxn, 0, &[0.8, 0.49, 0.27, 0.76, 0.94])?;
writer.add_item(&mut wtxn, 1, &[0.66, 0.86, 0.42, 0.4, 0.31])?;
writer.add_item(&mut wtxn, 2, &[0.5, 0.95, 0.7, 0.51, 0.03])?;
writer.add_item(&mut wtxn, 100, &[0.52, 0.33, 0.65, 0.23, 0.44])?;
writer.add_item(&mut wtxn, 1000, &[0.18, 0.43, 0.48, 0.81, 0.29])?;
// Build the search trees with the builder's default options.
// The number of trees is not set here, so arroy is left to choose it.
let mut rng = StdRng::seed_from_u64(42);
writer.builder(&mut rng).build(&mut wtxn)?;
// By committing, other readers can query the database in parallel.
wtxn.commit()?;
let rtxn = env.read_txn()?;
let reader = Reader::<Euclidean>::open(&rtxn, index, db)?;
let n_results = 20;
let mut query = reader.nns(n_results);
// You can increase the quality of the results by forcing arroy to search into more nodes.
// This multiplier is arbitrary but basically the higher, the better the results, the slower the query.
let is_precise = true;
if is_precise {
query.search_k(NonZeroUsize::new(n_results * reader.n_trees() * 15).unwrap());
}
// Similar searching can be achieved by requesting the nearest neighbors of a given item.
let item_id = 0;
let arroy_results = query.by_item(&rtxn, item_id)?.unwrap();
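To sanity-check what the query above should return, it can help to compare against an exhaustive search over the same five vectors. The following is a minimal, standard-library-only sketch of such a baseline. It is not how arroy searches (arroy uses trees to avoid scanning every item), and the helpers `squared_euclidean`, `brute_force_nns`, and `example_items` are hypothetical names introduced only for this illustration.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exhaustive nearest-neighbor search: returns at most `n` `(id, distance)`
/// pairs, closest first. Hypothetical helper, not part of arroy.
fn brute_force_nns(items: &[(u32, [f32; 5])], query: &[f32; 5], n: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = items
        .iter()
        .map(|(id, v)| (*id, squared_euclidean(v, query).sqrt()))
        .collect();
    // Sort by ascending distance, then keep only the `n` closest items.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(n);
    scored
}

/// The same ids and vectors that the example above inserts with `add_item`.
fn example_items() -> Vec<(u32, [f32; 5])> {
    vec![
        (0, [0.8, 0.49, 0.27, 0.76, 0.94]),
        (1, [0.66, 0.86, 0.42, 0.4, 0.31]),
        (2, [0.5, 0.95, 0.7, 0.51, 0.03]),
        (100, [0.52, 0.33, 0.65, 0.23, 0.44]),
        (1000, [0.18, 0.43, 0.48, 0.81, 0.29]),
    ]
}

fn main() {
    let items = example_items();
    // Query with the vector of item 0, mirroring `by_item(&rtxn, 0)`.
    let query = items[0].1;
    for (id, dist) in brute_force_nns(&items, &query, 20) {
        println!("item {id}: distance {dist:.4}");
    }
}
```

Asking for 20 results when only five items are stored simply returns all five, ordered by distance. An approximate index may miss some neighbors on large datasets, which is the trade-off that `search_k` lets you tune.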
Modules§
- distances: The set of distances implementing the Distance trait and supported by arroy.
- internals: The set of types used by the Distance trait.
- upgrade: Everything related to the upgrade process.
Structs§
- ArroyBuilder: The options available when building the arroy database.
- QueryBuilder: Options used to make a query against an arroy Reader.
- Reader: A reader over the arroy trees and user items.
- Stats: The different stats of an arroy database.
- SubStep: When a MainStep takes too long, it may output a sub-step that gives you more details about the progression we’ve made on the current step.
- TreeStats: The different stats of a tree in an arroy database.
- Writer: A writer to store new items, remove existing ones, and build the search tree to query the nearest neighbors to items or vectors.
- WriterProgress: Helps you understand what is happening inside of arroy during an indexing process.
Enums§
- Error: The different errors that arroy can encounter.
- MainStep: Some steps arroy will go through during an indexing process. Some steps may be skipped in certain cases. The names of the variants, the order in which they appear, and the time they take are unspecified and might change from one version to the next. It’s recommended not to assume anything from this enum.
Traits§
- Distance: A trait used by arroy to compute the distances, compute the split planes, and normalize user vectors.
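To make "split planes" and "normalize" more concrete, here is a minimal sketch of the two ideas as they are commonly used in Annoy-style indexes: a hyperplane equidistant from two chosen points partitions the items, and vectors can be scaled to unit length for angular distances. This is an assumption about the general technique, not a description of arroy's internals. The free functions `dot`, `normalize`, `split_plane`, and `margin` and the `SplitPlane` struct are hypothetical and do not reflect the actual Distance trait's methods or signatures.

```rust
/// Dot product of two vectors of equal length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scales `v` to unit length in place; a zero vector is left untouched.
fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// A hyperplane described by `normal . x + offset = 0`. Hypothetical type.
struct SplitPlane {
    normal: Vec<f32>,
    offset: f32,
}

/// Builds the hyperplane equidistant from `p` and `q`:
/// the normal points from `q` to `p` and the plane passes through their midpoint.
fn split_plane(p: &[f32], q: &[f32]) -> SplitPlane {
    let normal: Vec<f32> = p.iter().zip(q).map(|(a, b)| a - b).collect();
    let midpoint: Vec<f32> = p.iter().zip(q).map(|(a, b)| (a + b) / 2.0).collect();
    let offset = -dot(&normal, &midpoint);
    SplitPlane { normal, offset }
}

/// Signed margin of `v`: positive on the side of `p`, negative on the side of `q`.
fn margin(plane: &SplitPlane, v: &[f32]) -> f32 {
    dot(&plane.normal, v) + plane.offset
}

fn main() {
    // Two items from the example above serve as the points defining the plane.
    let p = [0.8, 0.49, 0.27, 0.76, 0.94];
    let q = [0.66, 0.86, 0.42, 0.4, 0.31];
    let plane = split_plane(&p, &q);

    let item = [0.52, 0.33, 0.65, 0.23, 0.44];
    let side = if margin(&plane, &item) > 0.0 { "p" } else { "q" };
    println!("item falls on the side of {side}");

    let mut unit = [3.0_f32, 4.0];
    normalize(&mut unit);
    println!("normalized: {unit:?}");
}
```

Recursively splitting the items on each side with further planes yields a tree, and a query descends the tree by checking the sign of the margin at each node.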