Vectune: fast Vamana indexing
Vectune is a lightweight VectorDB with Incremental Indexing, based on FreshVamana. This project is implemented with the support of KinicDAO and powers the backend of KinicVectorDB for vector indexing.
Getting Start
By specifying progress-bar in features, you can check the progress of indexing.
[]
= { = "0.1.0", = ["progress-bar"]}
To perform calculations of Euclidean distances quickly using SIMD, it is necessary to specify nightly in example. If the rust-analyzer in VSCode gives an error for #![feature(portable_simd)], please set up your .vscode/settings.json.
Example
Setup and Run
To test with the SIFT1M dataset, please execute the following command. SIFT1M is a dataset of 1 million data points, each with 128 dimensions.
How it works
Indexing is performed on the data using a Builder, and searches and insertions are conducted on the graph.
use ;
let points = Vecnew;
for vec in base_vectors
let = default
.progress
.build;
let mut graph = new;
let k = 50;
let = search;
PointInterface Trait
You will need to define the dimensions and data type of the vectors used, as well as the method for calculating distance.
Please implement the following four methods:
distance(&self, other: &Self) -> f32fn dim() -> u32fn add(&self, other: &Self) -> Selffn div(&self, divisor: &usize) -> Self
distance() can be optimized using SIMD. Please refer to ./examples/src/bin/sift1m.rs.
The following example provides a simple implementation.
use PointInterface;
;
GraphInterface Trait
To accommodate the entire graph on storage solutions other than SSDs or other memory types, you need to implement the GraphInterface.
Please implement the following eleven methods:
fn alloc(&mut self, point: P) -> usizefn free(&mut self, id: &usize)fn cemetery(&self) -> Vec<usize>fn clear_cemetery(&mut self)fn backlink(&self, id: &usize) -> Vec<usize>fn get(&mut self, id: &usize) -> (P, Vec<usize>)fn size_l(&self) -> usizefn size_r(&self) -> usizefn size_a(&self) -> f32fn start_id(&self) -> usizefn overwirte_out_edges(&mut self, id: &usize, edges: Vec<usize>)
self.get() is defined with &mut self because it handles caching from SSDs and other storage devices.
In vectune::search(), nodes returned by self.cemetery() are marked as tombstones and are excluded from the search results. Additionally, they are permanently deleted in vectune::delete().
You need to manage backlinks when adding or deleting nodes. This is utilized in vectune::delete().
The following example provides a simple on-memory implementation.
use GraphInterface;
use Itertools;
Indexing
ais the threshold for RobustPrune; increasing it results in more long-distance edges and fewer nearby edges.rrepresents the number of edges; increasing it adds complexity to the graph but reduces the number of isolated nodes.lis the size of the retention list for greedy-search; increasing it allows for the construction of more accurate graphs, but the computational cost grows exponentially.seedis used for initializing random graphs; it allows for the fixation of the random graph, which can be useful for debugging.
let = default
.set_a
.set_r
.set_l
.set_seed
.progress
.build;
Searching
k represents the number of top-k results. It is necessary that k <= l.
search;
Inserting
insert;
Deleting
Completely remove the nodes returned by graph.cemetery() from the graph.
delete;