oasysdb 0.1.0

Fast embedded vector database with incremental HNSW indexing.

👋 Meet OasysDB

OasysDB is an embeddable, efficient, and easy-to-use vector database. It is designed to be used as a library embedded inside your AI application. It is written in Rust and uses Sled as its persistent storage engine to save vector collections to disk.

OasysDB implements HNSW (Hierarchical Navigable Small World) as its indexing algorithm. HNSW is a state-of-the-art algorithm for approximate nearest-neighbor search that is used by many vector databases. It is fast, memory efficient, and scales well to large datasets.
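To make the search problem concrete, here is a minimal sketch of exact (brute-force) k-nearest-neighbor search, which is the result an HNSW index approximates while visiting far fewer vectors. This is not OasysDB's implementation or API: the function names `euclidean` and `nearest` are hypothetical, and Euclidean distance is an assumption made for the example.

```rust
/// Euclidean distance between two vectors of the same fixed dimension.
/// (Hypothetical helper for illustration; not part of OasysDB.)
fn euclidean<const N: usize>(a: &[f32; N], b: &[f32; N]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

/// Exact k-nearest-neighbor search: score every stored vector and
/// return the (index, distance) pairs of the `k` closest, nearest first.
/// (Hypothetical helper for illustration; not part of OasysDB.)
fn nearest<const N: usize>(
    vectors: &[[f32; N]],
    query: &[f32; N],
    k: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(id, v)| (id, euclidean(v, query)))
        .collect();

    // total_cmp defines a total order over f32, so NaN values cannot
    // make the comparison fail.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Calling `nearest(&vectors[..], &query, 5)` compares the query against every stored vector, so its cost grows linearly with the collection size. A graph index such as HNSW trades a small loss in exactness for a search that avoids this full scan.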

Why OasysDB?

OasysDB is flexible enough for a wide range of vector search use cases, such as using RAG (Retrieval-Augmented Generation) with an LLM to generate context-aware output. OasysDB offers two major features that set it apart from other vector databases and libraries:

  • Incremental vector operations: OasysDB allows you to add, remove, or modify vectors in a collection without rebuilding its index. This allows for a more flexible and efficient approach to storing your vector data.
  • Flexible persistence options: You can choose to persist a vector collection to disk or keep it in memory. By default, whenever you use a collection, it is loaded into memory to keep search performance high.
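The user-visible behavior of incremental operations can be sketched with a toy in-memory collection. Everything here is hypothetical: `ToyCollection` and its methods are not OasysDB's `Collection` API, and the toy has no HNSW index at all (it scans every vector). It only illustrates the contract that an insert, update, or delete is reflected in the very next search, with no separate rebuild step.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory collection, for illustration only.
/// It is not OasysDB's `Collection` type and has no HNSW index.
struct ToyCollection<const N: usize> {
    next_id: usize,
    vectors: HashMap<usize, [f32; N]>,
}

impl<const N: usize> ToyCollection<N> {
    fn new() -> Self {
        Self { next_id: 0, vectors: HashMap::new() }
    }

    /// Add a vector and return the ID assigned to it.
    fn insert(&mut self, vector: [f32; N]) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.vectors.insert(id, vector);
        id
    }

    /// Replace the vector stored under `id`; returns false if `id` is unknown.
    fn update(&mut self, id: usize, vector: [f32; N]) -> bool {
        match self.vectors.get_mut(&id) {
            Some(slot) => {
                *slot = vector;
                true
            }
            None => false,
        }
    }

    /// Remove the vector stored under `id`; returns false if `id` is unknown.
    fn delete(&mut self, id: usize) -> bool {
        self.vectors.remove(&id).is_some()
    }

    /// ID of the stored vector closest to `query`, by squared Euclidean
    /// distance, or None if the collection is empty.
    fn nearest(&self, query: &[f32; N]) -> Option<usize> {
        self.vectors
            .iter()
            .map(|(id, v)| {
                let dist: f32 = v
                    .iter()
                    .zip(query)
                    .map(|(a, b)| (a - b) * (a - b))
                    .sum();
                (*id, dist)
            })
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(id, _)| id)
    }
}
```

In this sketch, updating a record so that it moves next to the query changes the result of the following `nearest` call immediately. An incremental index has to offer the same guarantee while also keeping its graph structure consistent after each change.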

🚀 Quickstart

The following code snippet is a reference for getting started with OasysDB. In short, use Collection to store vector records and search for similar vectors, and use Database to persist a vector collection to disk.

use oasysdb::database::Database;
use oasysdb::collection::*;
use oasysdb::vector::*;
use rand::random; // Used only to generate random example data.

fn main() {
    // Generate 100 random 128-dimensional records with a utility function.
    let records = gen_records::<128>(100);

    // Open the database and create a collection.
    let mut db = Database::open("data/readme").unwrap();
    let collection: Collection<usize, 128, 32> =
        db.create_collection("vectors", None, Some(&records)).unwrap();

    // Utility function to generate a random vector.
    let query = gen_vector::<128>();
    let result = collection.search(&query, 5).unwrap();

    println!("Nearest neighbor ID: {}", result[0].id);
}

fn gen_records<const N: usize>(len: usize) -> Vec<Record<usize, N>> {
    let mut records = Vec::with_capacity(len);

    for _ in 0..len {
        let vector = gen_vector::<N>();
        let data = random::<usize>();
        records.push(Record { vector, data });
    }

    records
}

fn gen_vector<const N: usize>() -> Vector<N> {
    let mut vec = [0.0; N];

    for float in vec.iter_mut() {
        *float = random::<f32>();
    }

    Vector(vec)
}

🏁 Benchmarks

OasysDB has a built-in benchmarking suite using Rust's Criterion crate that can be used to measure the performance of the vector database.

Currently, the benchmarks are focused on the performance of the collection's vector search functionality. We are working on adding more benchmarks to measure the performance of other operations.

To run the benchmarks yourself, use the following command, which downloads the benchmarking dataset and then runs the benchmarks:

cargo bench

🤝 Contributing

We welcome contributions from the community. Please see contributing.md for more information.

We are also looking for advisors to help guide the project direction and roadmap. If you are interested, please contact us via Discord or by email at edwin@oasysai.com.

Disclaimer

This project is still in the early stages of development. We are actively working on it and we expect the API and functionality to change. We do not recommend using this in production yet.

Code of Conduct

We are committed to creating a welcoming community. Any participant in our project is expected to act respectfully and to follow the Code of Conduct.