Crate vectordb

§VectorDB (LanceDB) – Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.
  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
  • Support for vector similarity search, full-text search and SQL.
  • Native Rust, Python, and JavaScript/TypeScript support.
  • Zero-copy, automatic versioning: manage versions of your data without extra infrastructure.
  • GPU support for building vector indices [1].
  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.

§Getting Started

LanceDB runs in-process. To use it in your Rust project, add the crate as a dependency by running the following in your project directory (this updates your Cargo.toml):

cargo add vectordb

§Quick Start

The Rust API is not stable yet; expect breaking changes.

§Connect to a database

use vectordb::connect;

// Connect to a database stored in a local directory.
let db = connect("data/sample-lancedb").await.unwrap();

LanceDB accepts several forms of database path:

  • /path/to/database - local database on the file system
  • s3://bucket/path/to/database or gs://bucket/path/to/database - database on a cloud object store
  • db://dbname - LanceDB Cloud
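
To make the three path forms concrete, here is a minimal, standard-library-only sketch that sorts a path into one of them by its scheme. The `DbLocation` enum and `classify` function are hypothetical names invented for this illustration; they are not part of the vectordb crate, and the crate's own parsing may well differ.

```rust
/// Where a database path points, following the three forms listed above.
/// Hypothetical type for illustration only; not part of the vectordb crate.
#[derive(Debug, PartialEq)]
enum DbLocation<'a> {
    /// A directory on the local file system.
    Local(&'a str),
    /// A cloud object store; `scheme` is "s3" or "gs".
    ObjectStore { scheme: &'a str, rest: &'a str },
    /// A hosted database addressed as db://dbname.
    Cloud(&'a str),
}

/// Classify a path by the scheme in front of "://", if there is one.
fn classify(uri: &str) -> DbLocation<'_> {
    match uri.split_once("://") {
        Some(("s3", rest)) => DbLocation::ObjectStore { scheme: "s3", rest },
        Some(("gs", rest)) => DbLocation::ObjectStore { scheme: "gs", rest },
        Some(("db", name)) => DbLocation::Cloud(name),
        // Assumption of this sketch: anything else is a local path.
        _ => DbLocation::Local(uri),
    }
}

fn main() {
    for uri in ["data/sample-lancedb", "s3://bucket/path/to/database", "db://dbname"] {
        println!("{uri} -> {:?}", classify(uri));
    }
}
```

The practical point is that the same connect call is used in every case; only the path string changes.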

You can also use ConnectOptions to configure the connection to the database.

use vectordb::{connect_with_options, ConnectOptions};
let options = ConnectOptions::new("data/sample-lancedb")
    .index_cache_size(1024);
let db = connect_with_options(&options).await.unwrap();

LanceDB uses arrow-rs to define schemas, data types, and arrays. It treats FixedSizeList<Float16/Float32> columns as vector columns.
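
As a rough mental model of why a fixed-size list suits vector data: every row has the same number of values, so the whole column can live in one flat buffer and row i is simply the slice starting at i * dim. The sketch below uses only the standard library; `FixedSizeVectors` is a hypothetical stand-in written for this illustration, not an arrow-rs or vectordb type.

```rust
/// Toy stand-in for a FixedSizeList<Float32> column: one flat buffer of
/// values plus a fixed width. Hypothetical type, for illustration only.
struct FixedSizeVectors {
    dim: usize,
    values: Vec<f32>,
}

impl FixedSizeVectors {
    fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be positive");
        Self { dim, values: Vec::new() }
    }

    /// Append one vector; reject it if its length differs from `dim`.
    fn push(&mut self, v: &[f32]) -> Result<(), String> {
        if v.len() != self.dim {
            return Err(format!("expected {} values, got {}", self.dim, v.len()));
        }
        self.values.extend_from_slice(v);
        Ok(())
    }

    /// Number of vectors (rows) stored.
    fn len(&self) -> usize {
        self.values.len() / self.dim
    }

    /// Row `row` is the slice [row * dim, row * dim + dim) of the flat buffer.
    fn get(&self, row: usize) -> Option<&[f32]> {
        let start = row.checked_mul(self.dim)?;
        let end = start.checked_add(self.dim)?;
        self.values.get(start..end)
    }
}

fn main() {
    let mut column = FixedSizeVectors::new(3);
    column.push(&[1.0, 2.0, 3.0]).unwrap();
    column.push(&[4.0, 5.0, 6.0]).unwrap();
    println!("rows = {}, row 1 = {:?}", column.len(), column.get(1));
}
```

Because the width is fixed, no per-row offsets are needed, and a vector of the wrong length is rejected when it is appended rather than at query time.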

For more details, please refer to LanceDB documentation.

§Create a table

To create a Table, you need to provide an arrow_schema::Schema and an arrow_array::RecordBatch stream.

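use std::sync::Arc;

use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};

// Schema: an integer id plus a 128-dimensional Float32 vector column.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new(
        "vector",
        DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 128),
        true,
    ),
]));
// Create a RecordBatch stream holding a single batch of 1000 rows.
let batches = RecordBatchIterator::new(
    vec![RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int32Array::from_iter_values(0..1000)),
            Arc::new(FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                (0..1000).map(|_| Some(vec![Some(1.0); 128])),
                128,
            )),
        ],
    )
    .unwrap()]
    .into_iter()
    .map(Ok),
    schema.clone(),
);
// Keep the returned table handle; the examples that follow use it as `tbl`.
let tbl = db.create_table("my_table", Box::new(batches), None).await.unwrap();

§Create vector index (IVF_PQ)

// Build an IVF_PQ index on the "vector" column with 256 partitions.
tbl.create_index(&["vector"])
    .ivf_pq()
    .num_partitions(256)
    .build()
    .await
    .unwrap();

§Run a vector search

use futures::TryStreamExt; // provides try_collect on the result stream

// Search with a 128-dimensional query vector and collect every result batch.
let results = tbl
    .search(&[1.0; 128])
    .execute_stream()
    .await
    .unwrap()
    .try_collect::<Vec<_>>()
    .await
    .unwrap();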
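
To give some intuition for what num_partitions controls, the following standard-library-only sketch shows the inverted-file (IVF) half of an IVF_PQ index in miniature: vectors are grouped by their nearest centroid, and a query scans only the partition whose centroid is closest. Everything here is an assumption made for illustration: `IvfIndex` and its methods are hypothetical names, the centroids are supplied by hand rather than learned, squared L2 distance is assumed, and the product-quantization (PQ) compression step is left out entirely. It is not LanceDB's implementation.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`. Assumes `centroids` is non-empty.
fn nearest_centroid(centroids: &[Vec<f32>], v: &[f32]) -> usize {
    let mut best = 0;
    for i in 1..centroids.len() {
        if l2_sq(&centroids[i], v) < l2_sq(&centroids[best], v) {
            best = i;
        }
    }
    best
}

/// Toy inverted-file index: partition `p` holds the row ids assigned to
/// centroid `p`. Hypothetical type, for illustration only.
struct IvfIndex {
    centroids: Vec<Vec<f32>>,
    partitions: Vec<Vec<usize>>,
}

impl IvfIndex {
    /// Assign every row to its nearest centroid. The centroids are given
    /// here; a real index would learn them, for example with k-means.
    fn build(centroids: Vec<Vec<f32>>, data: &[Vec<f32>]) -> Self {
        assert!(!centroids.is_empty(), "need at least one centroid");
        let mut partitions = vec![Vec::new(); centroids.len()];
        for (row, v) in data.iter().enumerate() {
            partitions[nearest_centroid(&centroids, v)].push(row);
        }
        Self { centroids, partitions }
    }

    /// Scan only the partition nearest to `query`; return up to `k` row
    /// ids ordered from nearest to farthest.
    fn search(&self, data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
        let p = nearest_centroid(&self.centroids, query);
        let mut hits: Vec<(f32, usize)> = self.partitions[p]
            .iter()
            .map(|&row| (l2_sq(&data[row], query), row))
            .collect();
        hits.sort_by(|a, b| a.0.total_cmp(&b.0));
        hits.into_iter().take(k).map(|(_, row)| row).collect()
    }
}

fn main() {
    let data = vec![
        vec![0.0, 0.0],
        vec![0.1, 0.2],
        vec![0.3, 0.1],
        vec![5.0, 5.0],
        vec![5.2, 4.9],
    ];
    let centroids = vec![vec![0.0, 0.0], vec![5.0, 5.0]];
    let index = IvfIndex::build(centroids, &data);
    let hits = index.search(&data, &[4.9, 5.1], 2);
    println!("nearest rows: {hits:?}");
}
```

The trade-off this illustrates: more partitions mean fewer rows scanned per query, at the cost of possibly missing a true neighbor that landed in an adjacent partition.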


  1. GPU-accelerated index building is available only in the Python SDK.

Re-exports§

pub use connection::Connection;
pub use connection::Database;
pub use error::Error;
pub use error::Result;
pub use table::Table;
pub use table::TableRef;
pub use connection::connect;
pub use connection::connect_with_options;
pub use connection::ConnectOptions;

Modules§

connection
LanceDB Database
data
Data types, schema coercion, data cleaning, etc.
error
index
io
ipc
IPC support
query
table
LanceDB Table APIs
utils

Enums§

WriteMode
The mode used when writing a dataset.