Crate lancedb

LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.

The key features of LanceDB include:

  • Production-scale vector search with no servers to manage.
  • Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
  • Support for vector similarity search, full-text search and SQL.
  • Native Rust, Python, and JavaScript/TypeScript support.
  • Zero-copy, automatic versioning: manage versions of your data without extra infrastructure.
  • GPU support for building vector indices [1].
  • Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB, and more on the way.

Getting Started

LanceDB runs in process. To use it in your Rust project, add it as a dependency with Cargo:

cargo add lancedb

Crate Features

  • aws - Enable AWS S3 object store support.
  • dynamodb - Enable DynamoDB manifest store support.
  • azure - Enable Azure Blob Storage object store support.
  • gcs - Enable Google Cloud Storage object store support.
  • oss - Enable Alibaba Cloud OSS object store support.
  • remote - Enable remote client to connect to LanceDB cloud.
  • huggingface - Enable HuggingFace Hub integration for loading datasets from the Hub.
  • fp16kernels - Enable FP16 kernels for faster vector search on CPU.

Quick Start

Connect to a database
let db = lancedb::connect("data/sample-lancedb").execute().await.unwrap();

LanceDB accepts several forms of database URI:

  • /path/to/database - a local database on the file system.
  • s3://bucket/path/to/database or gs://bucket/path/to/database - a database on a cloud object store.
  • db://dbname - a LanceDB Cloud database (requires the remote feature).
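
The three forms above differ mainly in their scheme. The sketch below is a hypothetical helper (`DbLocation` and `classify_uri` are illustrative names, not part of the lancedb API) that sorts a URI into one of those categories; `lancedb::connect` performs its own parsing and may accept more schemes than shown here.

```rust
/// Where a database URI points, following the three forms listed above.
/// Hypothetical type for illustration only; not part of lancedb.
#[derive(Debug, PartialEq)]
enum DbLocation {
    /// A directory on the local file system.
    Local,
    /// A cloud object store; holds the scheme, e.g. "s3" or "gs".
    ObjectStore(String),
    /// A LanceDB Cloud database; holds the database name.
    Cloud(String),
}

/// Hypothetical classifier: looks only at the text before "://".
fn classify_uri(uri: &str) -> DbLocation {
    match uri.split_once("://") {
        // `db://dbname` addresses LanceDB Cloud.
        Some(("db", name)) => DbLocation::Cloud(name.to_string()),
        // Any other scheme is treated here as an object store.
        Some((scheme, _)) => DbLocation::ObjectStore(scheme.to_string()),
        // No scheme at all: a plain file-system path.
        None => DbLocation::Local,
    }
}
```

For example, `classify_uri("data/sample-lancedb")` returns `DbLocation::Local`, matching the path used in the connection example.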

You can also use ConnectBuilder (returned by connect) to configure the connection to the database, for example to pass object store credentials:

let db = lancedb::connect("data/sample-lancedb")
    .storage_options([
        ("aws_access_key_id", "some_key"),
        ("aws_secret_access_key", "some_secret"),
    ])
    .execute()
    .await
    .unwrap();
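
Hard-coding credentials as in the example above is convenient for a quick test. A minimal sketch of an alternative, assuming you keep the keys in the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, builds the same option pairs at runtime. `aws_storage_options` is a hypothetical helper; only the option keys come from the example above.

```rust
/// Hypothetical helper: builds storage option pairs from a key lookup
/// instead of hard-coding secrets. Pass `|k| std::env::var(k).ok()` to read
/// the process environment; taking a closure keeps the function testable.
/// The environment-variable names are an assumption of this sketch.
fn aws_storage_options(lookup: impl Fn(&str) -> Option<String>) -> Vec<(String, String)> {
    // (environment variable, storage option key) pairs.
    let mapping = [
        ("AWS_ACCESS_KEY_ID", "aws_access_key_id"),
        ("AWS_SECRET_ACCESS_KEY", "aws_secret_access_key"),
    ];
    mapping
        .iter()
        // Skip any variable that is not set.
        .filter_map(|&(var, key)| lookup(var).map(|value| (key.to_string(), value)))
        .collect()
}
```

Assuming `storage_options` accepts any iterator of string pairs (the example above passes `&str` tuples), the result could be passed as `.storage_options(aws_storage_options(|k| std::env::var(k).ok()))`.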

LanceDB uses arrow-rs to define the schema, the data types, and the arrays themselves. It treats FixedSizeList<Float16/Float32> columns as vector columns.

For more details, please refer to the LanceDB documentation.

Create a table

To create a Table, you need to provide an arrow_array::RecordBatch. The schema of the RecordBatch determines the schema of the table.

Vector columns should be represented as FixedSizeList<Float16/Float32> data type.

use std::sync::Arc;

use arrow_array::{types::Float32Type, FixedSizeListArray, Int32Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};

// 128-dimensional vectors; FixedSizeList lengths are i32 in arrow-rs.
let ndims = 128;
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    // The vector column: a fixed-size list of Float32 values.
    Field::new(
        "vector",
        DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), ndims),
        true,
    ),
]));
// 256 rows: ids 0..256, each paired with a vector of 128 ones.
let data = RecordBatch::try_new(
    schema.clone(),
    vec![
        Arc::new(Int32Array::from_iter_values(0..256)),
        Arc::new(FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
            (0..256).map(|_| Some(vec![Some(1.0); ndims as usize])),
            ndims,
        )),
    ],
)
.unwrap();
// The RecordBatch's schema becomes the table's schema.
let tbl = db
    .create_table("my_table", data)
    .execute()
    .await
    .unwrap();
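
The vector column in this batch holds 256 rows of 128 Float32 values. Arrow stores a FixedSizeList column as one contiguous child buffer of `num_rows * ndims` values, so row `i` is simply a slice of that buffer. The plain-Rust sketch below mimics that layout (ignoring null handling) to make the indexing explicit; `FlatVectors` is a hypothetical type, not arrow-rs's `FixedSizeListArray`.

```rust
/// Hypothetical stand-in for a FixedSizeList<Float32> column: all vectors
/// are stored back to back, and row `i` occupies `[i * ndims, (i + 1) * ndims)`.
/// Nulls are not modelled.
struct FlatVectors {
    ndims: usize,
    values: Vec<f32>,
}

impl FlatVectors {
    /// Builds the flat buffer, rejecting a zero dimension or rows of the wrong length.
    fn from_rows(ndims: usize, rows: &[Vec<f32>]) -> Result<Self, String> {
        if ndims == 0 {
            return Err("ndims must be greater than zero".to_string());
        }
        let mut values = Vec::with_capacity(rows.len() * ndims);
        for (i, row) in rows.iter().enumerate() {
            if row.len() != ndims {
                return Err(format!("row {i} has {} values, expected {ndims}", row.len()));
            }
            values.extend_from_slice(row);
        }
        Ok(Self { ndims, values })
    }

    /// Number of rows (vectors) stored.
    fn len(&self) -> usize {
        self.values.len() / self.ndims
    }

    /// Returns the vector stored in row `i`.
    fn row(&self, i: usize) -> &[f32] {
        &self.values[i * self.ndims..(i + 1) * self.ndims]
    }
}
```

With the 256 rows of 128 dimensions built above, `values` would hold 32,768 floats.
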
Create vector index (IVF_PQ)

LanceDB can automatically create an appropriate index based on a column's data type. For example:

  • If a column has a data type of FixedSizeList<Float16/Float32>, LanceDB creates an IVF_PQ vector index with default parameters.
  • Otherwise, it creates a BTree index by default.

use lancedb::index::Index;

// Let LanceDB choose the index type from the column's data type.
tbl.create_index(&["vector"], Index::Auto)
    .execute()
    .await
    .unwrap();
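
The rule described in the bullets above can be written as a small decision function. This is a sketch under the assumption that the bullets are the whole story: `ColumnType`, `AutoIndex` and `auto_index_for` are hypothetical names, and the real `Index::Auto` logic inside lancedb may consider additional cases.

```rust
/// Hypothetical, simplified stand-in for the Arrow data types that matter here.
#[derive(Debug)]
enum ColumnType {
    Int32,
    Utf8,
    Float16,
    Float32,
    /// Item type and fixed list length.
    FixedSizeList(Box<ColumnType>, i32),
}

/// Hypothetical result of the automatic choice.
#[derive(Debug, PartialEq)]
enum AutoIndex {
    IvfPq,
    BTree,
}

/// Mirrors the documented rule for `Index::Auto`.
fn auto_index_for(column: &ColumnType) -> AutoIndex {
    match column {
        // FixedSizeList<Float16/Float32> is treated as a vector column.
        ColumnType::FixedSizeList(item, _)
            if matches!(**item, ColumnType::Float16 | ColumnType::Float32) =>
        {
            AutoIndex::IvfPq
        }
        // Every other column falls back to a BTree index.
        _ => AutoIndex::BTree,
    }
}
```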

You can also specify the index type explicitly; see Table::create_index.

To search for the rows whose vectors are nearest to a query vector:

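use futures::TryStreamExt;
use lancedb::query::ExecutableQuery;

// Find the rows closest to the query vector and collect the
// resulting Arrow RecordBatches into a Vec.
let results = tbl
    .query()
    .nearest_to(&[1.0; 128])?
    .execute()
    .await?
    .try_collect::<Vec<_>>()
    .await?;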
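
Conceptually, a vector search ranks rows by their distance to the query vector and keeps the closest ones. The sketch below performs that ranking exhaustively over in-memory vectors using squared L2 distance; it illustrates exact k-nearest-neighbour search and is not LanceDB's implementation. An IVF_PQ index trades some accuracy for speed, so indexed results may differ from this exact ranking. The metric is an assumption here; LanceDB exposes its choices through DistanceType.

```rust
/// Exact (brute-force) k-nearest-neighbour search with squared L2 distance.
/// Returns up to `k` (row id, distance) pairs, closest first.
fn nearest(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(id, v)| {
            // Sum of squared per-dimension differences.
            let dist: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            (id, dist)
        })
        .collect();
    // Sort by ascending distance; total_cmp gives a total order over f32,
    // and the stable sort keeps lower ids first on ties.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```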

  [1] Only in the Python SDK.

Re-exports

pub use connection::ConnectNamespaceBuilder;
pub use connection::Connection;
pub use error::Error;
pub use error::Result;
pub use table::Table;
pub use connection::connect;
pub use connection::connect_namespace;

Modules

  • arrow
  • connection - Functions to establish a connection to a LanceDB database.
  • data - Data types, schema coercion, data cleaning, etc.
  • database - The database module defines the Database trait and related types.
  • dataloader
  • embeddings
  • error
  • expr - Expression builder API for type-safe query construction.
  • index
  • io
  • ipc - IPC support.
  • query
  • remote - A remote client for a LanceDB server, used to communicate with LanceDB Cloud. It can also serve as an example for building client/server applications with LanceDB, or as a client for another custom LanceDB service.
  • rerankers
  • table - LanceDB Table APIs.
  • utils

Structs

  • ObjectStoreRegistry - A registry of object store providers.
  • Session - Re-export of the Lance Session and ObjectStoreRegistry for custom session creation. A user session holds the runtime state for a crate::Dataset.

Enums

  • DistanceType