LanceDB is an open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
The key features of LanceDB include:
- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Rust, Python, Javascript/Typescript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support in building vector indices [1].
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
§Getting Started
LanceDB runs in process. To use it in your Rust project, add it to your Cargo.toml by running:

cargo add lancedb

§Crate Features
- aws - Enable AWS S3 object store support.
- dynamodb - Enable DynamoDB manifest store support.
- azure - Enable Azure Blob Storage object store support.
- gcs - Enable Google Cloud Storage object store support.
- oss - Enable Alibaba Cloud OSS object store support.
- remote - Enable remote client to connect to LanceDB cloud.
- huggingface - Enable HuggingFace Hub integration for loading datasets from the Hub.
- fp16kernels - Enable FP16 kernels for faster vector search on CPU.
§Quick Start
§Connect to a database
let db = lancedb::connect("data/sample-lancedb").execute().await.unwrap();

LanceDB accepts the following forms of database path:
- /path/to/database - local database on the file system.
- s3://bucket/path/to/database or gs://bucket/path/to/database - database on a cloud object store.
- db://dbname - LanceDB Cloud.
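
The three path forms above can be told apart by their scheme prefix. The following is a minimal sketch of that idea using only the standard library. `PathKind` and `classify_path` are hypothetical names introduced for illustration, and this is not how LanceDB itself parses connection strings.

```rust
/// Hypothetical classification of the path forms listed above.
/// Illustration only; not LanceDB's actual URI handling.
#[derive(Debug, PartialEq)]
enum PathKind {
    /// A directory on the local file system.
    LocalFile,
    /// A bucket on a cloud object store (only s3:// and gs:// are checked here).
    ObjectStore,
    /// A db:// name.
    Cloud,
}

fn classify_path(uri: &str) -> PathKind {
    if uri.starts_with("db://") {
        PathKind::Cloud
    } else if uri.starts_with("s3://") || uri.starts_with("gs://") {
        PathKind::ObjectStore
    } else {
        // Assumption: anything without a recognized scheme is a local path.
        PathKind::LocalFile
    }
}
```

With this sketch, `classify_path("s3://bucket/path/to/database")` evaluates to `PathKind::ObjectStore`, and a plain `/path/to/database` falls through to `PathKind::LocalFile`.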
You can also use ConnectBuilder to configure the connection to the database.

let db = lancedb::connect("data/sample-lancedb")
    .storage_options([
        ("aws_access_key_id", "some_key"),
        ("aws_secret_access_key", "some_secret"),
    ])
    .execute()
    .await
    .unwrap();

LanceDB uses arrow-rs to define schemas, data types and arrays.
It treats FixedSizeList<Float16/Float32> columns as vector columns.
For more details, please refer to the LanceDB documentation.
§Create a table
To create a Table, you need to provide an arrow_array::RecordBatch. The
schema of the RecordBatch determines the schema of the table.
Vector columns should be represented as FixedSizeList<Float16/Float32> data type.
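
To see why a fixed-size list suits vector columns: in Arrow, a FixedSizeList column keeps all child values in one contiguous buffer, so row `i` is simply the slice starting at `i * ndims`. The sketch below models that layout with the standard library only. `FixedSizeVectors` is a hypothetical stand-in, not an arrow-rs type, and it ignores nulls and validity bitmaps.

```rust
/// Simplified stand-in for a fixed-size-list column: every row holds exactly
/// `ndims` floats, stored back to back in one flat buffer.
/// Hypothetical type for illustration; not part of arrow-rs or LanceDB.
struct FixedSizeVectors {
    ndims: usize,
    values: Vec<f32>,
}

impl FixedSizeVectors {
    /// Builds the column from rows, rejecting a zero dimension and any row
    /// whose length differs from `ndims`.
    fn from_rows(ndims: usize, rows: &[Vec<f32>]) -> Result<Self, String> {
        if ndims == 0 {
            return Err("ndims must be greater than zero".to_string());
        }
        let mut values = Vec::with_capacity(rows.len() * ndims);
        for (i, row) in rows.iter().enumerate() {
            if row.len() != ndims {
                return Err(format!("row {i} has {} values, expected {ndims}", row.len()));
            }
            values.extend_from_slice(row);
        }
        Ok(Self { ndims, values })
    }

    /// Number of rows (vectors) in the column.
    fn len(&self) -> usize {
        self.values.len() / self.ndims
    }

    /// Row `i` occupies values[i * ndims .. (i + 1) * ndims]; out-of-range rows yield None.
    fn row(&self, i: usize) -> Option<&[f32]> {
        let start = i.checked_mul(self.ndims)?;
        self.values.get(start..start + self.ndims)
    }
}
```

Because every row has the same width, no per-row offsets are needed, which is what makes the type a natural fit for embeddings of a fixed dimension.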

use std::sync::Arc;

use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};

// Dimensionality of the vector column.
let ndims = 128;
// Schema: an integer id plus a fixed-size list of Float32 as the vector column.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new(
        "vector",
        DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), ndims),
        true,
    ),
]));
// 256 rows, each with a constant vector of 1.0 values.
let data = RecordBatch::try_new(
    schema.clone(),
    vec![
        Arc::new(Int32Array::from_iter_values(0..256)),
        Arc::new(
            FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                (0..256).map(|_| Some(vec![Some(1.0); ndims as usize])),
                ndims,
            ),
        ),
    ],
)
.unwrap();
db.create_table("my_table", data)
    .execute()
    .await
    .unwrap();

§Create vector index (IVF_PQ)
LanceDB can automatically create appropriate indices based on the data types of the columns. For example:
- If a column has a data type of FixedSizeList<Float16/Float32>, LanceDB will create an IVF-PQ vector index with default parameters.
- Otherwise, it creates a BTree index by default.
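
The selection rule described in the two bullets above can be written as a single match on the column type. This is a sketch under stated assumptions: `ColumnType`, `IndexChoice` and `auto_index_for` are hypothetical names, the enum covers only a few types, and the real logic inside LanceDB may consider more cases.

```rust
/// Hypothetical, heavily simplified view of a column's data type.
#[derive(Debug, PartialEq)]
enum ColumnType {
    Int32,
    Utf8,
    /// FixedSizeList<Float16> with the given list size.
    FixedSizeListF16(usize),
    /// FixedSizeList<Float32> with the given list size.
    FixedSizeListF32(usize),
}

/// Hypothetical result of the automatic choice.
#[derive(Debug, PartialEq)]
enum IndexChoice {
    IvfPq,
    BTree,
}

/// Mirrors the rule stated in the text: vector columns get IVF-PQ,
/// everything else gets a BTree index.
fn auto_index_for(column: &ColumnType) -> IndexChoice {
    match column {
        ColumnType::FixedSizeListF16(_) | ColumnType::FixedSizeListF32(_) => IndexChoice::IvfPq,
        _ => IndexChoice::BTree,
    }
}
```

For the table created earlier, the "vector" column would map to `IndexChoice::IvfPq` and the "id" column to `IndexChoice::BTree` under this sketch.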

use lancedb::index::Index;

tbl.create_index(&["vector"], Index::Auto)
    .execute()
    .await
    .unwrap();

Users can also specify the index type explicitly; see Table::create_index.
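
For intuition about what a vector index speeds up: an IVF-PQ index is an approximate nearest-neighbor structure, and the exact computation it approximates is an exhaustive scan over every row. The sketch below shows that baseline scan using the standard library only. `squared_l2` and `nearest_rows` are hypothetical helpers, and squared L2 distance is chosen here purely for illustration; nothing in this block is LanceDB code.

```rust
/// Squared Euclidean (L2) distance between two equally sized vectors.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exhaustive nearest-neighbor scan: scores every row against the query and
/// returns (row index, squared distance) for the `k` closest rows, nearest first.
fn nearest_rows(rows: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, squared_l2(row, query)))
        .collect();
    // Stable sort by distance, so ties keep their original row order.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The scan touches every row, so its cost grows linearly with table size; an index trades some accuracy for avoiding most of those comparisons.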
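§Open table and search

// `try_collect` comes from futures::TryStreamExt; `execute` comes from ExecutableQuery.
use futures::TryStreamExt;
use lancedb::query::ExecutableQuery;

// Open the table created above.
let table = db.open_table("my_table").execute().await?;
let results = table
    .query()
    .nearest_to(&[1.0; 128])?
    .execute()
    .await?
    .try_collect::<Vec<_>>()
    .await?;

[1] Only in Python SDK.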
Re-exports§
- pub use connection::ConnectNamespaceBuilder;
- pub use connection::Connection;
- pub use error::Error;
- pub use error::Result;
- pub use table::Table;
- pub use connection::connect;
- pub use connection::connect_namespace;
Modules§
- arrow
- connection - Functions to establish a connection to a LanceDB database
- data - Data types, schema coercion, data cleaning, etc.
- database - The database module defines the Database trait and related types.
- dataloader
- embeddings
- error
- expr - Expression builder API for type-safe query construction
- index
- io
- ipc - IPC support
- query
- remote - This module contains a remote client for a LanceDB server. This is used to communicate with LanceDB cloud. It can also serve as an example for building client/server applications with LanceDB or as a client for some other custom LanceDB service.
- rerankers
- table - LanceDB Table APIs
- utils
Structs§
- ObjectStoreRegistry - A registry of object store providers.
- Session - Re-export Lance Session and ObjectStoreRegistry for custom session creation. A user session holds the runtime state for a crate::Dataset