Rust Implementation of Lance
The Open Lakehouse Format for Multimodal AI
Installation
Install using cargo:
cargo install lance
Examples
Create dataset
Suppose batches is an Arrow Vec<RecordBatch> and schema is an Arrow SchemaRef:
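use arrow_array::RecordBatchIterator;
use lance::{dataset::WriteParams, Dataset};
// `uri` is a placeholder for the dataset location (it is not named in the text above).
let write_params = WriteParams::default();
let reader = RecordBatchIterator::new(batches.into_iter().map(Ok), schema);
Dataset::write(reader, uri, Some(write_params)).await.unwrap();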
Read
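use futures::StreamExt; // provides `map` and `collect` on the record batch stream
// `uri` is a placeholder for the dataset location.
let dataset = Dataset::open(uri).await.unwrap();
let mut scanner = dataset.scan();
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect::<Vec<RecordBatch>>()
    .await;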
Take
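// `row_indices` (rows to fetch) and `projection` (columns to return) are placeholder names.
let values: Result<RecordBatch> = dataset.take(&row_indices, &projection).await;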
Vector index
Assume the "embeddings" column is a FixedSizeListArray:
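use lance::index::vector::VectorIndexParams;
let mut params = VectorIndexParams::default();
params.num_partitions = 256;
params.num_sub_vectors = 16;
// this will Err if list_size(embeddings) / num_sub_vectors does not meet SIMD alignment
// The argument list below is an assumption (columns, index type, index name, params, replace); check it against your Lance version.
dataset.create_index(&["embeddings"], IndexType::Vector, None, &params, true).await;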
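The comment in the vector index example says that index creation fails when list_size(embeddings) / num_sub_vectors does not meet SIMD alignment. The following is a minimal, standard-library-only sketch of what such a pre-check might look like. The function name sub_vector_dim and the alignment value of 8 are assumptions made for illustration; they are not taken from the Lance source, so treat the exact rule as something to confirm against the version you use.

```rust
/// Hypothetical SIMD lane requirement for each sub-vector (an assumption,
/// not a value taken from the Lance source).
const ASSUMED_SIMD_ALIGNMENT: usize = 8;

/// Returns the dimension of each sub-vector, or an error when the
/// embedding width cannot be split as requested.
fn sub_vector_dim(list_size: usize, num_sub_vectors: usize) -> Result<usize, String> {
    if num_sub_vectors == 0 {
        return Err("num_sub_vectors must be greater than zero".to_string());
    }
    if list_size % num_sub_vectors != 0 {
        return Err(format!(
            "list_size {list_size} is not divisible by num_sub_vectors {num_sub_vectors}"
        ));
    }
    let dim = list_size / num_sub_vectors;
    if dim % ASSUMED_SIMD_ALIGNMENT != 0 {
        return Err(format!(
            "sub-vector dimension {dim} is not a multiple of {ASSUMED_SIMD_ALIGNMENT}"
        ));
    }
    Ok(dim)
}

fn main() {
    // 1024-wide embeddings split into 16 sub-vectors of 64 values each.
    println!("{:?}", sub_vector_dim(1024, 16));
    // 100 is not divisible by 16, so this split is rejected.
    println!("{:?}", sub_vector_dim(100, 16));
}
```

Running a check like this before calling create_index lets you pick a num_sub_vectors value that fits the embedding width up front instead of discovering the mismatch from the returned error.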
What is Lance?
Lance is an open lakehouse format for multimodal AI. It comprises a file format, a table format, and a catalog spec that together let you build a complete lakehouse on top of object storage to power your AI workflows.
The key features of Lance include:
- Expressive hybrid search: Combine vector similarity search, full-text search (BM25), and SQL analytics on the same dataset with accelerated secondary indices.
- Lightning-fast random access: 100x faster than Parquet or Iceberg for random access without sacrificing scan performance.
- Native multimodal data support: Store images, videos, audio, text, and embeddings in a single unified format with efficient blob encoding and lazy loading.
- Data evolution: Efficiently add columns with backfilled values without full table rewrites, perfect for ML feature engineering.
- Zero-copy versioning: ACID transactions, time travel, and automatic versioning without needing extra infrastructure.
- Rich ecosystem integrations: Apache Arrow, Pandas, Polars, DuckDB, Apache Spark, Ray, Trino, Apache Flink, and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino).
For more details, see the full Lance format specification.